
Putting the 'Tech' in Techno: Detecting Genres and Trendsetters in Electronic Music by Dirichlet Processes

Matthew Scott Silver

Submitted in partial fulfillment

of the requirements for the degree of

Bachelor of Science in Engineering

Department of Operations Research and Financial Engineering

Princeton University

Adviser: Ramon Van Handel

June 2016

I hereby declare that I am the sole author of this thesis.

I authorize Princeton University to lend this thesis to other institutions or individuals for the purpose of scholarly research.

Matthew Silver

I further authorize Princeton University to reproduce this thesis by photocopying or by other means, in total or in part, at the request of other institutions or individuals for the purpose of scholarly research.

Matthew Silver

Abstract

This thesis provides a foundation of code and models for mathematically analyzing the evolution of Electronic Music (EM) over time. Using chronologically ordered data from the Million Song Dataset, it applies a Dirichlet Process Gaussian Mixture Model to assign the songs to clusters based on pitch and timbre data, without any prior assumptions about the clusters. By examining the characteristic sounds of the songs in each cluster, two kinds of conclusions are reached:

1. Which artists and songs were most innovative for their time.

2. Potential new ways in which the genealogy of, and relations between, EM genres can be imagined.

Finally, this thesis evaluates the strengths and weaknesses of the model used and suggests future work that can be done to improve upon it.


Acknowledgements

I would like to thank Professor Ramon Van Handel for advising me on my thesis. You helped me figure out how to narrow down my goals into a concrete topic and provided useful input on how to model and frame my problem effectively. I would also like to thank the Princeton ORFE department for providing funding to download and manage the dataset I used for this project. Michael Bino and the Computational Science and Engineering Support group (CSES) were incredibly helpful in setting up and running my programs on the Princeton servers; without your help, I would have had a much harder time getting my 300GB dataset of music to play nice. I would also like to thank Jeffrey Scott Dwoskin for providing the LaTeX template from which I wrote this thesis. And finally, I would like to thank my family and friends, especially Lucas and Kathryn, for providing continuous support and feedback. The work we all poured into our theses is incredible, and we've made it through this sometimes rocky journey at the greatest university of all.

On a personal note: regardless of whether you are a current Princeton undergraduate or are just interested in my work, push yourself beyond your comfort zone and don't let grades or other people's opinions get in your way. Take classes and join new groups that reflect your passions. At the same time, love yourself. Take care of your body and have some fun without feeling guilty. And finally, form great relationships. While Princetonians sometimes appear hypercompetitive and forced, they are genuinely sweet and brilliant people whom you will treasure for life. These four years at Princeton have gone by in a flash, and in the whirlwind of highs and lows I've gone through, these are the most important lessons I've learned.


To my parents


Contents

Abstract iii

Acknowledgements iv

List of Tables viii

List of Figures ix

1 Introduction 1
1.1 Background Information 1
1.2 Literature Review 3
1.3 The Dataset 10

2 Mathematical Modeling 12
2.1 Determining Novelty of Songs 12
2.2 Feature Selection 14
2.3 Collecting Data and Preprocessing Selected Features 20
2.3.1 Collecting the Data 20
2.3.2 Pitch Preprocessing 21
2.3.3 Timbre Preprocessing 25

3 Results 27
3.1 Methodology 27
3.2 Findings 29
3.2.1 α = 0.05 29
3.2.2 α = 0.1 33
3.2.3 α = 0.2 38
3.3 Analysis 46

4 Conclusion 53
4.1 Design Flaws in Experiment 53
4.2 Future Work 55
4.3 Closing Remarks 56

A Code 57
A.1 Pulling Data from the Million Song Dataset 57
A.2 Calculating Most Likely Chords and Timbre Categories 58
A.3 Code to Compute Timbre Categories 60
A.4 Helper Methods for Calculations 61

Bibliography 68


List of Tables

3.1 Song cluster descriptions for α = 0.05 33
3.2 Song cluster descriptions for α = 0.1 38
3.3 Song cluster descriptions for α = 0.2 45


List of Figures

1.1 A user's taste profile generated by Spotify 4
1.2 Data processing pipeline for Mauch's study, illustrated with a segment of Queen's Bohemian Rhapsody (1975) 8
2.1 scikit-learn example of GMM vs. DPGMM and tuning of α 15
2.2 Number of Electronic Music Songs in the Million Song Dataset from Each Year 26
3.1 Song year distributions for α = 0.05 31
3.2 Timbre and pitch distributions for α = 0.05 32
3.3 Song year distributions for α = 0.1 35
3.4 Timbre and pitch distributions for α = 0.1 37
3.5 Song year distributions for α = 0.2 41
3.6 Timbre and pitch distributions for α = 0.2 44


Chapter 1

Introduction

1.1 Background Information

Electronic Music (EM) is an increasingly popular genre of music with an immense presence and influence on modern culture. Because the genre is new as a whole and is arguably more loosely structured than other genres (technology has enabled the creation of a wide range of sounds and the easy blending of existing and new sounds alike), formal analysis of the genre, especially mathematical analysis, is fairly limited and has only begun growing in the past few years. As a fan of EM, I am interested in exploring how the genre has evolved over time. More specifically, my goal with this project was to design some structure or model that could help me identify which EM artists have contributed the most stylistically to the genre. Oftentimes, famous EM artists do not create novel-sounding music but rather popularize an existing style, and the motivation of this study is to distinguish those who have stylistically contributed the most to the EM scene from those who have merely popularized aspects of it.

As the study progressed, the manner in which I constructed my model lent itself to a second goal of the thesis: imagining new ways in which EM genres can be related to one another.


While there exists an extensive amount of research analyzing music trends from a non-mathematical (cultural, societal, artistic) perspective, the analysis of EM from a mathematical perspective, especially with respect to any computationally measurable trends in the genre, is close to nonexistent. EM has been analyzed to a lesser extent than other common genres of music in the academic world, most likely because it has existed for a shorter amount of time and is less rooted in prominent social and cultural events. In fact, the first published reference work on EM did not exist until 2012, when Professor Mark J. Butler of Northwestern University edited and published Electronica, Dance and Club Music, a collection of essays exploring EM genres and culture [1]. Furthermore, there are very few comprehensive visual guides that allow a user to relate every genre to each other and easily observe how different genres converge and diverge. While conducting research, the best guide I found was not a scholarly source but an online guide created by an EM enthusiast: Ishkur's Guide to Electronic Music [2]. This guide, which includes over 100 specific genres grouped into more general genres and represents chronological evolutions by connecting each genre in a flowchart, is the most exhaustive analysis of the EM scene I could find. However, the guide's analysis is very qualitative. While each subgenre contains an explanation of typical rhythms and sounds, and includes well-known songs indicative of the style, the guide was created by someone relying on historical and personal knowledge of EM. My model, which creates music genres by chronologically ordering songs and then assigning them to clusters, is a different approach to imagining the entire landscape of EM. The results may confirm Ishkur's Guide's findings, in which case the guide is given additional merit with mathematical evidence, or they may differ, suggesting that there may be better ways to group EM genres. One advantage that guides such as Ishkur's and historically based scholarly works have over my approach is that those models are history-sensitive and therefore may group songs in a way that historically makes sense. On the other hand, my model is history-agnostic and may not capture the historical context of songs when clustering. However, I believe that there is still significant merit to my research. Instead of classifying genres of music by the earlier genres that led to them, my approach gives the most credit to the artists and songs that were the most innovative for their time, and it may reveal musical styles that are more similar to each other than history would otherwise imply. This way of thinking about music genres, while unconventional, is another way of imagining EM.

The practice of quantitatively analyzing music has exploded in the last decade, thanks to technological and algorithmic advances that allow data scientists to constructively sift through troves of music and listener information. In the literature review, I will focus on two particular organizations that have contributed greatly to the large-scale mathematical analysis of music: Pandora, a website that plays songs similar to a song/artist/album inputted by a user, and Echo Nest, a music analytics firm that was acquired by Spotify in 2014 and drives Spotify's Discover Weekly feature [3]. After evaluating the relevance of these sources to my thesis work, I will then look over the relevant academic research and evaluate what it can contribute.

1.2 Literature Review

The quantitative analysis of music generally falls into two categories: research conducted by academics and academic organizations for scholarly purposes, and research conducted by companies and primarily targeted at consumers. First looking at the consumer-based research, Spotify and Pandora are two of the most prominent groups, and the two I decided to focus on. Spotify is a music streaming service where users can listen to albums and songs from a wide variety of artists, or listen to weekly playlists generated based on the music the user and the user's friends have listened to. The weekly playlist, called the Discover Weekly playlist, is a relatively new feature in Spotify and is driven by music analysis algorithms created by Echo Nest. Using the Echo Nest code interface, Spotify creates a "taste profile" for each user, which assesses attributes such as how often a user branches out to new styles of music, how closely the user's streamed music follows popular Billboard music charts, and so on. Spotify also looks at the artists and songs the user streamed and creates clusters of different genres that the user likes (see Figure 1.1). The taste profile and music clusters can then be used to generate playlists geared to a specific user. The genres in the clusters come from a list of nearly 800 names, which are derived by scraping the Internet for trending terms in music as well as by training various algorithms on a regular basis by "listening" to new songs [4][5].

Figure 1.1: A user's taste profile generated by Spotify


Although Spotify's and Echo Nest's algorithms are very useful for mapping the landscape of established and emerging genres of music, the methodology is limited to pre-defined genres. This may serve as a good starting point to compare my final results against, but my study aims to be as context-free as possible, attaching no preconceived notions of music styles or genres and instead looking at features that can be measured in every song.

While Spotify's approach to mapping music is very high-tech and based on existing genres, Pandora takes a very low-tech and context-free approach to music clustering. Pandora created the Music Genome Project, a multi-year undertaking in which skilled music theorists listened to a large number of songs and analyzed up to 450 characteristics in each song [6]. Pandora's approach is appealing to the aim of my study, since it does not take any preconceived notions of what a genre of music is, instead comparing songs on common characteristics such as pitch, rhythm, and instrument patterns. Unfortunately, I do not have a cadre of skilled music theorists at my disposal, nor do I have ten years to perform such calculations like the dedicated workers at Pandora (tips the indestructible fedora). Additionally, Pandora's Music Genome Project is intellectual property, so at best I can only rely on the abstract concepts of the Music Genome Project to drive my study.

In the academic realm, there are no existing studies analyzing quantifiable changes in EM specifically, but there exist a few studies that perform such analysis on popular Western music in general. One such study is Measuring the Evolution of Contemporary Western Popular Music, which analyzes music from 1955 to 2010 spanning all common genres. Using the Million Song Dataset, a free public database of songs each containing metadata (see Section 1.3), the study focuses on the attributes pitch, timbre, and loudness. Pitch is defined as the standard musical notes, or frequency of the sound waves. Timbre is formally defined as the Mel-frequency cepstral coefficients (MFCCs) of a transformed sound signal. More informally, it refers to the sound color, texture, or tone quality, and is associated with instrument types, recording resources, and production techniques. In other words, two sounds that have the same pitch but different tones (for example, a bell and a voice) are differentiated by their timbres. There are 12 MFCCs that define the timbre of a given sound. Finally, loudness refers to how intrinsically loud the music sounds, not loudness that a listener can manipulate while listening to the music. Loudness is the first MFCC of the timbre of a sound [7]. The study concluded that over time, music has become louder and less diverse:

"The restriction of pitch sequences (with metrics showing less variety in pitch progressions), the homogenization of the timbral palette (with frequent timbres becoming more frequent), and growing average loudness levels (threatening a dynamic richness that has been conserved until today). This suggests that our perception of the new would be essentially rooted on identifying simpler pitch sequences, fashionable timbral mixtures, and louder volumes. Hence, an old tune with slightly simpler chord progressions, new instrument sonorities that were in agreement with current tendencies, and recorded with modern techniques that allowed for increased loudness levels could be easily perceived as novel, fashionable and groundbreaking."

This study serves as a good starting point for mathematically analyzing music in a few ways. First, it utilizes the Million Song Dataset, which addresses the issue of legally obtaining music metadata. As mentioned in Section 1.3, the only legal way to obtain playable music for this study would have been to purchase every song I would include, which is infeasible. While the Million Song Dataset does not contain the audio files in playable format, it does contain audio features and metadata that allow for in-depth analysis. In addition, working with the dataset takes out the work of extracting features from raw audio files, saving an extensive amount of time and energy. Second, the study establishes specifics for what constitutes a trend in music. Pitch, timbre, and loudness are core features of music, and examining the distributions of each among songs over time reveals a lot of information about how the music industry and consumers' tastes have evolved. While these are not all of the features contained in a song, they serve as a good starting point. Third, the study defines mathematical ways to capture music attributes and measure their change over time. For example, pitches are transposed into the same tonal context, with binary discretized pitch descriptions based on a threshold, so that each song can be represented with vectors of pitches that are normalized and compared to other songs.

While this study lays some solid groundwork for capturing and analyzing numeric qualities of music, it falls short of addressing my goals in a couple of ways. First, it does not perform any analysis with respect to music genre. While the analysis performed in this paper could easily be applied to a list of songs in a specific genre, certain genres might have unique sounds and rhythms relative to other genres that would be worth studying in greater detail. Second, the study only measures general trends in music over time. The models used to describe changes are simple regressions that don't look at more nuanced changes. For example: what styles of music developed over certain periods of time? How rapid were those changes? Which styles of music developed from which other styles?

A more promising study, led by music researcher Matthias Mauch [8], analyzes contemporary popular Western music from the 1960s to the 2010s by comparing numerical data on the pitches and timbre of a corpus of 17,000 songs that appeared on the Billboard Hot 100. Like the previously mentioned paper, Measuring the Evolution of Contemporary Western Popular Music, Mauch's study also creates abstractions of pitch and timbre in order to provide a consistent and meaningful semantic interpretation of musical data (see Figure 1.2). However, Mauch's study takes this idea a step further by using genre tags from Last.fm, a music website, and constructing a hierarchy of music genres using hierarchical clustering. Additionally, the study takes a crack at determining whether a particular band, the Beatles, was musically groundbreaking for its time or merely playing off sounds that other bands had already used.

Figure 1.2: Data processing pipeline for Mauch's study, illustrated with a segment of Queen's Bohemian Rhapsody (1975)

While both Measuring the Evolution of Contemporary Western Popular Music and Mauch's study created abstractions of pitch and timbre, Mauch's study is more appealing with respect to my goal because its end results align more closely with mine. Additionally, the data processing pipeline offers several layers of abstraction, and depending on my progress, I would be able to achieve at least one of the levels of abstraction. As shown in Figure 1.2, each segment of a raw audio file is first broken down into its 12 timbre MFCCs and pitch components. Next, the study constructs "lexicons", or dictionaries of pitch and timbre terms that all songs can be compared to. For pitch, the original data is an N-by-12 matrix, where N is the number of time segments in the song and 12 the number of notes found in an octave of pitches. Each time segment contains the relative strengths of each of the 12 pitches. However, musical sounds are not merely a collection of pitches but, more precisely, chords. Furthermore, the similarity of two songs is not determined by the absolute pitches of their chords but rather by the progression of chords in the song, all relative to each other. For example, if all the notes in a song are transposed by one step, the song will sound different in terms of absolute pitch, but it will still be recognized as the original because all of the relative movements from each chord to the next are the same. This phenomenon is captured in the pitch data by finding the most likely chord played at each time segment, then counting the change to the next chord at each time step and generating a table of chord change frequencies for each song. Constructing the timbre lexicon is more complicated, since there is no easy analogue to chords for comparing songs' timbre. Mauch's study utilizes a Gaussian Mixture Model (GMM), iterating over k=1 to k=N clusters, where N is a large number, running the GMM with each prior assumption of k clusters, and computing the Bayesian Information Criterion (BIC) for each model. The value of k yielding the lowest of the N BIC values is selected. That model contains k different timbre clusters, and each cluster contains the mean timbre value for each of the 12 timbre components.

For my research, I decided that the pitch and timbre lexicons would be the most realistic level of abstraction I could obtain. Mauch's study adds an additional layer to pitch and timbre by identifying the most common patterns of chord changes and most common timbre rhythms, and by creating more general tags from these combined terms, such as "stepwise changes indicating modal harmony" for a pitch topic and "oh, rounded, mellow" for a timbral topic. There were two problems with using this final layer of abstraction for my study. First, attaching semantic interpretations to the pitch and timbral lexicons is a difficult task. For timbre, I would need to listen to sound samples containing all of the different timbral categories I identified and attach user interpretations to them. For the chords, not only would I have to perform the same analysis as on timbre, but I would also have to take careful attention to identify which chords correspond to common sound progressions in popular music, a task that I am not qualified for and did not have the resources to seek out for this thesis. Second, this final layer of abstraction was not necessary for the end goal of my paper. In fact, consolidating my pitch and timbre lexicons into simpler phrases would run the risk of pigeonholing my analysis and preventing me from discovering more nuanced patterns in my final results. Therefore, I decided to focus on pitch and timbral lexicon construction as the furthest levels of abstraction when processing songs for my thesis. Mathematical details on how I constructed the pitch and timbral lexicons can be found in the Mathematical Modeling chapter of this paper.

1.3 The Dataset

In order to successfully execute my thesis, I needed access to an extensive database of music. Until recently, acquiring a substantial corpus of music data was a difficult and costly task. It is illegal to download music audio files from video- and music-sharing sites such as YouTube, Spotify, and Pandora. Some platforms such as iTunes offer 90-second previews of songs, but using only segments of songs, and usually segments that showcase the chorus, is not a reliable way to capture the entire essence of a song. Even if I were to legally download entire audio files for free, I would run into additional issues. Obtaining a high-quality corpus of song data would be challenging: scripts that crawl music-sharing platforms may not capture all of the music I am looking for. And once I had the audio files, I would have to perform audio processing techniques to extract the relevant information from the songs.

Fortunately, there is an easy solution to the music data acquisition problem. The Million Song Dataset (MSD) is a collection of metadata for one million music tracks dating up to 2011. Various organizations, such as The Echo Nest, MusicBrainz, 7digital, and Last.fm, have contributed different pieces of metadata. Each song is represented as a Hierarchical Data Format file (HDF5), which can be loaded as a JSON object. The fields encompass topical features such as the song title, artist, and release date, as well as lower-level features such as the loudness, starting beat times, and pitches and timbre of several segments of the song [9]. While the MSD is the largest free and open-source music metadata dataset I could find, there is no guarantee that it adequately covers the entire spectrum of EM artists and songs. This quality limitation is important to consider throughout the study. A quick look through the songs, including the subset of data I worked with for this report, showed that there were several well-known artists and songs from the EM scene. Therefore, while the MSD may not contain all desired songs for this project, it contains an adequate number of relevant songs to produce some meaningful results. Additionally, the groundwork for modeling the similarities between songs and identifying groundbreaking ones is the same regardless of the songs included, and the following methodologies can be implemented on any similarly formatted dataset, including one with songs that might currently be missing from the MSD.


Chapter 2

Mathematical Modeling

2.1 Determining Novelty of Songs

Finding a logical and implementable mathematical model was, and continues to be, an important aspect of my research. My problem, how to mathematically determine which songs were unique for their time, requires an algorithm in which each song is introduced in chronological order, either joining an existing category or starting a new category based on its musical similarity to songs already introduced. Clustering algorithms like k-means or Gaussian Mixture Models (GMMs), which optimize the partitioning of a dataset into a predetermined number of clusters, assume a fixed number of clusters. While this process would work if we knew exactly how many genres of EM existed, if we guess wrong, our end results may contain clusters that are wrongly grouped together or separated. It is much better to apply a clustering algorithm that does not make any assumptions about this number.

One particularly promising process that addresses the issue of the number of clusters is a family of algorithms known as Dirichlet Processes (DPs). DPs are useful for this particular application because (1) they assign clusters to a dataset with only an upper bound on the number of clusters, and (2) by sorting the songs in chronological order before running the algorithm and keeping track of which songs are categorized under each cluster, we can observe the earliest songs in each cluster and, consequently, infer which songs were responsible for creating new clusters. The DP is controlled by a concentration parameter α. The expected number of clusters formed is directly proportional to the value of α, so the higher the value of α, the more likely new clusters will be formed [10]. Regardless of the value of α, as the number of data points introduced increases, the probability of a new group being formed decreases. That is, a "rich get richer" policy is in place, and existing clusters tend to grow in size. Tweaking the value of the tunable parameter α is an important part of the study, since it determines the flexibility given to forming a new cluster. If the value of α is too small, then the criteria for forming clusters will be too strict, and data that should be in different clusters will be assigned to the same cluster. On the other hand, if α is too large, the algorithm will be too sensitive and will assign similar songs to different clusters.
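To make the "rich get richer" behavior concrete, here is a minimal simulation sketch (illustrative only, not part of the thesis code; the function name and parameters are my own) of the Chinese Restaurant Process view of a DP, in which each new point joins an existing cluster with probability proportional to that cluster's size, or starts a new one with probability proportional to α:

import random

def crp_cluster_counts(n_points, alpha, seed=0):
    """Sequentially assign n_points to clusters under a CRP with
    concentration alpha; returns the list of cluster sizes."""
    rng = random.Random(seed)
    counts = []  # counts[k] = number of points currently in cluster k
    for n in range(n_points):
        # total weight: n existing points plus alpha for a new cluster
        r = rng.uniform(0, n + alpha)
        if r < alpha or not counts:
            counts.append(1)          # open a new cluster
        else:
            r -= alpha
            for k, c in enumerate(counts):
                if r < c:
                    counts[k] += 1    # join existing cluster k
                    break
                r -= c
    return counts

# Larger alpha tends to produce more clusters from the same data stream.
for a in (0.05, 0.1, 0.2, 1.0):
    print(a, len(crp_cluster_counts(10000, a)))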

The implementation of the DP was achieved using scikit-learn's library and API for the Dirichlet Process Gaussian Mixture Model (DPGMM). The DPGMM is the formal name of the Dirichlet Process model used to cluster the data. More specifically, scikit-learn's implementation of the DPGMM uses the stick-breaking method, one of several equally valid methods to assign songs to clusters [11]. While the mathematical details for this algorithm can be found at the following citation [12], the most important aspects of the DPGMM are the arguments that the user can specify and tune. The first of these tunable parameters is the value α, which is the same parameter as the α discussed in the previous paragraph. As seen on the right side of Figure 2.1, properly tuning α is key to obtaining meaningful clusters. The center image has α set to 0.01, which is too small and results in all of the data being placed in one cluster. On the other hand, the bottom-right image has the same data set and α set to 100, which does a better job of clustering. On a related note, the figure also demonstrates the effectiveness of the DPGMM over the GMM. On the left side, the dataset clearly contains 2 clusters, but the GMM in the top-left image assumes 5 clusters as a prior and consequently clusters the data incorrectly, while the DPGMM manages to limit the data to 2 clusters.

The second argument that the user inputs for the DPGMM is the data that will be clustered. The scikit-learn implementation takes the data in the format of a nested list (N lists, each of length m), where N is the number of data points and m the number of features. While the format of the data structure is relatively straightforward, choosing which numbers should be in the data was a challenge I faced. Selecting the relevant features of each song to be used in the algorithm is expounded upon in the next section, "Feature Selection".

The last argument that a user inputs for the scikit-learn DPGMM implementation indicates the upper bound for the number of clusters. The Dirichlet Process then determines the best number of clusters for the data between 1 and the upper bound. Since the DPGMM is flexible enough to find the best value, I set an arbitrary upper bound of 50 clusters and focused on tuning α to modify the number of clusters formed.
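For reference, a minimal sketch of how this setup looks in code is below. The feature file name is hypothetical, and the class shown is from the scikit-learn releases current at the time of writing; newer releases replace mixture.DPGMM with mixture.BayesianGaussianMixture and its weight_concentration_prior argument.

import numpy as np
from sklearn import mixture

X = np.load('song_features.npy')   # hypothetical N x m feature matrix

dpgmm = mixture.DPGMM(
    n_components=50,    # upper bound on the number of clusters
    alpha=0.1,          # the concentration parameter discussed above
    covariance_type='diag',
    n_iter=100,
)
dpgmm.fit(X)
labels = dpgmm.predict(X)          # cluster assignment for each song
print('clusters used:', len(np.unique(labels)))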

2.2 Feature Selection

Figure 2.1: scikit-learn example of GMM vs. DPGMM and tuning of α

One of the most difficult aspects of the Dirichlet method is choosing the features to be used for clustering. In other words, when we organize the songs into clusters, we need to ensure that each cluster is distinct in a way that is statistically and intuitively logical. In the Million Song Dataset [9], each song is represented as a JSON object containing several fields. These fields are candidate features to be used in the Dirichlet algorithm. Below is an example song, "Never Gonna Give You Up" by Rick Astley, with its corresponding features:

artist_mbid: db92a151-1ac2-438b-bc43-b82e149ddd50 (the musicbrainz.org ID for this artist)
artist_mbtags: shape = (4,) (this artist received 4 tags on musicbrainz.org)
artist_mbtags_count: shape = (4,) (raw tag count of the 4 tags this artist received on musicbrainz.org)
artist_name: Rick Astley (artist name)
artist_playmeid: 1338 (the ID of that artist on the service playme.com)
artist_terms: shape = (12,) (this artist has 12 terms (tags) from The Echo Nest)
artist_terms_freq: shape = (12,) (frequency of the 12 terms from The Echo Nest (number between 0 and 1))
artist_terms_weight: shape = (12,) (weight of the 12 terms from The Echo Nest (number between 0 and 1))
audio_md5: bf53f8113508a466cd2d3fda18b06368 (hash code of the audio used for the analysis by The Echo Nest)
bars_confidence: shape = (99,) (confidence value (between 0 and 1) associated with each bar by The Echo Nest)
bars_start: shape = (99,) (start time of each bar according to The Echo Nest; this song has 99 bars)
beats_confidence: shape = (397,) (confidence value (between 0 and 1) associated with each beat by The Echo Nest)
beats_start: shape = (397,) (start time of each beat according to The Echo Nest; this song has 397 beats)
danceability: 0.0 (danceability measure of this song according to The Echo Nest (between 0 and 1; 0 => not analyzed))
duration: 211.69587 (duration of the track in seconds)
end_of_fade_in: 0.139 (time of the end of the fade-in at the beginning of the song, according to The Echo Nest)
energy: 0.0 (energy measure (not in the signal processing sense) according to The Echo Nest (between 0 and 1; 0 => not analyzed))
key: 1 (estimation of the key the song is in by The Echo Nest)
key_confidence: 0.324 (confidence of the key estimation)
loudness: -7.75 (general loudness of the track)
mode: 1 (estimation of the mode the song is in by The Echo Nest)
mode_confidence: 0.434 (confidence of the mode estimation)
release: Big Tunes - Back 2 The 80s (album name from which the track was taken; some tracks can come from many albums, we give only one)
release_7digitalid: 786795 (the ID of the release (album) on the service 7digital.com)
sections_confidence: shape = (10,) (confidence value (between 0 and 1) associated with each section by The Echo Nest)
sections_start: shape = (10,) (start time of each section according to The Echo Nest; this song has 10 sections)
segments_confidence: shape = (935,) (confidence value (between 0 and 1) associated with each segment by The Echo Nest)
segments_loudness_max: shape = (935,) (max loudness during each segment)
segments_loudness_max_time: shape = (935,) (time of the max loudness during each segment)
segments_loudness_start: shape = (935,) (loudness at the beginning of each segment)
segments_pitches: shape = (935, 12) (chroma features for each segment (normalized so max is 1))
segments_start: shape = (935,) (start time of each segment (musical event or onset) according to The Echo Nest; this song has 935 segments)
segments_timbre: shape = (935, 12) (MFCC-like features for each segment)
similar_artists: shape = (100,) (a list of 100 artists (their Echo Nest IDs) similar to Rick Astley, according to The Echo Nest)
song_hotttnesss: 0.864248830588 (according to The Echo Nest, when downloaded (in December 2010) this song had a 'hotttnesss' of 0.8 (on a scale of 0 to 1))
song_id: SOCWJDB12A58A776AF (The Echo Nest song ID; note that a song can be associated with many tracks (with very slight audio differences))
start_of_fade_out: 198.536 (start time of the fade-out, in seconds, at the end of the song, according to The Echo Nest)
tatums_confidence: shape = (794,) (confidence value (between 0 and 1) associated with each tatum by The Echo Nest)
tatums_start: shape = (794,) (start time of each tatum according to The Echo Nest; this song has 794 tatums)
tempo: 113.359 (tempo in BPM according to The Echo Nest)
time_signature: 4 (time signature of the song according to The Echo Nest, i.e. usual number of beats per bar)
time_signature_confidence: 0.634 (confidence of the time signature estimation)
title: Never Gonna Give You Up (song title)
track_7digitalid: 8707738 (the ID of this song on the service 7digital.com)
track_id: TRAXLZU12903D05F94 (The Echo Nest ID of this particular track, on which the analysis was done)
year: 1987 (year when this song was released, according to musicbrainz.org)

When choosing features, my main goal was to use features that would most likely yield meaningful results yet also be simple and make sense to the average person. The definition of "meaningful" results is subjective, as every music listener has his or her own opinions about what constitutes different types of music, but some common features most people tend to differentiate songs by are pitch, rhythm, and the types of instruments used. The following specific fields provided in each song object fall under these three terms:

Pitch

• segments_pitches: a matrix of values indicating the strength of each pitch (or note) at each discernible time interval

Rhythm

• beats_start: a vector of values indicating the start time of each beat
• time_signature: the time signature of the song
• tempo: the speed of the song in Beats Per Minute (BPM)

Instruments

• segments_timbre: a matrix of values indicating the distribution of MFCC-like features (different types of tones) for each segment

The segments_pitches feature is a clear candidate for a differentiating factor between songs, since it reveals the patterns of notes that occur. Additionally, other research papers that quantitatively examine songs, like Mauch's, look at pitch and employ a procedure that allows all songs to be compared with the same metric. Likewise, timbre is intuitively a reliable differentiating feature, since it distinguishes tones that sound different despite having the same pitch. Therefore, segments_timbre is another feature that is considered in each song.

Finally, we look at the candidate features for rhythm. At first glance, all of these features appear to be useful, as they indicate the rhythm of a song in one way or another. However, none of these features are as useful as the pitch and timbre features. While tempo is one factor in differentiating genres of EDM, and music in general, tempo alone is not a driving force of musical innovation. Certain genres of EDM, like drum 'n' bass and happycore, stand out for having very fast tempos, but the tempo is supplemented with a sound unique to the genre. Conceiving new arrangements of pitches, combining instruments in new ways, and inventing new types of sounds are novel; speeding up or slowing down existing sounds is not. Including tempo as a feature could actually add noise to the model, since many genres overlap in their tempos. And finally, tempo is measured indirectly when the pitch and timbre features are normalized for each song: everything is measured in units of "per second", so faster songs will have higher quantities of pitch and timbre features each second. Time signature can be dismissed from the candidate features for the same reason as tempo: many genres share the same time signature, and including it in the feature set would only add more noise. beats_start looks like a more promising feature since, like segments_pitches and segments_timbre, it consists of a vector of values. However, difficulties arise when we begin to think about how exactly we can utilize this information. Since each song varies in length, we need a way to compare songs of different durations on the same level. One approach could be to perform basic statistics on the distance between each beat, for example calculating the mean and standard deviation of this distance. However, the normalized pitch and timbre information already captures this data. Another possibility is detecting certain patterns of beats, which could differentiate the syncopated dubstep or glitch music beats from the steady pulse of electro-house. But once again, every beat is accompanied by a sound with a specific timbre and pitch, so this feature would not add any significantly new information.

2.3 Collecting Data and Preprocessing Selected Features

2.3.1 Collecting the Data

Upon deciding on the features I wanted to use in my research, I first needed to collect all of the electronic songs in the Million Song Dataset. The easiest reliable way to achieve this was to iterate through each song in the database and save the information for the songs where any of the artist genre tags in artist_mbtags matched an electronic music genre. While this measure is not fully accurate, because it looks at the genre of the artist and not the song, specific genre information for each song was not as easily accessible, so this indicator was nearly as good a substitute. To generate a list of the genres that electronic songs would fall under, I manually searched through a subset of the MSD to find all genres that seemed to be related to electronic music. In the case of genres that are sometimes but not always electronic in nature, such as disco or pop, I erred on the side of caution and did not include them in the list of electronic genres. In these cases, false positives, such as primarily rock songs that happen to have the disco label attached to the artist, could inadvertently be included in the dataset. The final list of genres is as follows:

target_genres = ['house', 'techno', 'drum and bass', 'drum n bass',
    "drum'n'bass", 'drumnbass', "drum 'n' bass", 'jungle', 'breakbeat',
    'trance', 'dubstep', 'trap', 'downtempo', 'industrial', 'synthpop',
    'idm', 'idm - intelligent dance music', '8-bit', 'ambient',
    'dance and electronica', 'electronic']
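A minimal sketch of this collection pass is below, assuming the hdf5_getters helper module distributed with the MSD; the root directory and the exact directory glob are hypothetical placeholders.

import os
import glob
import hdf5_getters as GETTERS   # helper module shipped with the MSD

msd_root = '/path/to/MillionSongDataset'   # hypothetical location
electronic_files = []
for path in glob.glob(os.path.join(msd_root, 'data/*/*/*/*.h5')):
    h5 = GETTERS.open_h5_file_read(path)
    try:
        # artist-level MusicBrainz tags; decode is needed under Python 3
        tags = [t.decode() if isinstance(t, bytes) else t
                for t in GETTERS.get_artist_mbtags(h5)]
        if any(tag in target_genres for tag in tags):
            electronic_files.append(path)
    finally:
        h5.close()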

2.3.2 Pitch Preprocessing

A study conducted by music researcher Matthias Mauch [8] analyzes pitch in a musically informed manner. The study first takes the raw sound data and converts it into a distribution over each pitch, where 0 is no detection of the pitch and 1 the strongest. Then it computes the most likely chord by comparing the 4 most common types of chords in popular music (major, minor, dominant 7, and minor 7) to the observed chord. The common chords are represented as "template chords" containing 0's and 1's, where the 1's represent the notes played in the chord. For example, using the note C as the first index, the C major chord is represented as

CT_Cmaj = (1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0)

For a given chroma frame c observed in the song, the Spearman's Rho coefficient is computed against every template chord:

\rho_{CT,c} = \frac{\sum_{i=1}^{12} (CT_i - \overline{CT})(c_i - \bar{c})}{\sigma_{CT}\,\sigma_c}


where \overline{CT} is the mean of the values in the template chord, \sigma_{CT} is the standard deviation of the values in the chord, and the operations on c are analogous. Note that the summation is over each of the 12 pitch classes. The chord template with the highest value of ρ is selected as the chord for the time frame.
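As an illustration, a minimal sketch of this template-matching step (not the thesis's own code; the names and the truncated template set are mine) might look like:

import numpy as np

def best_chord(chroma, templates):
    """Return the name of the template chord maximizing rho against a
    12-dimensional chroma frame (pitch-class strengths in [0, 1])."""
    chroma = np.asarray(chroma, dtype=float)
    scores = {}
    for name, tpl in templates.items():
        tpl = np.asarray(tpl, dtype=float)
        # sum of products of deviations, as in the rho defined above
        num = np.sum((tpl - tpl.mean()) * (chroma - chroma.mean()))
        denom = tpl.std() * chroma.std()
        scores[name] = num / denom if denom > 0 else -np.inf
    return max(scores, key=scores.get)

# Only the C major template from the text is shown; the full lexicon has
# 4 chord types rotated through all 12 roots (48 templates).
templates = {'C major': [1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0]}
frame = [0.9, 0.1, 0.2, 0.1, 0.8, 0.2, 0.1, 0.7, 0.1, 0.2, 0.1, 0.1]
print(best_chord(frame, templates))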

After this is performed for each time frame, the values are smoothed, and then the change between adjacent chords is observed. The reasoning behind this step is that by measuring the relative distance between chords, rather than the chords themselves, all songs can be compared in the same manner even though they may have different key signatures. Finally, the study takes the types of chord changes and classifies them under 8 possible categories called "H-topics". These topics are more abstracted versions of the chord changes that make more sense to a human, such as "changes involving dominant 7th chords".

In my preliminary implementation of this method on an electronic dance music corpus, I made a few modifications to Mauch's study. First, I smoothed the time frames before computing the most probable chords, rather than smoothing the most probable chords. I did this to save time and to reduce volatility in the chord measurements. Using Rick Astley's "Never Gonna Give You Up" as a reference, which contains 935 time frames and lasts 212 seconds, 5 time frames is slightly under 1 second and, for preliminary testing, appeared to be a good interval for each time block. Second, as mentioned in the literature review, I did not abstract the chord changes into H-topics. This decision also stemmed from time constraints, since deriving semantic chord meaning from EDM songs would require careful research into the types of harmonies and sounds common in that genre of music. Below is a high-level description of the pitch metadata found in a sample song, "Firestarter" by The Prodigy, and how I converted the metadata into a chord change vector that I could then feed into the Dirichlet Process algorithm.

The conversion proceeds in four steps:

1. Start with the raw pitch data, an N-by-12 matrix, where N is the number of time frames in the song and 12 the number of pitch classes. (The original figure shows the first 5 time frames of "Firestarter" by The Prodigy.)

2. Average the distribution of pitches over every 5 time frames.

3. Calculate the most likely chord for each block of 5 time frames using Spearman's rho (in the example, F major).

4. For each pair of adjacent chords, calculate the change between them (in the example, F major to G major: a major-to-major change with step size 2, chord shift code 6) and increment the corresponding count in a table of chord change frequencies, with 192 possible chord changes.

The result for each song is a 192-element vector in which chord_changes[i] is the number of times the chord change with code i occurred in the song.

A final step I took to normalize the chord change data was to divide the counts by the length of the song, so that each song's number of chord changes is measured per second.
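To make the bookkeeping concrete, the following is a minimal sketch of the chord-change counting described above. Each change is described by the (source type, target type) pair, 4 x 4 = 16 combinations, and the root shift in semitones, 12 values, giving 16 x 12 = 192 categories. The thesis's exact integer numbering is only shown by example in the figure above, so the enumeration here (the change_code function and the type ordering) is my own illustrative choice and does not necessarily reproduce the thesis's code table.

CHORD_TYPES = ['maj', 'min', 'dom7', 'min7']

def change_code(type_a, root_a, type_b, root_b):
    """Map a chord change to an integer in [0, 191] (illustrative order)."""
    shift = (root_b - root_a) % 12          # relative root movement
    pair = CHORD_TYPES.index(type_a) * 4 + CHORD_TYPES.index(type_b)
    return pair * 12 + shift

chord_changes = [0] * 192
# e.g. F major (root 5) moving to G major (root 7):
chord_changes[change_code('maj', 5, 'maj', 7)] += 1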

2.3.3 Timbre Preprocessing

For timbre, I also used Mauch's model to find a meaningful way to compare timbre uniformly across all songs [8]. After collecting all song metadata, I took a random sample of 20 songs from each year, starting at 1970. The reason I forced the sampling to 20 randomly sampled songs from each year, rather than taking a random sample of songs from all years at once, was to prevent bias towards any particular type of sound. As seen in Figure 2.2, there are significantly more songs from 2000-2011 than from before 2000. The mean year is x̄ = 2001.052, the median year is 2003, and the standard deviation of the years is σ = 7.060. A "random sample" over all songs would almost certainly include a disproportionate number of more recent songs. In order not to miss out on sounds that may be more prevalent in older songs, I required a set number of songs from each year. Next, from each randomly selected song, I selected 20 random timbre frames in order to prevent any biases in data collection within each song. In total, there were 42 × 20 × 20 = 16,800 timbre frames collected. Next, I clustered the timbre frames using a Gaussian Mixture Model (GMM), varying the number of clusters from 10 to 100 and selecting the number of clusters with the lowest Bayesian Information Criterion (BIC), a statistical measure commonly used to select the best-fitting model. The BIC was minimized at 46 timbre clusters. I then re-ran the GMM with 46 clusters and saved the mean values of each of the 12 timbre components for each cluster formed.

Figure 2.2: Number of Electronic Music Songs in the Million Song Dataset from Each Year

In the same way that every song had the same 192 chord changes whose frequencies could be compared between songs, each song now had the same 46 timbre clusters but different frequencies for each. When reading in the metadata from each song, I calculated the most likely timbre cluster each timbre frame belonged to and kept a frequency count of all of the timbre clusters observed in the song. Finally, as with the pitch data, I divided all observed counts by the duration of the song in order to normalize each song's timbre counts.
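A sketch of this model-selection loop is shown below, written against the current scikit-learn class GaussianMixture, which exposes the same BIC computation as the older mixture.GMM class available when this thesis was written; the input file name is hypothetical.

import numpy as np
from sklearn.mixture import GaussianMixture

frames = np.load('timbre_frames.npy')   # hypothetical (16800, 12) sample

best_k, best_bic, best_model = None, np.inf, None
for k in range(10, 101):
    gmm = GaussianMixture(n_components=k, covariance_type='full',
                          random_state=0).fit(frames)
    bic = gmm.bic(frames)
    if bic < best_bic:
        best_k, best_bic, best_model = k, bic, gmm

# best_model.means_ holds the mean 12-dimensional timbre vector of each
# cluster; best_model.predict(new_frames) maps frames to timbre categories.
print('BIC minimized at', best_k, 'timbre clusters')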


Chapter 3

Results

3.1 Methodology

After the pitch and timbre data were processed, I ran the Dirichlet Process on the data. For each song, I concatenated the 192-element chord change frequency list and the 46-element timbre category frequency list, giving each song a total of 238 features. However, there is a problem with this setup: the pitch data will inherently dominate the clustering process, since it contains almost 3 times as many features as timbre. While there is no built-in function in scikit-learn's DPGMM process to give different weights to each feature, I considered another possibility to remedy this discrepancy: duplicating the timbre vector a certain number of times and concatenating the copies to the feature set of each song. While this strategy runs the risk of corrupting the feature set and turning it into something that does not accurately represent each song, it is important to keep in mind that even without duplicating the timbre vector, the feature set consists of two separate feature sets concatenated together. Therefore, timbre duplication appears to be a reasonable strategy to weigh pitch and timbre more evenly.
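A sketch of the per-song feature assembly is below. Note that the text does not fix the exact number of timbre copies, so the factor of 4 here (bringing the 46 timbre features to 184, roughly matching the 192 pitch features) is an assumption for illustration.

import numpy as np

TIMBRE_COPIES = 4   # assumed; chosen so 4 * 46 = 184 roughly matches 192

def song_features(chord_change_freq, timbre_freq):
    """chord_change_freq: 192 per-second chord-change frequencies.
    timbre_freq: 46 per-second timbre-category frequencies."""
    pitch = np.asarray(chord_change_freq, dtype=float)
    timbre = np.asarray(timbre_freq, dtype=float)
    # duplicate the timbre block to balance it against the pitch block
    return np.concatenate([pitch] + [timbre] * TIMBRE_COPIES)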


After this modification, I tweaked a few more parameters before obtaining my final results. Dividing the pitch and timbre frequencies by the duration of the song normalized every song to frequency per second, but it also had the undesired effect of making the data too small. Timbre and pitch frequencies per second were almost always less than 1.0 and many times hovered as low as 0.002 for nonzero values. Because all of the values were very close to each other, using common values of α in the range of 0.1 to 1000-2000 was insufficient to push the songs into different clusters; as a result, every song fell into the same cluster. Increasing the value of α by several orders of magnitude, to well over 10 million, fixed the problem, but this solution presented two problems. First, tuning α to experiment with different ways to cluster the music would be problematic, since I would have to work with an enormous range of possible values for α. Second, pushing α to such high values is not appropriate for the Dirichlet Process. Extremely high values of α indicate a Dirichlet Process that will try to disperse the data into different clusters, but a value of α that high is in principle always assigning each new song to a new cluster. On the other hand, varying α between 0.1 and 1000, for example, presents a much wider range of flexibility when assigning clusters. While clustering may be possible by varying α an extreme amount with the data as it currently is, we would be using the Dirichlet Process in a way it should mathematically not be used. Therefore, multiplying all of the data by a constant value, so that we can work in the appropriate range of α, is the ideal approach. After some experimentation, I found that k = 10 was an appropriate scaling factor.

After initial runs of the Dirichlet Process, I found that there was a slight issue with some of the earlier songs. Since I had only artist genre tags, not song-specific tags, I chose songs based on whether any of the tags associated with the artist fell under any electronic music genre, including the generic term 'electronic'. There were some bands, mostly older ones from the 1960s and 1970s like Electric Light Orchestra, which had some electronic music but mostly featured rock, funk, disco, or another genre. Given that these artists featured mostly non-electronic songs, I decided to exclude them from my study and generated a blacklist of these artists. While it was infeasible to look through every single song and determine whether it was electronic or not, I was able to look over the earliest songs in each cluster. These songs were the most important to verify as electronic, because early non-electronic songs could end up forming new clusters and inadvertently create clusters with non-electronic sounds that I was not looking for.

The goal of this thesis is to identify the different groups into which EM songs are clustered and to identify the most unique artists and genres. While the second task is very simple, because it requires looking at the earliest songs in each cluster, the first is difficult to gauge the effectiveness of. While I can look at the average chord change and timbre category frequencies in each cluster, as well as other metadata, attaching a semantic interpretation to what the music actually sounds like and determining whether the music is clustered properly is a very subjective process. For this reason, I ran the Dirichlet Process on the feature set with values of α = (0.05, 0.1, 0.2) and compared the clustering for each value, examining similarities and differences in the clusters formed in each scenario in the Analysis section. For each value of α, I set the upper limit of components, or clusters, allowed to 50. The values of α I used resulted in 9, 14, and 19 clusters formed, respectively.

3.2 Findings

3.2.1 α = 0.05

When I set α to 0.05, the Dirichlet Process split the songs into 9 clusters. Below are the distributions of the years of the songs in each cluster. (Note that the Dirichlet Process does not number the clusters exactly sequentially, so cluster numbers 5, 7, and 10 are skipped.)


Figure 3.1: Song year distributions for α = 0.05

For each value of α, I also calculated the average frequency of each chord change category and timbre category for each cluster and plotted the results. The green lines correspond to timbre and the blue lines to pitch.


Figure 3.2: Timbre and pitch distributions for α = 0.05

A table of each cluster formed, the number of songs in that cluster, and descriptions of the pitch, timbre, and rhythmic qualities characteristic of songs in that cluster is shown below.

Cluster | Song Count | Characteristic Sounds
0 | 6481 | Minimalist, industrial, space sounds, dissonant chords
1 | 5482 | Soft, New Age, ethereal
2 | 2405 | Defined sounds; electronic and non-electronic instruments played in standard rock rhythms
3 | 360 | Very dense and complex synths, slightly darker tone
4 | 4550 | Heavily distorted rock and synthesizer
6 | 2854 | Faster-paced 80s synth rock, acid house
8 | 798 | Aggressive beats, dense house music
9 | 1464 | Ambient house, trancelike, strong beats, mysterious tone
11 | 1597 | Melancholy tones; New Wave rock in the 80s, then starting in the 90s downtempo, trip-hop, nu-metal

Table 3.1: Song cluster descriptions for α = 0.05

3.2.2 α = 0.1

A total of 14 clusters were formed. (16 were formed, but 2 clusters contained only one song each; I listened to both of these songs, and since they did not sound unique, I discarded them from the clusters.) Again, the song year distributions, timbre and pitch distributions, and cluster descriptions are shown below.


Figure 3.3: Song year distributions for α = 0.1


Figure 3.4: Timbre and pitch distributions for α = 0.1


Cluster | Song Count | Characteristic Sounds
0 | 1339 | Instrumental and disco with 80s synth
1 | 2109 | Simultaneous quarter-note and sixteenth-note rhythms
2 | 4048 | Upbeat, chill; simultaneous quarter-note and eighth-note rhythms
3 | 1353 | Strong repetitive beats, ambient
4 | 2446 | Strong simultaneous beat and synths; synths defined but echoing
5 | 2672 | Calm, New Age
6 | 542 | Hi-hat cymbals, dissonant chord progressions
7 | 2725 | Aggressive punk and alternative rock
9 | 1647 | Latin; rhythmic emphasis on first and third beats
11 | 835 | Standard medium-fast rock instruments/chords
16 | 1152 | Orchestral, especially violins
18 | 40 | "Martian alien" sounds, no vocals
20 | 1590 | Alternating strong kick and strong high-pitched clap
28 | 528 | Roland TR-like beats; kick and clap stand out but fuzzy

Table 3.2: Song cluster descriptions for α = 0.1

3.2.3 α = 0.2

With α set to 0.2, there were a total of 22 clusters formed. 3 of the clusters consisted of 1 song each, none of which were particularly unique-sounding, so I discarded them, for a total of 19 significant clusters. Again, the song year distributions, timbre and pitch distributions, and cluster descriptions are shown below.


Figure 3.5: Song year distributions for α = 0.2


Figure 3.6: Timbre and pitch distributions for α = 0.2


Cluster | Song Count | Characteristic Sounds
0 | 4075 | Nostalgic and sad-sounding synths and string instruments
1 | 2068 | Intense, sad, cavernous (mix of industrial, metal, and ambient)
2 | 1546 | Jazz/funk tones
3 | 1691 | Orchestral with heavy 80s synths, atmospheric
4 | 343 | Arpeggios
5 | 304 | Electro, ambient
6 | 2405 | Alien synths, eerie
7 | 1264 | Punchy kicks and claps, 80s/90s tilt
8 | 1561 | Medium tempo, 4/4 time signature, synths with intense guitar
9 | 1796 | Disco rhythms and instruments
10 | 2158 | Standard rock with few (if any) synths added on
12 | 791 | Cavernous, minimalist ambient (non-electronic instruments)
14 | 765 | Downtempo, classic guitar riffs, fewer synths
16 | 865 | Classic acid house sounds and beats
17 | 682 | Heavy Roland TR sounds
22 | 14 | Fast, ambient, classic orchestral
23 | 578 | Acid house with funk tones
30 | 31 | Very repetitive rhythms, one or two tones
34 | 88 | Very dense sound (strong vocals and synths)

Table 3.3: Song cluster descriptions for α = 0.2


3.3 Analysis

Each of the different values of α revealed different insights about the Dirichlet Process applied to the Million Song Dataset: mainly, which artists and songs were unique, and how traditional groupings of EM genres could be thought of differently. Not surprisingly, the distributions of the years of songs in most of the clusters were skewed to the left, because the distribution of all of the EM songs in the Million Song Dataset is left-skewed (see Figure 2.2). However, some of the distributions vary significantly for individual clusters, and these differences provide important insights into which types of music styles were popular at certain points in time and how unique the earliest artists and songs in the clusters were. For example, for α = 0.1, Cluster 28's musical style (with sounds characteristic of the Roland TR-808 and TR-909, two programmable drum machines and synthesizers that became extremely popular in 1980s and 90s dance tracks [13]) coincides with when the instruments were first manufactured, in 1980. Not surprisingly, this cluster contained mostly songs from the 80s and 90s and declined slightly in the 2000s. However, there were a few songs in that cluster that came out before 1980. While these songs did not clearly use the Roland TR machines, they may have contained similar sounds that predated the machines and were truly novel.

First, looking at α = 0.05, we see that all of the clusters contain a significant number of songs, although clusters 3 and 8 are notably smaller. Cluster 3 contains a heavier left tail, indicating a larger number of songs from the 70s, 80s, and 90s. Inside the cluster, the genres of music varied significantly from a traditional music lens. That is, the cluster contained some songs with nearly all traditional rock instruments, others with purely synths, and others somewhere in between, all of which would normally be classified as different EM genres. However, under the Dirichlet Process these songs were lumped together with the common theme of dense, melodic sounds (as opposed to minimalistic, repetitive, or dissonant sounds). The most prominent artists from the earlier songs are Ashra and John Hassell, who composed several melodic songs combining traditional instruments with synthesizers for a modern feel. The other small cluster, number 8, contains a more normal year distribution relative to the entire MSD distribution and also consists of denser beats. Another artist, Cabaret Voltaire, leads this cluster. Cluster 9 sticks out significantly because it contains virtually no songs before 1990 but then rises rapidly in popularity. This cluster contains songs with hypnotically repetitive rhythms, strong and ethereal synths, and an equally strong drum-like beat. Given the emergence of trance in the 1990s, and the fact that house music in the 1980s contained more minimalistic synths than house music in the 1990s, this distribution of years makes sense. Looking at the earliest artists in this cluster, one that accurately predates the later music in the cluster is Jean-Michel Jarre, a French composer and pioneer of ambient and electronic music [14]. One of his songs, Les Chants Magnétiques IV, contains very sharp and modulated synths along with a repetitive hi-hat cymbal rhythm and more drawn-out, ethereal synths. While the song sounds ambient at its normal speed, playing it at 1.5 times the normal speed resulted in a thumping, fast-paced 16th-note rhythm that, combined with the ethereal synths and their chord progressions, sounded very similar to trance music. In fact, I found that, stylistically, trance music was comparable to house and ambient music increased in speed. Trance was a term not used extensively until the early 1990s, but ambient and house music were already mainstream by the 1980s, so it would make sense that trance evolved in this manner. However, this insight could serve as an argument that trance is not an innovative genre in and of itself, but rather a clever combination of two older genres.

Lastly, we look at the timbre-category and chord-change distributions for each cluster. In theory, these clusters should have significantly different peaks of chord changes and timbre categories, reflecting different pitch arrangements and instruments in each cluster. The type 0 chord change corresponds to major → major with no note change; type 60 to minor → minor with no note change; type 120 to dominant 7th major → dominant 7th major with no note change; and type 180 to dominant 7th minor → dominant 7th minor with no note change. It makes sense that type 0, 60, 120, and 180 chord changes are frequently observed, because their frequency implies that neighboring chords in a song remain in the same key for the majority of the song.
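For reference, the mapping from a pair of consecutive chords to one of the 192 chord-change categories, following the index arithmetic in the Appendix A.2 code, can be written as a small helper (the tuple encoding is the one used there: chord type 1-4, root pitch class 0-11):

def chord_change_category(c1, c2):
    """Map consecutive chords to one of 192 chord-change categories.
    Each chord is (chord_type, root): chord_type is 1=major, 2=minor,
    3=dominant 7th, 4=minor 7th ('dominant 7th minor' above);
    root is the pitch class 0 (C) through 11 (B)."""
    type1, root1 = c1
    type2, root2 = c2
    note_shift = (root2 - root1) % 12    # semitone movement, 0 to 11
    key_shift = 4 * (type1 - 1) + type2  # type-to-type transition, 1 to 16
    return 12 * (key_shift - 1) + note_shift

# The four same-type, same-root transitions land on types 0, 60, 120, 180:
assert chord_change_category((1, 0), (1, 0)) == 0    # major -> major
assert chord_change_category((2, 5), (2, 5)) == 60   # minor -> minor
assert chord_change_category((3, 7), (3, 7)) == 120  # dom 7th -> dom 7th
assert chord_change_category((4, 2), (4, 2)) == 180  # min 7th -> min 7th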

The timbre categories, on the other hand, are more difficult to interpret intuitively. Mauch's study addresses this issue by sampling songs and sounds that are the closest to each timbre category, then playing the sounds and attaching user-based interpretations based on several listeners [8]. While this strategy worked in Mauch's study, given the time and resources at my disposal it was not practical in mine. Instead, I compared my subjective summaries of each cluster against the charts to see whether certain peaks in the timbre categories corresponded to specific tones. Strangely, for α = 0.05, the timbre and chord-change data are very similar for each cluster. This problem does not occur when α = 0.1 or 0.2, where the graphs vary significantly and correspond to some of the observed differences in the music. In summary, below are the most influential artists I found in the clusters formed, along with the types of music they created that were novel for their time:

• Jean-Michel Jarre: ambient and house music, complicated synthesizer arrangements
• Cabaret Voltaire: orchestral electronic music
• Paul Horn: new age
• Brian Eno: ambient music
• Manuel Göttsching (Ashra): synth-heavy ambient music
• Killing Joke: industrial metal
• John Foxx: minimalist and dark electronic music
• Fad Gadget: house and industrial music

While these conclusions were formed mainly from the data used in the MSD and the resulting clusters, I checked outside sources and biographies of these artists to see whether they were groundbreaking contributors to electronic music. Some research revealed that these artists were indeed groundbreaking for their time, so my findings are consistent with existing literature. The difference, however, between existing accounts and mine is that, from a quantitatively computed perspective, I found some new connections (like observing that one of Jarre's works, when sped up, sounded very similar to trance music).

For larger values of α, it is worth not only looking at interesting phenomena in the clusters formed for that specific value, but also comparing them to the clusters formed for other values of α. Since we are increasing the value of α, more clusters will be formed, and the distinctions between each cluster will be more nuanced. With α = 0.1, the Dirichlet Process formed 16 clusters; 2 of these clusters consisted of only one song each, and upon listening, neither of these songs sounded particularly unique, so I threw those two clusters out and analyzed the remaining 14. Comparing these clusters to the ones formed with α = 0.05, I found that some of the clusters mapped over nicely, while others were more difficult to interpret. For example, cluster 3 under α = 0.1 contained a similar number of songs, and a similar distribution of the years the songs were released, to cluster 9 under α = 0.05. Both contain virtually no songs before the 1990s and then steadily rise in popularity through the 2000s. Both clusters also contain similar types of music: house beats and ethereal synths reminiscent of ambient or trance music. However, when I looked at the earliest artists in cluster 3 (α = 0.1), they were different from the earliest artists in cluster 9 (α = 0.05). One particular artist, Bill Nelson, stood out for having a particularly novel song, "Birds of Tin," for the year it was released (1980). This song features a sharp and twangy synth beat that, when sped up, sounded like minimalist acid house music.
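One way to make these cross-α comparisons systematic, rather than purely by ear, is to cross-tabulate the label assignments from two runs over the same songs; a concentrated row suggests a clean mapping between clusters. The sketch below assumes hypothetical label files saved from each run:

import numpy as np
from collections import Counter

labels_005 = np.load('labels_alpha_005.npy')  # placeholder
labels_01 = np.load('labels_alpha_01.npy')    # placeholder

# For each cluster in the alpha = 0.05 run, see where its songs land
# in the alpha = 0.1 run.
for c in np.unique(labels_005):
    dest = Counter(labels_01[labels_005 == c])
    total = float(sum(dest.values()))
    spread = {k: round(v / total, 2) for k, v in dest.most_common(3)}
    print('cluster {} (alpha=0.05) -> top alpha=0.1 clusters: {}'.format(c, spread))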

While the α = 0.05 grouping differentiated mostly on general moods and classes of instruments (like rock vs. non-electronic vs. electronic), the α = 0.1 grouping picked up more nuanced instrumentation and mood differences. For example, cluster 16 under α = 0.1 contained songs that featured orchestral string instruments, especially violin. The songs themselves varied significantly according to traditional genres, from Brian Eno arrangements with classical orchestra to a remix of a song by Linkin Park, a nu-metal band, which contained violin interludes. This clustering raises an interesting point: music that sounds very different based on traditional genres could be grouped together on certain instruments or sounds. Another cluster, 28 under α = 0.1, features 90s sounds characteristic of the Roland TR-909 drum machine (which would explain why the cluster's songs increase drastically starting in the 1990s and steadily decline through the 2000s). Yet another cluster, 6 under α = 0.1, contains a particularly heavy left tail, indicating a style more popular in the 1980s, and its characteristic sound, hi-hat cymbals, is also a specialized instrument. This specialization does not match up particularly strongly with the clusters formed when α = 0.05. That is, a single cluster with α = 0.05 does not easily map to one or more clusters in the α = 0.1 run, although many of the clusters appear to share characteristics based on the qualitative descriptions in the tables. The timbre and chord-change charts for each cluster appeared to at least somewhat corroborate the general characteristics I attached to each cluster. For example, the last timbre category is significantly pronounced for clusters 5 and 18, and especially so for 18. Cluster 18 was vocal-free, ethereal space-synth sounds, so it would make sense that cluster 5, which was mainly calm New Age, also contained vocal-free, ethereal, and space-y sounds. It was also interesting to note that certain clusters, like 28, contained one timbre category that completely dominated all the others. Many songs in this cluster were marked by strong and repetitive beats reminiscent of the Roland TR drum machines, which matches the graph. Likewise, clusters 3, 7, 9, and 20, which appear to contain the same peak timbre category, were noted for containing strong and repetitive beats. For this value of α, I added the following artists and their contributions to the general list of novel artists:

• Bill Nelson: minimalist house music
• Vangelis: orchestral compositions with electronic notes
• Rick Wakeman: rock compositions with spacey-sounding synths
• Kraftwerk: synth-based pop music

Finally, we look at α = 0.2. With this parameter value, the Dirichlet Process resulted in 22 clusters being formed. Three of these clusters contained only one song each, and upon listening to each of these songs I determined they were not particularly unique, so I discarded them, for a total of 19 remaining clusters. Unlike the previous two values of α, where the clusters were relatively easy to differentiate subjectively, this run was quite difficult. Slightly more than half of the clusters, 10 out of 19, contained under 1,000 songs, and 3 contained under 100. Some of the clusters were easily mapped to clusters from the other two α values, like cluster 17 (α = 0.2), which contains Roland TR drum machine sounds and is comparable to cluster 28 (α = 0.1). However, many of the other classifications seemed more dubious. Not only did the songs within each cluster often vary significantly, but the differences between many clusters appeared nearly indistinguishable. The chord-change and timbre charts also reflect the difficulty in distinguishing different clusters: the y-axis values are quite small for all of the charts, implying that many of the timbre values averaged out because the songs in each cluster were quite different. Essentially, the observations are quite noisy and do not have features that stand out as saliently as those of cluster 28 (α = 0.1), for example. The only exceptions were clusters 30 and 34, but there were so few songs in each of these clusters that they represent only a small amount of the dataset. Therefore, I concluded that the Dirichlet Process with α = 0.2 did an insufficient job of clustering the songs. Overall, the clusters formed when α = 0.1 were the most meaningful in terms of picking up nuanced moods and instruments without splitting hairs and producing clusters a minimally trained ear could not differentiate. From this analysis, the most appropriate genre classifications of the electronic music from the MSD are the clusters described in the table for α = 0.1, and the most novel artists, along with their contributions, are summarized in the findings for α = 0.05 and α = 0.1.


Chapter 4

Conclusion

In this chapter, I first address weaknesses in my experiment and strategies to address those weaknesses; I then offer potential paths for researchers to build upon my experiment and offer closing words regarding this thesis.

4.1 Design Flaws in Experiment

While I made every effort possible to ensure the integrity of this experiment, there were various factors that limited it, some beyond my control and others within my control but unrealistic to address given the time and resources I had. The largest issue was the dataset I was working with. While the MSD contained roughly 23,000 electronic music songs according to my classifications, these songs did not come close to covering all of the electronic music available. From looking through the tracks, I did see many important artists, meaning that there was some credibility to the dataset. However, there were several other artists I was surprised to see missing, and the artists included contained only a limited number of popular songs. Some traditionally defined genres, like dubstep, were missing entirely from the dataset, and the most recent songs came from the year 2010, which meant that the past 5 years of rapid expansion in EM were not accounted for. Building a sufficient corpus of EM data is very difficult, arguably more so than for other genres, because songs may be remixed by multiple artists, further blurring the line between original content and modifications. For this reason, I considered my thesis to be a proof of concept: although the data I used may not be ideal, I was able to show that the Dirichlet Process could be used with some amount of success to cluster songs based on their metadata.

With respect to how I implemented the Dirichlet Process and constructed the features, my methodology could have been more extensive with additional time and resources. Interpreting the sounds in each song and establishing common threads is a difficult task, and unlike Pandora, which used trained music theory experts to analyze each song, I relied on my own ears and anecdotal knowledge of EM. Given the lack of formal literature quantitatively analyzing EM and the resources I had, this was my best realistic option, but it was also not ideal. The second notable weakness, which was more controllable, was determining what exactly constitutes an EM song. My criterion involved iterating through every song and selecting those whose artist contained a tag that fell inside a list of predetermined EM genres. However, this strategy is not always effective, since some artists have only a small selection of EM songs and have produced much more music involving rock or other non-EM genres. To prevent these songs from appearing in the dataset, I would need to load another dataset, from a group called Last.fm, which contains user-generated tags at the song level.
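A sketch of what that song-level filter might look like, assuming the per-track JSON layout of the public Last.fm/MSD dump (a 'tags' field of [tag, weight] pairs); the field names and the abbreviated tag list here are assumptions to check against the actual download:

import json

EM_TAGS = {'house', 'techno', 'trance', 'ambient', 'electronic'}  # abbreviated list

def is_em_track(json_path):
    # Returns True if the track's own Last.fm tags include an EM genre.
    with open(json_path) as f:
        track = json.load(f)
    track_tags = {tag.lower() for tag, weight in track.get('tags', [])}
    return bool(track_tags & EM_TAGS)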

Another, more addressable weakness in my experiment was graphically analyzing the timbre categories. While the average chord changes were easy to interpret on the graphs for each cluster and had easy semantic interpretations, the timbre categories were never formally defined. That is, while I knew the Bayesian Information Criterion was lowest when there were 46 categories, I did not associate each timbre category with a sound. Mauch's study addressed this issue by randomly selecting songs with sounds that fell in each timbre category and asking users to listen to the sounds and classify what they heard. Implementing this system would be an additional way of ensuring that the clusters formed for each song were nontrivial: I could not only eyeball the measurements on each graph for timbre, like I did in this thesis, but also use them to confirm the sounds I observed for each cluster. Finally, while my feature selection involved careful preprocessing, based on other studies, that normalized measurements between all songs, there are additional ways I could have improved the feature set. For example, one study looks at more advanced ways to isolate specific timbre segments in a song, identify repeating patterns, and compare songs to each other in terms of the similarity of their timbres [15]. More advanced methods like these would allow me to analyze more quantitatively how successful the Dirichlet Process is at clustering songs into distinct categories.
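As a minimal sketch of the Mauch-style listening test described above, one could surface, for each timbre category, the frames (and hence songs) that sit closest to the category centroid; frames, frame_songs, and centroids are hypothetical inputs, the centroids being the 46 x 12 matrix in Appendix A.4:

import numpy as np

def closest_examples(frames, frame_songs, centroids, n=5):
    # For each timbre category, return the IDs of the n songs whose
    # sampled frames lie nearest the category centroid, for auditioning.
    examples = {}
    for cat, center in enumerate(centroids):
        dists = np.linalg.norm(frames - center, axis=1)
        nearest = np.argsort(dists)[:n]
        examples[cat] = [frame_songs[i] for i in nearest]
    return examples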

4.2 Future Work

Future work in this area, quantitatively analyzing EM metadata to determine what constitutes different genres and novel artists, would involve tighter definitions and procedures, evaluations of whether the clustering was effective, and closer scrutiny of the music itself. All of the weaknesses mentioned in the previous section, barring perhaps the songs available in the Million Song Dataset, can be addressed with extensions and modifications to the code base I created. The greater issue of building an effective corpus of music data for the MSD, and constantly updating it, might be addressed by soliciting such data from an organization like Spotify, but such an endeavor is very ambitious and beyond the scope of any individual or small-group research project without extensive funding and influence. Once these problems are resolved, and the dataset, the songs accessed from it, and the methods for comparing songs to each other are settled, the next steps would be to further analyze the results. How do the most unique artists for their time compare to the most popular artists? Is there considerable overlap? How long does it take for a style to grow in popularity, if it does at all? And lastly, how can these findings be used to compose new genres of music and envision who and what will become popular in the future? All of these questions may require supplementary information sources, with respect to the popularity of songs and artists for example, and many of these additional pieces of information can be found on the website of the MSD.

4.3 Closing Remarks

While this thesis is an ambitious endeavor and can be improved in many respects, it does show that the methods implemented yield nontrivial results and could serve as a foundation for future quantitative analysis of electronic music. As data analytics grows even further, and groups such as Spotify amass greater amounts of information and deeper insights from that information, this relatively new field of study will hopefully grow. EM is a dynamic, energizing, and incredibly expressive type of music, and understanding it from a quantitative perspective pays respect to what has, up until now, mostly been analyzed from a curious outsider's perspective: qualitatively described, but not examined as thoroughly from a mathematical angle.


Appendix A

Code

A.1 Pulling Data from the Million Song Dataset

from __future__ import division
import os
import re
import sys
import time
import glob
import numpy as np
import hdf5_getters  # not on adroit

# prevents output from showing ellipses when printed
np.set_printoptions(threshold=np.nan)

'''This code pulls the pitch and timbre metadata of every electronic
song out of the Million Song Dataset.'''

basedir = '/scratch/network/mssilver/mssilver/msd_data_full/data/' + str(sys.argv[1])
ext = '.h5'

target_genres = ['house', 'techno', 'drum and bass', 'drum n bass',
                 "drum'n'bass", 'drumnbass', "drum 'n' bass", 'jungle',
                 'breakbeat', 'trance', 'dubstep', 'trap', 'downtempo',
                 'industrial', 'synthpop', 'idm',
                 'idm - intelligent dance music', '8-bit', 'ambient',
                 'dance and electronica', 'electronic']

# relevant metadata for all EM songs found in the MSD
all_song_data = {}
pitch_segs_data = []
count = 0
start_time = time.time()

for root, dirs, files in os.walk(basedir):
    files = glob.glob(os.path.join(root, '*' + ext))
    for f in files:
        h5 = hdf5_getters.open_h5_file_read(f)
        # if year unknown, throw out sample
        if hdf5_getters.get_year(h5) == 0:
            h5.close()
            continue
        if any(tag in str(hdf5_getters.get_artist_mbtags(h5)) for tag in target_genres):
            print 'found electronic music song at {0} seconds'.format(time.time() - start_time)
            count += 1
            print 'song count: {0}'.format(count + 1)
            h5_subdict = dict()
            h5_subdict['title'] = hdf5_getters.get_title(h5).item()
            h5_subdict['artist_name'] = hdf5_getters.get_artist_name(h5).item()
            h5_subdict['year'] = hdf5_getters.get_year(h5).item()
            h5_subdict['duration'] = hdf5_getters.get_duration(h5).item()
            h5_subdict['timbre'] = hdf5_getters.get_segments_timbre(h5).tolist()
            h5_subdict['pitches'] = hdf5_getters.get_segments_pitches(h5).tolist()
            track_id = hdf5_getters.get_track_id(h5).item()
            all_song_data[track_id] = h5_subdict
            print 'Song {0} finished processing. Total time elapsed: {1} seconds'.format(count, str(time.time() - start_time))
        h5.close()

# sort songs chronologically and write them out for preprocessing
all_song_data_sorted = dict(sorted(all_song_data.items(), key=lambda k: k[1]['year']))
sortedpitchdata = ('/scratch/network/mssilver/mssilver/msd_data/raw_' +
                   re.sub('/', '', sys.argv[1]) + '.txt')  # strip path separators from the shard argument
with open(sortedpitchdata, 'w') as text_file:
    text_file.write(str(all_song_data_sorted))

A.2 Calculating Most Likely Chords and Timbre Categories

from __future__ import division
import os
import sys
import re
import time
import json
import glob
import math
import ast
import collections
import numpy as np
import sklearn.mixture
import hdf5_getters  # not on adroit
import msd_utils  # not on adroit

# prevents output from showing ellipses when printed
np.set_printoptions(threshold=np.nan)

# column-wise mean: each argument is one column of values
def mean(a):
    return sum(a) / len(a)

'''This code computes the frequency of chord changes and of timbre
categories in each electronic song.'''

basedir = '/scratch/network/mssilver/mssilver/'
input_file = basedir + 'msd_data/raw_' + str(sys.argv[1]) + '.txt'
output_file = basedir + 'msd_data/preprocessed_' + str(sys.argv[1]) + '.txt'

json_contents = open(input_file, 'r').read()

all_song_data = []
time_start = time.time()
count = 0
# each match is one song's metadata dict (regex reconstructed; the
# original pattern was garbled in extraction)
for json_object_str in re.finditer(r"\{'title'.*?\}", json_contents, re.DOTALL):
    json_object_str = str(json_object_str.group(0))
    json_object = ast.literal_eval(json_object_str)
    json_object_new = {}

    json_object_new['title'] = json_object['title']
    json_object_new['artist_name'] = json_object['artist_name']
    json_object_new['year'] = json_object['year']
    json_object_new['duration'] = json_object['duration']

    segments_pitches_old = json_object['pitches']
    segments_timbre_old = json_object['timbre']
    segments_pitches_old_smoothed = []
    segments_timbre_old_smoothed = []
    chord_changes = [0 for i in range(0, 192)]
    smoothing_factor = 5
    for i in range(0, int(math.floor(len(segments_pitches_old)) / smoothing_factor)):
        segments = segments_pitches_old[(smoothing_factor * i):(smoothing_factor * i + smoothing_factor)]
        # calculate mean frequency of each note over a block of 5 time segments
        segments_mean = map(mean, zip(*segments))
        segments_pitches_old_smoothed.append(segments_mean)
    most_likely_chords = [msd_utils.find_most_likely_chord(seg) for seg in segments_pitches_old_smoothed]
    print 'found most likely chords at {0} seconds'.format(time.time() - time_start)
    # calculate chord changes
    for i in range(0, len(most_likely_chords) - 1):
        c1 = most_likely_chords[i]
        c2 = most_likely_chords[i + 1]
        if c1[1] == c2[1]:
            note_shift = 0
        elif c1[1] < c2[1]:
            note_shift = c2[1] - c1[1]
        else:
            note_shift = 12 - c1[1] + c2[1]
        key_shift = 4 * (c1[0] - 1) + c2[0]
        # convert note_shift (0 through 11) and key_shift (1 to 16)
        # to one of 192 categories for a chord shift
        chord_shift = 12 * (key_shift - 1) + note_shift
        chord_changes[chord_shift] += 1
    json_object_new['chord_changes'] = [c / json_object['duration'] for c in chord_changes]
    print 'calculated chord changes at {0} seconds'.format(time.time() - time_start)

    for i in range(0, int(math.floor(len(segments_timbre_old)) / smoothing_factor)):
        segments = segments_timbre_old[(smoothing_factor * i):(smoothing_factor * i + smoothing_factor)]
        # calculate the mean of each timbre coefficient over a block of 5 time segments
        segments_mean = map(mean, zip(*segments))
        segments_timbre_old_smoothed.append(segments_mean)
    print 'found most likely timbre categories at {0} seconds'.format(time.time() - time_start)
    timbre_cats = [msd_utils.find_most_likely_timbre_category(seg) for seg in segments_timbre_old_smoothed]
    timbre_cat_counts = [timbre_cats.count(i) for i in xrange(0, 30)]
    json_object_new['timbre_cat_counts'] = [t / json_object['duration'] for t in timbre_cat_counts]
    all_song_data.append(json_object_new)
    count += 1

print 'preprocessing finished, writing results to file at time {0}'.format(time.time() - time_start)
with open(output_file, 'w') as f:
    f.write(str(all_song_data))

print 'file merging complete at time {0}'.format(time.time() - time_start)

A.3 Code to Compute Timbre Categories

from __future__ import division
import os
import sys
import re
import time
import json
import glob
import math
import ast
import operator
import random
import collections
from collections import defaultdict
from string import ascii_uppercase
import numpy as np
import matplotlib.pyplot as plt
import sklearn.mixture
import hdf5_getters  # not on adroit
import msd_utils  # not on adroit

timbre_all = []
N = 20  # number of samples to get from each year
year_counts = dict({1956: 2, 1965: 4, 1968: 3, 1969: 5, 1970: 23, 1971: 25,
                    1972: 26, 1973: 37, 1974: 35, 1975: 29, 1976: 28,
                    1977: 64, 1978: 77, 1979: 111, 1980: 131, 1981: 171,
                    1982: 199, 1983: 272, 1984: 190, 1985: 189, 1986: 200,
                    1987: 224, 1988: 205, 1989: 272, 1990: 358, 1991: 348,
                    1992: 538, 1993: 610, 1994: 658, 1995: 764, 1996: 809,
                    1997: 930, 1998: 872, 1999: 983, 2000: 1031, 2001: 1230,
                    2002: 1323, 2003: 1563, 2004: 1508, 2005: 1995,
                    2006: 1892, 2007: 2175, 2008: 1950, 2009: 1782, 2010: 742})

time_start = time.time()
year_count = defaultdict(int)
orig_dir = '/scratch/network/mssilver/mssilver/'
orig_dir = ''  # overridden for local runs
# pattern matching one song dict (reconstructed; original garbled)
json_pattern = re.compile(r"\{'title'.*?\}", re.DOTALL)
N = 20  # number of songs to sample from each year
k = 20  # number of frames to select from each song
for l1 in ascii_uppercase:
    for l2 in ascii_uppercase:
        edm_textfile = orig_dir + 'msd_data/raw_' + l1 + l2 + '.txt'
        json_contents = open(edm_textfile, 'r').read()
        for json_object_str in re.findall(json_pattern, json_contents):
            json_object = ast.literal_eval(json_object_str)
            year = int(json_object['year'])
            prob = 1.0 if 1.0 * N / year_counts[year] > 1.0 else 1.0 * N / year_counts[year]
            if random.random() < prob:
                print 'getting timbre frames for song in directory {0}, {1} seconds after start of program'.format(edm_textfile, time.time() - time_start)
                duration = float(json_object['duration'])
                timbre = [[t / duration for t in l] for l in json_object['timbre']]
                try:
                    indices = random.sample(xrange(0, len(timbre)), k)
                except:
                    indices = xrange(0, len(timbre))
                timbre_frames = [timbre[i] for i in indices]
                appended_timbre = [timbre_all.append(l) for l in timbre_frames]
        print 'finished file {0}, {1} seconds after start of program'.format(edm_textfile, time.time() - time_start)

with open('timbre_frames_all.txt', 'w') as f:
    f.write(str(timbre_all))

A.4 Helper Methods for Calculations

import os
import re
import json
import glob
import time
import numpy as np
import hdf5_getters

'''some static data used in conjunction with the helper methods'''

# each 12-element vector corresponds to the 12 pitches, starting with
# C natural and going up to B natural
CHORD_TEMPLATE_MAJOR = [
    [1,0,0,0,1,0,0,1,0,0,0,0], [0,1,0,0,0,1,0,0,1,0,0,0],
    [0,0,1,0,0,0,1,0,0,1,0,0], [0,0,0,1,0,0,0,1,0,0,1,0],
    [0,0,0,0,1,0,0,0,1,0,0,1], [1,0,0,0,0,1,0,0,0,1,0,0],
    [0,1,0,0,0,0,1,0,0,0,1,0], [0,0,1,0,0,0,0,1,0,0,0,1],
    [1,0,0,1,0,0,0,0,1,0,0,0], [0,1,0,0,1,0,0,0,0,1,0,0],
    [0,0,1,0,0,1,0,0,0,0,1,0], [0,0,0,1,0,0,1,0,0,0,0,1]]
CHORD_TEMPLATE_MINOR = [
    [1,0,0,1,0,0,0,1,0,0,0,0], [0,1,0,0,1,0,0,0,1,0,0,0],
    [0,0,1,0,0,1,0,0,0,1,0,0], [0,0,0,1,0,0,1,0,0,0,1,0],
    [0,0,0,0,1,0,0,1,0,0,0,1], [1,0,0,0,0,1,0,0,1,0,0,0],
    [0,1,0,0,0,0,1,0,0,1,0,0], [0,0,1,0,0,0,0,1,0,0,1,0],
    [0,0,0,1,0,0,0,0,1,0,0,1], [1,0,0,0,1,0,0,0,0,1,0,0],
    [0,1,0,0,0,1,0,0,0,0,1,0], [0,0,1,0,0,0,1,0,0,0,0,1]]
CHORD_TEMPLATE_DOM7 = [
    [1,0,0,0,1,0,0,1,0,0,1,0], [0,1,0,0,0,1,0,0,1,0,0,1],
    [1,0,1,0,0,0,1,0,0,1,0,0], [0,1,0,1,0,0,0,1,0,0,1,0],
    [0,0,1,0,1,0,0,0,1,0,0,1], [1,0,0,1,0,1,0,0,0,1,0,0],
    [0,1,0,0,1,0,1,0,0,0,1,0], [0,0,1,0,0,1,0,1,0,0,0,1],
    [1,0,0,1,0,0,1,0,1,0,0,0], [0,1,0,0,1,0,0,1,0,1,0,0],
    [0,0,1,0,0,1,0,0,1,0,1,0], [0,0,0,1,0,0,1,0,0,1,0,1]]
CHORD_TEMPLATE_MIN7 = [
    [1,0,0,1,0,0,0,1,0,0,1,0], [0,1,0,0,1,0,0,0,1,0,0,1],
    [1,0,1,0,0,1,0,0,0,1,0,0], [0,1,0,1,0,0,1,0,0,0,1,0],
    [0,0,1,0,1,0,0,1,0,0,0,1], [1,0,0,1,0,1,0,0,1,0,0,0],
    [0,1,0,0,1,0,1,0,0,1,0,0], [0,0,1,0,0,1,0,1,0,0,1,0],
    [0,0,0,1,0,0,1,0,1,0,0,1], [1,0,0,0,1,0,0,1,0,1,0,0],
    [0,1,0,0,0,1,0,0,1,0,1,0], [0,0,1,0,0,0,1,0,0,1,0,1]]

CHORD_TEMPLATE_MAJOR_means = [np.mean(chord) for chord in CHORD_TEMPLATE_MAJOR]
CHORD_TEMPLATE_MINOR_means = [np.mean(chord) for chord in CHORD_TEMPLATE_MINOR]
CHORD_TEMPLATE_DOM7_means = [np.mean(chord) for chord in CHORD_TEMPLATE_DOM7]
CHORD_TEMPLATE_MIN7_means = [np.mean(chord) for chord in CHORD_TEMPLATE_MIN7]

CHORD_TEMPLATE_MAJOR_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_MAJOR]
CHORD_TEMPLATE_MINOR_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_MINOR]
CHORD_TEMPLATE_DOM7_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_DOM7]
CHORD_TEMPLATE_MIN7_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_MIN7]

# centroids of the 46 timbre categories (12 coefficients each)
TIMBRE_CLUSTERS = [
    [1.38679881e-01, 3.95702571e-02, 2.65410235e-02, 7.38301998e-03, -1.75014636e-02, -5.51147732e-02,
     8.71851698e-03, -1.17595855e-02, 1.07227900e-02, 8.75951680e-03, 5.40391877e-03, 6.17638908e-03],
    [3.14344510e+00, 1.17405599e-01, 4.08053561e+00, -1.77934450e+00, 2.93367968e+00, -1.35597928e+00,
     -1.55129489e+00, 7.75743158e-01, 6.42796685e-01, 1.40794256e-01, 3.37716831e-01, -3.27103815e-01],
    [3.56548165e-01, 2.73288705e+00, 1.94355982e+00, 1.06892477e+00, 9.89739475e-01, -8.97330631e-02,
     8.73234495e-01, -2.00747009e-03, 3.44488367e-01, 9.93117800e-02, -2.43471766e-01, -1.90521726e-01],
    [4.22442037e-01, 4.14115783e-01, 1.43926557e-01, -1.16143322e-01, -5.95186216e-02, -2.36927188e-01,
     -6.83151409e-02, 9.86816882e-02, 2.43219098e-02, 6.93558977e-02, 6.80121418e-03, 3.97485360e-02],
    [1.94727799e-01, -1.39027782e+00, -2.39875671e-01, -2.84583677e-01, 1.92334219e-01, -2.83421048e-01,
     2.15787541e-01, 1.14840341e-01, -2.15631833e-01, -4.09496877e-02, -6.90838017e-03, -7.24394810e-03],
    [1.96565167e-01, 4.98702717e-02, -3.43697282e-01, 2.54170701e-01, 1.12441266e-02, 1.54740401e-01,
     -4.70447408e-02, 8.10868802e-02, 3.03736697e-03, 1.43974944e-03, -2.75044913e-02, 1.48634678e-02],
    [2.21364497e-01, -2.96205105e-01, 1.57754028e-01, -5.57641279e-02, -9.25625566e-02, -6.15316168e-02,
     -1.38139882e-01, -5.54936599e-02, 1.66886836e-01, 6.46238260e-02, 1.24093863e-02, -2.09274345e-02],
    [2.12823455e-01, -9.32652720e-02, -4.39611467e-01, -2.02814479e-01, 4.98638770e-02, -1.26572488e-01,
     -1.11181799e-01, 3.25075635e-02, 2.01416694e-02, -5.69216463e-02, 2.61922912e-02, 8.30817468e-02],
    [1.62304042e-01, -7.34813956e-03, -2.02552550e-01, 1.80106705e-01, -5.72110826e-02, -9.17148244e-02,
     -6.20429191e-03, -6.08892354e-02, 1.02883628e-02, 3.84878478e-02, -8.72920419e-03, 2.37291230e-02],
    [1.69023095e-01, 6.81311168e-02, -3.71039856e-02, -2.13139780e-02, -4.18752028e-03, 1.36407740e-01,
     2.58515825e-02, -4.10328777e-04, 2.93149920e-02, -1.97874734e-02, 2.01177066e-02, 4.29260690e-03],
    [4.16829358e-01, -1.28384095e+00, 8.86081556e-01, 9.13717416e-02, -3.19420208e-01, -1.82003637e-01,
     -3.19865507e-02, -1.71517045e-02, 3.47472066e-02, -3.53047665e-02, 5.58354602e-02, -5.06222122e-02],
    [3.83948137e-01, 1.06020034e-01, 4.01191058e-01, 1.49470482e-01, -9.58422411e-02, -4.94473336e-02,
     2.27589858e-02, -5.67352733e-02, 3.84666644e-02, -2.15828055e-02, -1.67817151e-02, 1.15426241e-01],
    [9.07946444e-01, 3.26120397e+00, 2.98472002e+00, -1.42615404e-01, 1.29886103e+00, -4.53380431e-01,
     1.54008478e-01, -3.55297093e-02, -2.95809181e-01, 1.57037690e-01, -7.29692046e-02, 1.15180285e-01],
    [1.60870896e+00, -2.32038235e+00, -7.96211044e-01, 1.55058968e+00, -2.19377663e+00, 5.01030526e-01,
     -1.71767279e+00, -1.36642470e+00, -2.42837527e-01, -4.14275615e-01, -7.33148530e-01, -4.56676578e-01],
    [6.42870687e-01, 1.34486839e+00, 2.16026845e-01, -2.13180345e-01, 3.10866747e-01, -3.97754955e-01,
     -3.54439151e-01, -5.95938041e-04, 4.95054274e-03, 4.67013422e-02, -1.80823854e-02, 1.25808320e-01],
    [1.16780496e+00, 2.28141229e+00, -3.29418720e+00, -1.54239912e+00, 2.12372153e-01, 2.51116768e+00,
     1.84273560e+00, -4.06183916e-01, 1.19175125e+00, -9.24407446e-01, 6.85444429e-01, -6.38729005e-01],
    [2.39097414e-01, -1.13382447e-02, 3.06327342e-01, 4.68182987e-03, -1.03107607e-01, -3.17661969e-02,
     3.46533705e-02, 1.46440386e-02, 6.88291154e-02, 1.72580481e-02, -6.23970238e-03, -6.52822380e-03],
    [1.74850329e-01, -1.86077411e-01, 2.69285838e-01, 5.22452803e-02, -3.71708289e-02, -6.42874319e-02,
     -5.01920042e-03, -1.14565540e-02, -2.61300268e-03, -6.94872458e-03, 1.20157063e-02, 2.01341977e-02],
    [1.93220674e-01, 1.62738332e-01, 1.72794061e-02, 7.89933755e-02, 1.58494767e-01, 9.04541006e-04,
     -3.33177052e-02, -1.42411500e-01, -1.90471155e-02, -2.41622739e-02, -2.57382438e-02, 2.84895062e-02],
    [3.31179197e+00, -1.56765268e-01, 4.42446188e+00, 2.05496297e+00, 5.07031622e+00, -3.52663849e-02,
     -5.68337901e+00, -1.17825301e+00, 5.41756637e-01, -3.15541339e-02, -1.58404846e+00, 7.37887234e-01],
    [2.36033237e-01, -5.01380019e-01, -7.01568834e-02, -2.14474169e-01, 5.58739133e-01, -3.45340886e-01,
     2.36469930e-01, -2.51770230e-02, -4.41670143e-01, -1.73364633e-01, 9.92353986e-03, 1.01775476e-01],
    [3.13672832e+00, 1.55128891e+00, 4.60139512e+00, 9.82477544e-01, -3.87108002e-01, -1.34239667e+00,
     -3.00065797e+00, -4.41556909e-01, -7.77546208e-01, -6.59017029e-01, -1.42596356e-01, -9.78935498e-01],
    [8.50714148e-01, 2.28658856e-01, -3.65260753e+00, 2.70626948e+00, -1.90441544e-01, 5.66625676e+00,
     1.77531510e+00, 2.39978921e+00, 1.10965660e+00, 1.58484130e+00, -1.51579214e-02, 8.64324026e-01],
    [1.14302559e+00, 1.18602811e+00, -3.88130412e+00, 8.69833825e-01, -8.23003310e-01, -4.23867795e-01,
     8.56022598e-01, -1.08015106e+00, 1.74840192e-01, -1.35493558e-02, -1.17012561e+00, 1.68572940e-01],
    [3.54117814e+00, 6.12714769e-01, 7.67585243e+00, 2.50391333e+00, 1.81374399e+00, -1.46363231e+00,
     -1.74027236e+00, -5.72924078e-01, -1.20787368e+00, -4.13954661e-01, -4.62561948e-01, 6.78297871e-01],
    [8.31843044e-01, 4.41635485e-01, 7.00724425e-02, -4.72159900e-02, 3.08326493e-01, -4.47009822e-01,
     3.27806057e-01, 6.52370380e-01, 3.28490360e-01, 1.28628172e-01, -7.78065861e-02, 6.91343399e-02],
    [4.90082031e-01, -9.53180204e-01, 1.76970476e-01, 1.57256960e-01, -5.26196238e-02, -3.19264458e-01,
     3.91808304e-01, 2.19368239e-01, -2.06483291e-01, -6.25044005e-02, -1.05547224e-01, 3.18934196e-01],
    [1.49899454e+00, -4.30708817e-01, 2.43770498e+00, 7.03149621e-01, -2.28827845e+00, 2.70195855e+00,
     -4.71484280e+00, -1.18700075e+00, -1.77431396e+00, -2.23190236e+00, 8.20855264e-01, -2.35859902e-01],
    [1.20322544e-01, -3.66300816e-01, -1.25699953e-01, -1.21914056e-01, 6.93277338e-02, -1.31034684e-01,
     -1.54955924e-03, 2.48094288e-02, -3.09576314e-02, -1.66369415e-03, 1.48904987e-04, -1.42151992e-02],
    [6.52394765e-01, -6.81024464e-01, 6.36868117e-01, 3.04950208e-01, 2.62178992e-01, -3.20457080e-01,
     -1.98576098e-01, -3.02173163e-01, 2.04399765e-01, 4.44513847e-02, -9.50111498e-02, -1.14198739e-02],
    [2.06762180e-01, -2.08101829e-01, 2.61977630e-01, -1.71672300e-01, 5.61794250e-02, 2.13660185e-01,
     3.90259585e-02, 4.78176392e-02, 1.72812607e-02, 3.44052067e-02, 6.26899067e-03, 2.48544728e-02],
    [7.39717363e-01, 4.37786285e+00, 2.54995502e+00, 1.13151212e+00, -3.58509503e-01, 2.20806129e-01,
     -2.20500355e-01, -7.22409824e-02, -2.70534083e-01, 1.07942098e-03, 2.70174668e-01, 1.87279353e-01],
    [1.25593809e+00, 6.71054880e-02, 8.70352571e-01, -4.32607959e+00, 2.30652217e+00, 5.47476105e+00,
     -6.11052479e-01, 1.07955720e+00, -2.16225471e+00, -7.95770149e-01, -7.31804973e-01, 9.68935954e-01],
    [1.17233757e-01, -1.23897829e-01, -4.88625265e-01, 1.42036530e-01, -7.23286756e-02, -6.99808763e-02,
     -1.17525019e-02, 5.70221674e-02, -7.67796123e-03, 4.17505873e-02, -2.33375716e-02, 1.94121001e-02],
    [1.67511025e+00, -2.75436700e+00, 1.45345593e+00, 1.32408871e+00, -1.66172505e+00, 1.00560074e+00,
     -8.82308160e-01, -5.95708043e-01, -7.27283590e-01, -1.03975499e+00, -1.86653334e-02, 1.39449745e+00],
    [3.20587677e+00, -2.84451104e+00, 8.54849957e+00, -4.44001235e-01, 1.04202144e+00, 7.35333682e-01,
     -2.48763292e+00, 7.38931361e-01, -1.74185596e+00, -1.07581842e+00, 2.05759299e-01, -8.20483513e-01],
    [3.31279737e+00, -5.08655734e-01, 6.61530870e+00, 1.16518280e+00, 4.74499155e+00, -2.31536191e+00,
     -1.34016130e+00, -7.15381712e-01, 2.78890594e+00, 2.04189275e+00, -3.80003033e-01, 1.16034914e+00],
    [1.79522019e+00, -8.13534697e-02, 4.37167420e-01, 2.26517020e+00, 8.85377295e-01, 1.07481514e+00,
     -7.25322296e-01, -2.19309506e+00, -7.59468916e-01, -1.37191387e+00, 2.60097913e-01, 9.34596450e-01],
    [3.50400906e-01, 8.17891485e-01, -8.63487084e-01, -7.31760701e-01, 9.70320805e-02, -3.60023996e-01,
     -2.91753495e-01, -8.03073817e-02, 6.65930095e-02, 1.60093340e-01, -1.29158086e-01, -5.18806100e-02],
    [2.25922929e-01, 2.78461593e-01, 5.39661393e-02, -2.37662670e-02, -2.70343295e-02, -1.23485570e-01,
     2.31027499e-03, 5.87465112e-05, 1.86127188e-02, 2.83074747e-02, -1.87198676e-04, 1.24761782e-02],
    [4.53615634e-01, 3.18976020e+00, -8.35029351e-01, 7.84124578e+00, -4.43906795e-01, -1.78945492e+00,
     -1.14521031e+00, 1.00044304e+00, -4.04084981e-01, -4.86030348e-01, 1.05412721e-01, 5.63666445e-02],
    [3.93714086e-01, -3.07226477e-01, -4.87366619e-01, -4.57481697e-01, -2.91133171e-04, -2.39881719e-01,
     -2.15591352e-01, -1.21332941e-01, 1.42245002e-01, 5.02984582e-02, -8.05878851e-03, 1.95534173e-01],
    [1.86913010e-01, -1.61000977e-01, 5.95612425e-01, 1.87804293e-01, 2.22064227e-01, -1.09008289e-01,
     7.83845058e-02, 5.15228647e-02, -8.18113578e-02, -2.37860551e-02, 3.41013800e-03, 3.64680417e-02],
    [3.32919314e+00, -2.14341251e+00, 7.20913997e+00, 1.76143734e+00, 1.64091808e+00, -2.66887649e+00,
     -9.26748006e-01, -2.78599285e-01, -7.39434005e-01, -3.87363085e-01, 8.00557250e-01, 1.15628886e+00],
    [4.76496444e-01, -1.19334793e-01, 3.09037235e-01, -3.45545294e-01, 1.30114716e-01, 5.06895559e-01,
     2.12176840e-01, -4.14296750e-04, 4.52439064e-02, -1.62163990e-02, 6.93683152e-02, -5.77607592e-03],
    [3.00019324e-01, 5.43432074e-02, -7.72732930e-01, 1.47263806e+00, -2.79012581e-02, -2.47864869e-01,
     -2.10011388e-01, 2.78202425e-01, 6.16957205e-02, -1.66924986e-01, -1.80102286e-01, -3.78872162e-03]]

TIMBRE_MEANS = [np.mean(t) for t in TIMBRE_CLUSTERS]
TIMBRE_STDEVS = [np.std(t) for t in TIMBRE_CLUSTERS]

'''helper methods to process raw msd data'''

def normalize_pitches(h5):
    # transpose each pitch segment so that every song is keyed to C
    key = int(hdf5_getters.get_key(h5))
    segments_pitches = hdf5_getters.get_segments_pitches(h5)
    segments_pitches_new = [transpose_by_key(pitch_seg, key) for pitch_seg in segments_pitches]
    return segments_pitches_new

def transpose_by_key(pitch_seg, key):
    pitch_seg_new = []
    for i in range(0, 12):
        idx = (i + key) % 12
        pitch_seg_new.append(pitch_seg[idx])
    return pitch_seg_new

'''given a time segment with distributions of the 12 pitches, find the
most likely chord played, returned as (chord type, root index)'''
def find_most_likely_chord(pitch_vector):
    # chord types are indexed 1=major, 2=minor, 3=dominant 7th, 4=minor 7th;
    # the original repeated the same correlation loop once per template family
    chord_families = [
        (1, CHORD_TEMPLATE_MAJOR, CHORD_TEMPLATE_MAJOR_means, CHORD_TEMPLATE_MAJOR_stdevs),
        (2, CHORD_TEMPLATE_MINOR, CHORD_TEMPLATE_MINOR_means, CHORD_TEMPLATE_MINOR_stdevs),
        (3, CHORD_TEMPLATE_DOM7, CHORD_TEMPLATE_DOM7_means, CHORD_TEMPLATE_DOM7_stdevs),
        (4, CHORD_TEMPLATE_MIN7, CHORD_TEMPLATE_MIN7_means, CHORD_TEMPLATE_MIN7_stdevs)]
    rho_max = 0.0
    most_likely_chord = (1, 1)
    for chord_type, templates, means, stdevs in chord_families:
        for idx, (chord, mean, stdev) in enumerate(zip(templates, means, stdevs)):
            # correlation between the template and the observed pitch vector
            rho = 0.0
            for i in range(0, 12):
                rho += ((chord[i] - mean) * (pitch_vector[i] - np.mean(pitch_vector)) /
                        ((stdev + 0.01) * (np.std(pitch_vector) + 0.01)))
            if abs(rho) > abs(rho_max):
                rho_max = rho
                most_likely_chord = (chord_type, idx)
    return most_likely_chord

def find_most_likely_timbre_category(timbre_vector):
    most_likely_timbre_cat = 0
    rho_max = 0.0
    for idx, (seg, mean, stdev) in enumerate(zip(TIMBRE_CLUSTERS, TIMBRE_MEANS, TIMBRE_STDEVS)):
        rho = 0.0
        for i in range(0, 12):
            rho += ((seg[i] - mean) * (timbre_vector[i] - np.mean(seg)) /
                    ((stdev + 0.01) * (np.std(timbre_vector) + 0.01)))
        if abs(rho) > abs(rho_max):
            rho_max = rho
            most_likely_timbre_cat = idx
    return most_likely_timbre_cat

Bibliography

[1] Marck Bailey. Mark J. Butler publishes scholarly work on dance music. http://www.music.northwestern.edu/about/news/2012/mark-j-butler-publishes-scholarly-work-on-dance-music.html, March 2012.

[2] Kenneth Taylor. Ishkur's guide to EDM. http://techno.org/electronic-music-guide/.

[3] Deal further strengthens Spotify's music discovery expertise. http://the.echonest.com/pressreleases/spotify-acquires-echo-nest/, March 2014.

[4] Josh Constine. Inside the Spotify-Echo Nest skunkworks. http://techcrunch.com/2014/10/19/the-sonic-mad-scientists/, October 2014.

[5] The future of music genres is here. http://blog.echonest.com/post/73516217273/the-future-of-music-genres-is-here, January 2014.

[6] About the Music Genome Project. http://www.pandora.com/about/mgp.

[7] Joan Serrà, Álvaro Corral, Marián Boguñá, Martín Haro, and Josep Ll. Arcos. Measuring the evolution of contemporary western popular music. Scientific Reports, 2, July 2012.

[8] Matthias Mauch, Robert M. MacCallum, Mark Levy, and Armand M. Leroi. The evolution of popular music: USA 1960-2010. Royal Society Open Science, 2(5), 2015.

[9] Thierry Bertin-Mahieux, Daniel P. W. Ellis, Brian Whitman, and Paul Lamere. The Million Song Dataset. In Proceedings of the 12th International Conference on Music Information Retrieval (ISMIR 2011), 2011.

[10] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay. Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12:2825-2830, 2011.

[11] Lars Buitinck, Gilles Louppe, Mathieu Blondel, Fabian Pedregosa, Andreas Mueller, Olivier Grisel, Vlad Niculae, Peter Prettenhofer, Alexandre Gramfort, Jaques Grobler, Robert Layton, Jake VanderPlas, Arnaud Joly, Brian Holt, and Gaël Varoquaux. API design for machine learning software: experiences from the scikit-learn project. In ECML PKDD Workshop: Languages for Data Mining and Machine Learning, pages 108-122, 2013.

[12] Edwin Chen. Infinite mixture models with nonparametric Bayes and the Dirichlet process. http://blog.echen.me/2012/03/20/infinite-mixture-models-with-nonparametric-bayes-and-the-dirichlet-process/, March 2012.

[13] Graham Massey. Roland TR-808: The drum machine that changed music. http://www.bbc.com/news/entertainment-arts-26682781, March 2014.

[14] Jean-Michel Jarre and Agencja Artystyczna MTJ. Jean Michel Jarre. Disques Dreyfus, 1999.

[15] Francois Pachet, Jean-Julien Aucouturier, and Mark Sandler. "The way it sounds": Timbre models for analysis and retrieval of music signals. IEEE Transactions on Multimedia, 7(6):1028-1035, December 2005.

Page 2: Silver,Matthew final thesis

I hereby declare that I am the sole author of this thesis

I authorize Princeton University to lend this thesis to other institutions or

individuals for the purpose of scholarly research

Matthew Silver

I further authorize Princeton University to reproduce this thesis by photocopying or

by other means in total or in part at the request of other institutions or individuals

for the purpose of scholarly research

Matthew Silver

Abstract

This thesis provides a foundation of code and models to mathematically analyze

the evolution of Electronic Music (EM) over time Using chronologically ordered

data from the Million Song Dataset it utilizes a Dirichlet Process Gaussian Mixture

Model to assign the songs to clusters based on pitch and timbre data and without

any previous assumptions of the clusters beforehand By examining the characteristic

sounds of songs in each cluster the following conclusions are reached

1 Which artists and songs were most innovative for their time

2 Potential new ways in which the genealogy of and relations between EM genres

can be imagined

Finally this thesis evaluates the strengths and weaknesses of the model used and

suggests future work that can be done to improve upon it

iii

Acknowledgements

I would like to thank Professor Ramon Van Handel for advising me on my thesis

You helped me figure out how to narrow down my goals into a concrete topic and

provided useful input on how to model and frame my problem effectively I would

also like to thank the Princeton ORFE department for providing funding to download

and manage the dataset I used for this project Michael Bino and the Computational

Science and Engineering Support group (CSES) were incredibly useful for helping me

set up and run my programs on the Princeton servers Without your help I would

have had a much harder time getting my 300GB dataset of music to play nice I

would also like to thank Jeffrey Scott Dwoskin for providing the Latex template from

which I wrote this thesis And finally I would like to thank my family and friends

especially Lucas and Kathryn for providing continuous support and feedback The

work we all poured into our theses is incredible and wersquove made it through this

sometimes rocky journey in the greatest university of all

On a personal note regardless of whether you are a current Princeton under-

graduate or are just interested in my work push yourself beyond your comfort zone

and donrsquot let grades or other peoplersquos opinions get in your way Take classes and

join new groups that reflect your passions At the same time love yourself Take

care of your body and have some fun without feeling guilty And finally form great

relationships While Princetonians sometimes appear hypercompetitive and forced

they are genuinely sweet and brilliant people who you will treasure for life These

four years at Princeton have gone by in a flash and in the whirlwind of highs and

lows Irsquove gone through these are the most important lessons Irsquove learned

iv

To my parents

v

Contents

Abstract iii

Acknowledgements iv

List of Tables viii

List of Figures ix

1 Introduction 1

11 Background Information 1

12 Literature Review 3

13 The Dataset 10

2 Mathematical Modeling 12

21 Determining Novelty of Songs 12

22 Feature Selection 14

23 Collecting Data and Preprocessing Selected Features 20

231 Collecting the Data 20

232 Pitch Preprocessing 21

233 Timbre Preprocessing 25

3 Results 27

31 Methodology 27

32 Findings 29

321 α=005 29

vi

322 α=01 33

323 α=02 38

33 Analysis 46

4 Conclusion 53

41 Design Flaws in Experiment 53

42 Future Work 55

43 Closing Remarks 56

A Code 57

A1 Pulling Data from the Million Song Dataset 57

A2 Calculating Most Likely Chords and Timbre Categories 58

A3 Code to Compute Timbre Categories 60

A4 Helper Methods for Calculations 61

Bibliography 68

vii

List of Tables

31 Song cluster descriptions for α = 005 33

32 Song cluster descriptions for α = 01 38

33 Song cluster descriptions for α = 02 45

viii

List of Figures

11 A userrsquos taste profile generated by Spotify 4

12 Data processing pipeline for Mauchrsquos study illustrated with a segment

of Queenrsquos Bohemian Rhapsody 1975 8

21 scikit-learn example of GMM vs DPGMM and tuning of α 15

22 Number of Electronic Music Songs in Million Song Dataset from Each

Year 26

31 Song year distributions for α = 005 31

32 Timbre and pitch distributions for α = 005 32

33 Song year distributions for α = 01 35

34 Timbre and pitch distributions for α = 01 37

35 Song year distributions for α = 02 41

36 Timbre and pitch distributions for α = 02 44

ix

Chapter 1

Introduction

11 Background Information

Electronic Music (EM) is an increasingly popular genre of music with an immense

presence and influence on modern culture Because the genre is new as a whole and

is arguably more loosely structured than other genres - technology has enabled the

creation of a wide range of sounds and easy blending of existing and new sounds alike

- formal analysis especially mathematical analysis on the genre is fairly limited and

has only begun growing in the past few years As a fan of EM I am interested in

exploring how the genre has evolved over time More specifically my goal with this

project was to design some structure or model that could help me identify which EM

artists have contributed the most stylistically to the genre Oftentimes famous EM

artists do not create novel-sounding music but rather popularize an existing style

and the motivation of this study is to understand who has stylistically contributed

the most to the EM scene versus those who have merely popularized aspects of it

As the study progressed the manner in which I constructed my model lent to

a second goal of the thesis imagining new ways in which we can imagine EM genres

1

While there exists an extensive amount of research analyzing music trends from

a non-mathematical (cultural societal artistic) perspective the analysis of EM

from a mathematical perspective and especially with respect to any computationally

measurable trends in the genre is close to nonexistent EM has been analyzed to a

lesser extent than other common genres of music in the academic world most likely

due to existing for a shorter amount of time and being less rooted in prominant

social and cultural events In fact the first published reference work on EM did not

exist until 2012 when Professor Mark J Butler from Northwestern University edited

and published Electronica Dance and Club Music a collection of essays exploring

EM genres and culture [1] Furthermore there are very few comprehensive visual

guides that allow a user to relate every genre to each other and easily observe how

different genres converge and diverge While conducting research the best guide I

found was not a scholarly source but an online guide created by an EM enthusiast

Ishkurrsquos Guide to Electronic Music [2] This guide which includes over 100 specific

genres grouped by more general genres and represents chronological evolutions by

connecting each genre in a flowchart is the most exhaustive analysis of the EM scene

I could find However the guidersquos analysis is very qualitative While each subgenre

contains an explanation on typical rhythm and sounds and includes well-known

songs indicative of the style the guide is created by someone who used historical and

personal knowledge of EM My model which creates music genres by chronologically

ordering songs and then assigning them to clusters is a different approach towards

imagining the entire landscape of EM The results may confirm Ishkurrsquos Guidersquos

findings in which case his guide is given additional merit with mathematical evi-

dence or it may be different suggesting that there may be better ways to group EM

genres One advantage that guides such as Ishkurrsquos and historically-based scholarly

works have over my approach is that those models are history-sensitive and therefore

may group songs in a way that historically makes sense On the other hand my

2

model is history-agnostic and may not realize the historical context of songs when

clustering However I believe that there is still significant merit to my research

Instead of classifying genres of music by early genres that led to them my approach

gives the most credit to the artists and songs that were the most innovative for their

time and perhaps reveal different musical styles that are more similar to each other

than history would otherwise imply This way of thinking of music genres while

unconventional is another way of imagining EM

The practice of quantitatively analyzing music has exploded in the last decade

thanks to technological and algorithmic advances that allow data scientists to con-

structively sift through troves of music and listener information In the literature

review I will focus on two particular organizations that have contributed greatly to

the large-scale mathematical analysis of music Pandora a website that plays songs

similar to a songartistalbum inputted by a user and Echo Nest a music analytics

firm that was acquired by Spotify in 2014 and drives Spotifyrsquos Discover Weekly

feature [3] After evaluating the relevance of these sources to my thesis work I will

then look over the relevant academic research and evaluate what this research can

contribute

12 Literature Review

Quantitative analysis of music generally falls into two categories: research conducted by academics and academic organizations for scholarly purposes, and research conducted by companies and primarily targeted at consumers. Looking first at the consumer-based research, Spotify and Pandora are two of the most prominent groups, and the two I decided to focus on. Spotify is a music streaming service where users can listen to albums and songs from a wide variety of artists, or listen to weekly playlists generated based on the music the user and the user's friends have listened to. The weekly playlist, called the Discover Weekly Playlist, is a relatively new feature in Spotify and is driven by music analysis algorithms created by Echo Nest. Using the Echo Nest code interface, Spotify creates a "taste profile" for each user, which assesses attributes such as how often a user branches out to new styles of music, how closely the user's streamed music follows popular Billboard music charts, and so on. Spotify also looks at the artists and songs the user streamed and creates clusters of different genres that the user likes (see Figure 1.1). The taste profile and music clusters can then be used to generate playlists geared to a specific user. The genres in the clusters come from a list of nearly 800 names, which are derived by scraping the Internet for trending terms in music, as well as training various algorithms on a regular basis by "listening" to new songs [4][5].

Figure 1.1: A user's taste profile generated by Spotify

Although Spotify and Echo Nest's algorithms are very useful for mapping the landscape of established and emerging genres of music, the methodology is limited to pre-defined genres. This may serve as a good baseline to compare my final results against, but my study aims to be as context-free as possible by attaching no preconceived notions of music styles or genres, instead looking at features that can be measured in every song.

While Spotify's approach to mapping music is very high-tech and based on existing genres, Pandora takes a very low-tech and context-free approach to music clustering. Pandora created the Music Genome Project, a multi-year undertaking in which skilled music theorists listened to a large number of songs and analyzed up to 450 characteristics in each song [6]. Pandora's approach is appealing to the aim of my study, since it does not take any preconceived notions of what a genre of music is, instead comparing songs on common characteristics such as pitch, rhythm, and instrument patterns. Unfortunately, I do not have a cadre of skilled music theorists at my disposal, nor do I have 10 years to perform such calculations like the dedicated workers at Pandora (tips the indestructible fedora). Additionally, Pandora's Music Genome Project is intellectual property, so at best I can only rely on the abstract concepts of the Music Genome Project to drive my study.

In the academic realm, there are no existing studies analyzing quantifiable changes in EM specifically, but there are a few studies that perform such analysis on popular Western music in general. One such study is Measuring the Evolution of Contemporary Western Popular Music, which analyzes music from 1955-2010 spanning all common genres. Using the Million Song Dataset, a free public database of songs, each containing metadata (see Section 1.3), the study focuses on the attributes pitch, timbre, and loudness. Pitch is defined as the standard musical notes, or frequency, of the sound waves. Timbre is formally defined as the Mel frequency cepstral coefficients (MFCCs) of a transformed sound signal. More informally, it refers to the sound color, texture, or tone quality, and is associated with instrument types, recording resources, and production techniques. In other words, two sounds that have the same pitch but different tones (for example, a bell and a voice) are differentiated by their timbres. There are 12 MFCCs that define the timbre of a given sound. Finally, loudness refers to how intrinsically loud the music sounds, not loudness that a listener can manipulate while listening to the music. Loudness is the first MFCC of the timbre of a sound [7]. The study concluded that over time, music has been becoming louder and less diverse:

The restriction of pitch sequences (with metrics showing less variety in pitch progressions), the homogenization of the timbral palette (with frequent timbres becoming more frequent), and growing average loudness levels (threatening a dynamic richness that has been conserved until today). This suggests that our perception of the new would be essentially rooted on identifying simpler pitch sequences, fashionable timbral mixtures, and louder volumes. Hence, an old tune with slightly simpler chord progressions, new instrument sonorities that were in agreement with current tendencies, and recorded with modern techniques that allowed for increased loudness levels could be easily perceived as novel, fashionable, and groundbreaking.

This study serves as a good starting point for mathematically analyzing music in a few ways. First, it utilizes the Million Song Dataset, which addresses the issue of legally obtaining music metadata. As mentioned in Section 1.3, the only legal way to obtain playable music for this study would have been to purchase every song I would include, which is infeasible. While the Million Song Dataset does not contain the audio files in playable format, it does contain audio features and metadata that allow for in-depth analysis. In addition, working with the dataset takes out the work of extracting features from raw audio files, saving an extensive amount of time and energy. Second, the study establishes specifics for what constitutes a trend in music. Pitch, timbre, and loudness are core features of music, and examining the distributions of each among songs over time reveals a lot of information about how the music industry and consumers' tastes have evolved. While these are not all of the features contained in a song, they serve as a good starting point. Third, the study defines mathematical ways to capture music attributes and measure their change over time. For example, pitches are transposed into the same tonal context, with binary discretized pitch descriptions based on a threshold, so that each song can be represented with vectors of pitches that are normalized and compared to other songs.

While this study lays some solid groundwork for capturing and analyzing numeric qualities of music, it falls short of addressing my goals in a couple of ways. First, it does not perform any analysis with respect to music genre. While the analysis performed in the paper could easily be applied to a list of songs in a specific genre, certain genres might have unique sounds and rhythms relative to other genres that would be worth studying in greater detail. Second, the study only measures general trends in music over time. The models used to describe changes are simple regressions that don't look at more nuanced changes. For example, what styles of music developed over certain periods of time? How rapid were those changes? Which styles of music developed from which other styles?

A more promising study, led by music researcher Matthias Mauch [8], analyzes contemporary popular Western music from the 1960s to 2010s by comparing numerical data on the pitches and timbre of a corpus of 17,000 songs that appeared on the Billboard Hot 100. Like the previously mentioned paper, Measuring the Evolution of Contemporary Western Popular Music, Mauch's study also creates abstractions of pitch and timbre in order to provide a consistent and meaningful semantic interpretation of musical data (see Figure 1.2). However, Mauch's study takes this idea a step further by using genre tags from Last.fm, a music website, and constructing a hierarchy of music genres using hierarchical clustering. Additionally, the study takes a crack at determining whether a particular band, the Beatles, was musically groundbreaking for its time or merely playing off sounds that other bands had already used.

Figure 1.2: Data processing pipeline for Mauch's study, illustrated with a segment of Queen's Bohemian Rhapsody (1975)

While both Measuring the Evolution of Contemporary Western Popular Music and Mauch's study created abstractions of pitch and timbre, Mauch's study is more appealing with respect to my goal because its end results align more closely with mine. Additionally, the data processing pipeline offers several layers of abstraction, and depending on my progress, I would be able to achieve at least one of those levels. As shown in Figure 1.2, each segment of a raw audio file is first broken down into its 12 timbre MFCCs and pitch components. Next, the study constructs "lexicons," or dictionaries of pitch and timbre terms, against which all songs can be compared. For pitch, the original data is in an N-by-12 matrix, where N is the number of time segments in the song and 12 the number of notes found in an octave of pitches. Each time segment contains the relative strengths of each of the 12 pitches. However, musical sounds are not merely a collection of pitches but, more precisely, chords. Furthermore, the similarity of two songs is not determined by the absolute pitches of their chords, but rather by the progression of chords in the song, all relative to each other. For example, if all the notes in a song are transposed by one step, the song will sound different in terms of absolute pitch, but it will still be recognized as the original because all of the relative movements from each chord to the next are the same. This phenomenon is captured in the pitch data by finding the most likely chord played at each time segment, then counting the change to the next chord at each time step and generating a table of chord change frequencies for each song. Constructing the timbre lexicon is more complicated, since there is no easy analogue to chords with which to compare songs. Mauch's study utilizes a Gaussian Mixture Model (GMM), iterating over k = 1 to k = N clusters, where N is a large number, running the GMM under each prior assumption of k clusters, and computing the Bayes Information Criterion (BIC) for each model. The lowest of the N BIC values is found, and that value of k is selected. The resulting model contains k different timbre clusters, and each cluster contains the mean timbre value for each of the 12 timbre components.

For my research, I decided that the pitch and timbre lexicons would be the most realistic level of abstraction I could obtain. Mauch's study adds an additional layer to pitch and timbre by identifying the most common patterns of chord changes and most common timbre rhythms, and creating more general tags from these combined terms, such as "stepwise changes indicating modal harmony" for a pitch topic and "oh, rounded, mellow" for a timbral topic. There were two problems with using this final layer of abstraction for my study. First, attaching semantic interpretations to the pitch and timbral lexicons is a difficult task. For timbre, I would need to listen to sound samples containing all of the different timbral categories I identified and attach user interpretations to them. For the chords, not only would I have to perform the same analysis as for timbre, but I would also have to take careful attention to identify which chords correspond to common sound progressions in popular music, a task that I am not qualified for and did not have the resources, within this thesis, to seek out. Second, this final layer of abstraction was not necessary for the end goal of my paper. In fact, consolidating my pitch and timbre lexicons into simpler phrases would run the risk of pigeonholing my analysis and preventing me from discovering more nuanced patterns in my final results. Therefore, I decided to use pitch and timbral lexicon construction as the furthest levels of abstraction when processing songs for my thesis. Mathematical details on how I constructed the pitch and timbral lexicons can be found in the Mathematical Modeling chapter of this paper.

1.3 The Dataset

In order to successfully execute my thesis, I needed access to an extensive database of music. Until recently, acquiring a substantial corpus of music data was a difficult and costly task. It is illegal to download music audio files from video- and music-sharing sites such as YouTube, Spotify, and Pandora. Some platforms, such as iTunes, offer 90-second previews of songs, but segments of songs, usually segments that showcase the chorus, are not reliable measures of the entire essence of a song. Even if I were to legally download entire audio files for free, I would run into additional issues. Obtaining a high-quality corpus of song data would be challenging: writing scripts that crawl music sharing platforms may not capture all of the music I am looking for. And once I had the audio files, I would have to perform audio processing techniques to extract the relevant information from the songs.

Fortunately, there is an easy solution to the music data acquisition problem. The Million Song Dataset (MSD) is a collection of metadata for one million music tracks dating up to 2011. Various organizations, such as The Echo Nest, MusicBrainz, 7digital, and Last.fm, have contributed different pieces of metadata. Each song is represented as a Hierarchical Data Format file (HDF5), which can be loaded as a JSON object. The fields encompass topical features, such as the song title, artist, and release date, as well as lower-level features, such as the loudness, starting beat times, pitches, and timbre of the segments of the song [9]. While the MSD is the largest free and open-source music metadata dataset I could find, there is no guarantee that it adequately covers the entire spectrum of EM artists and songs. This quality limitation is important to consider throughout the study. A quick look through the songs, including the subset of data I worked with for this report, showed that there were several well-known artists and songs from the EM scene. Therefore, while the MSD may not contain all desired songs for this project, it contains an adequate number of relevant songs to produce some meaningful results. Additionally, the groundwork for modeling the similarities between songs and identifying groundbreaking ones is the same regardless of the songs included, and the following methodologies can be implemented on any similarly formatted dataset, including one with songs that might currently be missing from the MSD.

Chapter 2

Mathematical Modeling

2.1 Determining Novelty of Songs

Finding a logical and implementable mathematical model was, and continues to be, an important aspect of my research. My problem, how to mathematically determine which songs were unique for their time, requires an algorithm in which each song is introduced in chronological order, either joining an existing category or starting a new category based on its musical similarity to songs already introduced. Clustering algorithms like k-means or Gaussian Mixture Models (GMMs), which optimize the partitioning of a dataset into a predetermined number of clusters, assume that number is fixed. While this process would work if we knew exactly how many genres of EM existed, if we guess wrong, our end results may contain clusters that are wrongly grouped together or separated. It is much better to apply a clustering algorithm that does not make any assumptions about this number.

One particularly promising process that addresses the issue of the number of clusters is a family of algorithms known as Dirichlet Processes (DPs). DPs are useful for this particular application because (1) they assign clusters to a dataset with only an upper bound on the number of clusters, and (2) by sorting the songs in chronological order before running the algorithm, and keeping track of which songs are categorized under each cluster, we can observe the earliest songs in each cluster and, consequently, infer which songs were responsible for creating new clusters. The DP is controlled by a concentration parameter α. The expected number of clusters formed is directly proportional to the value of α, so the higher the value of α, the more likely new clusters will be formed [10]. Regardless of the value of α, as the number of data points introduced increases, the probability of a new group being formed decreases. That is, a "rich get richer" policy is in place, and existing clusters tend to grow in size. Tweaking the value of the tunable parameter α is an important part of the study, since it determines the flexibility given to forming a new cluster. If the value of α is too small, then the criteria for forming clusters will be too strict, and data that should be in different clusters will be assigned to the same cluster. On the other hand, if α is too large, the algorithm will be too sensitive and assign similar songs to different clusters.
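To make the "rich get richer" behavior concrete, the standard Chinese restaurant process view of a DP (a textbook formulation, not something specific to my implementation) gives the probabilities with which the n-th song is assigned:

$$P(\text{join existing cluster } k) = \frac{n_k}{n - 1 + \alpha}, \qquad P(\text{open a new cluster}) = \frac{\alpha}{n - 1 + \alpha},$$

where $n_k$ is the number of songs already assigned to cluster $k$. Both properties above follow immediately: larger clusters attract proportionally more songs, and the chance of opening a new cluster shrinks as $n$ grows but rises with α.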

The implementation of the DP was achieved using scikit-learn's library and API for the Dirichlet Process Gaussian Mixture Model (DPGMM). The DPGMM is the formal name of the Dirichlet Process model used to cluster the data. More specifically, scikit-learn's implementation of the DPGMM uses the stick-breaking method, one of several equally valid methods to assign songs to clusters [11]. While the mathematical details for this algorithm can be found at the following citation [12], the most important aspects of the DPGMM are the arguments that the user can specify and tune. The first of these tunable parameters is the value α, which is the same parameter as the α discussed in the previous paragraph. As seen on the right side of Figure 2.1, properly tuning α is key to obtaining meaningful clusters. The center image has α set to 0.01, which is too small and results in all of the data being placed in one cluster. On the other hand, the bottom-right image has the same dataset and α set to 100, which does a better job of clustering. On a related note, the figure also demonstrates the effectiveness of the DPGMM over the GMM. On the left side, the dataset clearly contains 2 clusters, but the GMM in the top-left image assumes 5 clusters as a prior and consequently clusters the data incorrectly, while the DPGMM manages to limit the data to 2 clusters.

Figure 2.1: scikit-learn example of GMM vs. DPGMM and tuning of α

The second argument that the user inputs to the DPGMM is the data that will be clustered. The scikit-learn implementation takes the data in the format of a nested list (N lists, each of length m), where N is the number of data points and m the number of features. While the format of the data structure is relatively straightforward, choosing which numbers should be in the data was a challenge I faced. Selecting the relevant features of each song to be used in the algorithm will be expounded upon in the next section, "Feature Selection."

The last argument that a user inputs to the scikit-learn DPGMM implementation indicates the upper bound on the number of clusters. The Dirichlet Process then determines the best number of clusters for the data between 1 and the upper bound. Since the DPGMM is flexible enough to find the best value, I set an arbitrary upper bound of 50 clusters and focused on tuning α to modify the number of clusters formed.
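For illustration, below is a minimal sketch of how these three arguments come together, assuming the scikit-learn API of the time (the DPGMM class was deprecated in later releases in favor of BayesianGaussianMixture); the feature matrix here is random placeholder data, not my actual feature set.

import numpy as np
from sklearn.mixture import DPGMM  # deprecated in later scikit-learn versions

# Placeholder feature matrix: N songs, m features per song
X = np.random.rand(1000, 238)

# n_components is only an upper bound on the number of clusters;
# alpha controls how readily new clusters are opened during fitting
dpgmm = DPGMM(n_components=50, alpha=0.1, covariance_type='diag', n_iter=100)
dpgmm.fit(X)

labels = dpgmm.predict(X)
print("clusters actually used:", len(np.unique(labels)))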

2.2 Feature Selection

One of the most difficult aspects of the Dirichlet method is choosing the features to be used for clustering. In other words, when we organize the songs into clusters, we need to ensure that each cluster is distinct in a way that is statistically and intuitively logical. In the Million Song Dataset [9], each song is represented as a JSON object containing several fields. These fields are candidate features for the Dirichlet algorithm. Below is an example song, "Never Gonna Give You Up" by Rick Astley, with the corresponding fields:

artist_mbid: db92a151-1ac2-438b-bc43-b82e149ddd50 (the musicbrainz.org ID for this artist)
artist_mbtags: shape = (4,) (this artist received 4 tags on musicbrainz.org)
artist_mbtags_count: shape = (4,) (raw tag count of the 4 tags this artist received on musicbrainz.org)
artist_name: Rick Astley (artist name)
artist_playmeid: 1338 (the ID of that artist on the service playme.com)
artist_terms: shape = (12,) (this artist has 12 terms (tags) from The Echo Nest)
artist_terms_freq: shape = (12,) (frequency of the 12 terms from The Echo Nest (number between 0 and 1))
artist_terms_weight: shape = (12,) (weight of the 12 terms from The Echo Nest (number between 0 and 1))
audio_md5: bf53f8113508a466cd2d3fda18b06368 (hash code of the audio used for the analysis by The Echo Nest)
bars_confidence: shape = (99,) (confidence value (between 0 and 1) associated with each bar by The Echo Nest)
bars_start: shape = (99,) (start time of each bar according to The Echo Nest; this song has 99 bars)
beats_confidence: shape = (397,) (confidence value (between 0 and 1) associated with each beat by The Echo Nest)
beats_start: shape = (397,) (start time of each beat according to The Echo Nest; this song has 397 beats)
danceability: 0.0 (danceability measure of this song according to The Echo Nest (between 0 and 1; 0 => not analyzed))
duration: 211.69587 (duration of the track in seconds)
end_of_fade_in: 0.139 (time of the end of the fade-in at the beginning of the song, according to The Echo Nest)
energy: 0.0 (energy measure (not in the signal processing sense) according to The Echo Nest (between 0 and 1; 0 => not analyzed))
key: 1 (estimation of the key the song is in by The Echo Nest)
key_confidence: 0.324 (confidence of the key estimation)
loudness: -7.75 (general loudness of the track)
mode: 1 (estimation of the mode the song is in by The Echo Nest)
mode_confidence: 0.434 (confidence of the mode estimation)
release: Big Tunes - Back 2 The 80s (album name from which the track was taken; some songs/tracks can come from many albums, we give only one)
release_7digitalid: 786795 (the ID of the release (album) on the service 7digital.com)
sections_confidence: shape = (10,) (confidence value (between 0 and 1) associated with each section by The Echo Nest)
sections_start: shape = (10,) (start time of each section according to The Echo Nest; this song has 10 sections)
segments_confidence: shape = (935,) (confidence value (between 0 and 1) associated with each segment by The Echo Nest)
segments_loudness_max: shape = (935,) (max loudness during each segment)
segments_loudness_max_time: shape = (935,) (time of the max loudness during each segment)
segments_loudness_start: shape = (935,) (loudness at the beginning of each segment)
segments_pitches: shape = (935, 12) (chroma features for each segment (normalized so max is 1))
segments_start: shape = (935,) (start time of each segment (~musical event, or onset) according to The Echo Nest; this song has 935 segments)
segments_timbre: shape = (935, 12) (MFCC-like features for each segment)
similar_artists: shape = (100,) (a list of 100 artists (their Echo Nest IDs) similar to Rick Astley according to The Echo Nest)
song_hotttnesss: 0.864248830588 (according to The Echo Nest, when downloaded (in December 2010) this song had a 'hotttnesss' of 0.8 (on a scale of 0 to 1))
song_id: SOCWJDB12A58A776AF (The Echo Nest song ID; note that a song can be associated with many tracks (with very slight audio differences))
start_of_fade_out: 198.536 (start time of the fade-out, in seconds, at the end of the song, according to The Echo Nest)
tatums_confidence: shape = (794,) (confidence value (between 0 and 1) associated with each tatum by The Echo Nest)
tatums_start: shape = (794,) (start time of each tatum according to The Echo Nest; this song has 794 tatums)
tempo: 113.359 (tempo in BPM according to The Echo Nest)
time_signature: 4 (time signature of the song according to The Echo Nest, i.e. usual number of beats per bar)
time_signature_confidence: 0.634 (confidence of the time signature estimation)
title: Never Gonna Give You Up (song title)
track_7digitalid: 8707738 (the ID of this song on the service 7digital.com)
track_id: TRAXLZU12903D05F94 (The Echo Nest ID of this particular track, on which the analysis was done)
year: 1987 (year when this song was released, according to musicbrainz.org)

When choosing features, my main goal was to use those that would most likely yield meaningful results, yet also be simple and make sense to the average person. The definition of "meaningful" results is arbitrary, as every music listener has his or her own opinions as to what constitutes different types of music, but some common features most people tend to differentiate songs by are pitch, rhythm, and the types of instruments used. The following fields provided in each song object fall under these three terms:

Pitch

• segments_pitches: a matrix of values indicating the strength of each pitch (or note) at each discernible time interval

Rhythm

• beats_start: a vector of values indicating the start time of each beat
• time_signature: the time signature of the song
• tempo: the speed of the song in Beats Per Minute (BPM)

Instruments

• segments_timbre: a matrix of values indicating the distribution of MFCC-like features (different types of tones) for each segment

The segments_pitches feature is a clear candidate for a differentiating factor between songs, since it reveals the patterns of notes that occur. Additionally, other research papers that quantitatively examine songs, like Mauch's, look at pitch and employ a procedure that allows all songs to be compared with the same metric. Likewise, timbre is intuitively a reliable differentiating feature, since it captures the difference between sounds that sound distinct despite having the same pitch. Therefore, segments_timbre is another feature that is considered for each song.

Finally, we look at the candidate features for rhythm. At first glance, all of these features appear to be useful, as they each indicate the rhythm of a song in one way or another. However, none of them are as useful as the pitch and timbre features. While tempo is one factor in differentiating genres of EDM, and music in general, tempo alone is not a driving force of musical innovation. Certain genres of EDM, like drum 'n' bass and happycore, stand out for having very fast tempos, but the tempo is supplemented with a sound unique to the genre. Conceiving new arrangements of pitches, combining instruments in new ways, and inventing new types of sounds are novel; speeding up or slowing down existing sounds is not. Including tempo as a feature could actually add noise to the model, since many genres overlap in their tempos. And finally, tempo is measured indirectly when the pitch and timbre features are normalized for each song: everything is measured in units of "per second," so faster songs will have higher quantities of pitch and timbre features each second. Time signature can be dismissed from the candidate features for the same reason as tempo: many genres share the same time signature, and including it in the feature set would only add more noise. beats_start looks like a more promising feature since, like segments_pitches and segments_timbre, it consists of a vector of values. However, difficulties arise when we begin to think about how exactly we can utilize this information. Since each song varies in length, we need a way to compare songs of different durations on the same level. One approach could be to perform basic statistics on the distance between each beat, for example calculating the mean and standard deviation of that distance. However, the normalized pitch and timbre information already captures this data. Another possibility is detecting certain patterns of beats, which could differentiate the syncopated dubstep or glitch beats from the steady pulse of electro-house. But once again, every beat is accompanied by a sound with a specific timbre and pitch, so this feature would not add any significantly new information.

2.3 Collecting Data and Preprocessing Selected Features

2.3.1 Collecting the Data

Upon deciding which features I wanted to use in my research, I first needed to collect all of the electronic songs in the Million Song Dataset. The easiest reliable way to achieve this was to iterate through each song in the database and save the information for the songs where any of the artist genre tags in artist_mbtags matched an electronic music genre. While this measure is not fully accurate, because it looks at the genre of the artist rather than of the song, specific genre information for each song was not as easily accessible, so this indicator was nearly as good a substitute. To generate a list of the genres that electronic songs would fall under, I manually searched through a subset of the MSD to find all genres that seemed to be related to electronic music. In the case of genres that are sometimes, but not always, electronic in nature, such as disco or pop, I erred on the side of caution and did not include them in the list of electronic genres. In those cases, false positives, such as primarily rock songs that happen to have the disco label attached to the artist, could inadvertently have been included in the dataset. The final list of genres is as follows:

target_genres = ['house', 'techno', 'drum and bass', 'drum n bass',
                 "drum'n'bass", 'drumnbass', "drum 'n' bass", 'jungle',
                 'breakbeat', 'trance', 'dubstep', 'trap', 'downtempo',
                 'industrial', 'synthpop', 'idm',
                 'idm - intelligent dance music', '8-bit', 'ambient',
                 'dance and electronica', 'electronic']
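A sketch of this filtering pass is shown below, assuming the MSD's HDF5 layout (artist tags stored under /musicbrainz/artist_mbtags) and a hypothetical directory structure; it uses h5py rather than the MSD's own wrapper code, so treat it as illustrative rather than my exact script.

import glob
import h5py

target_set = set(target_genres)

def is_electronic(h5_path):
    # True if any musicbrainz artist tag matches an EM genre
    with h5py.File(h5_path, 'r') as f:
        tags = [t.decode('utf-8').lower()
                for t in f['/musicbrainz/artist_mbtags'][:]]
    return any(tag in target_set for tag in tags)

# Hypothetical root directory for the dataset files
em_tracks = [path for path in glob.glob('MSD/data/**/*.h5', recursive=True)
             if is_electronic(path)]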

2.3.2 Pitch Preprocessing

A study conducted by music researcher Matthias Mauch [8] analyzes pitch in a musically informed manner. The study first takes the raw sound data and converts it into a distribution over the pitches, where 0 is no detection of the pitch and 1 the strongest amount. It then computes the most likely chord by comparing the 4 most common types of chords in popular music (major, minor, dominant 7th, and minor 7th) to the observed chord. The most common chords are represented as "template chords" containing 0's and 1's, where the 1's represent the notes played in the chord. For example, using the note C as the first index, the C major chord is represented as

CT_CM = (1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0).

For a given chroma frame c observed in the song, the Spearman's rho coefficient is computed against every template chord:

$$\rho_{CT,c} = \sum_{i=1}^{12} \frac{(CT_i - \overline{CT})(c_i - \bar{c})}{\sigma_{CT}\,\sigma_c}$$

where $\overline{CT}$ is the mean of the values in the template chord, $\sigma_{CT}$ is the standard deviation of those values, and the operations on c are analogous. Note that the summation runs over each of the 12 pitch classes. The chord template with the highest value of ρ is selected as the chord for the time frame. After this is performed for each time frame, the values are smoothed, and then the change between adjacent chords is observed. The reasoning behind this step is that, by measuring the relative distance between chords rather than the chords themselves, all songs can be compared in the same manner even though they may have different key signatures. Finally, the study takes the types of chord changes and classifies them under 8 possible categories called "H-topics." These topics are more abstracted versions of the chord changes that make more sense to a human, such as "changes involving dominant 7th chords."
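A minimal sketch of the template-matching step is shown below. It assumes that a template for each of the four chord types is defined at root C and transposed to the other 11 roots by rotation, and it uses SciPy's spearmanr on a placeholder chroma frame; it is an illustration of the technique, not Mauch's code.

import numpy as np
from scipy.stats import spearmanr

# Binary templates rooted at C (index 0 = C); 1's mark the chord tones
BASE_TEMPLATES = {
    'maj':  [1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0],
    'min':  [1, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0],
    'dom7': [1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0],
    'min7': [1, 0, 0, 1, 0, 0, 0, 1, 0, 0, 1, 0],
}

def best_chord(chroma_frame):
    # Return the (root, chord_type) whose template correlates best with the frame
    best, best_rho = None, -np.inf
    for ctype, base in BASE_TEMPLATES.items():
        for root in range(12):
            template = np.roll(base, root)  # transpose the template to this root
            rho, _ = spearmanr(template, chroma_frame)
            if rho > best_rho:
                best, best_rho = (root, ctype), rho
    return best

print(best_chord(np.random.rand(12)))  # placeholder 12-dimensional chroma frame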

In my preliminary implementation of this method on an electronic dance music corpus, I made a few modifications to Mauch's approach. First, I smoothed the time frames before computing the most probable chords, rather than smoothing the most probable chords. I did this to save time and to reduce volatility in the chord measurements. Using Rick Astley's "Never Gonna Give You Up" as a reference, which contains 935 time frames and lasts 212 seconds, 5 time frames average roughly one second, which for preliminary testing appeared to be a good interval for each time block. Second, as mentioned in the literature review, I did not abstract the chord changes into H-topics. This decision also stemmed from time constraints, since deriving semantic chord meaning from EDM songs would require careful research into the types of harmonies and sounds common in that genre of music. Below is a high-level visualization of the pitch metadata found in a sample song, "Firestarter" by The Prodigy, and how I converted the metadata into a chord change vector that I could then feed into the Dirichlet Process algorithm.

[Figure: chord-change extraction pipeline, illustrated on "Firestarter" by The Prodigy. Start with the raw pitch data, an N×12 matrix where N is the number of time frames in the song and 12 the number of pitch classes (the figure shows the first 5 time frames). Average the pitch distribution over every block of 5 time frames, then use Spearman's rho to select the most likely template chord for each block (here, F major). For each pair of adjacent chords (here, F major → G major, a major-to-major shift of 2 semitones, chord-change code 6), increment the corresponding count in a table of the 192 possible chord changes. The result is a 192-element vector chord_changes, where chord_changes[i] is the number of times the chord change with code i occurred in the song.]

A final step I took to normalize the chord change data was to divide the counts by the length of the song, so that each song's chord changes were measured per second.
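The change-counting step can be sketched as follows. One natural encoding of a chord change uses the previous chord type, the new chord type, and the root movement in semitones, which yields 4 × 4 × 12 = 192 categories; this numbering is my own illustrative choice and does not necessarily reproduce the exact codes shown in the figure above.

from collections import Counter

CHORD_TYPES = ['maj', 'min', 'dom7', 'min7']

def change_code(prev, curr):
    # Encode (previous type, current type, root movement) as one of 192 codes
    (root0, type0), (root1, type1) = prev, curr
    return ((CHORD_TYPES.index(type0) * 4 + CHORD_TYPES.index(type1)) * 12
            + (root1 - root0) % 12)

def chord_change_vector(chords, duration_sec):
    # Count each change between adjacent chords, normalized per second
    counts = Counter(change_code(a, b) for a, b in zip(chords, chords[1:]))
    return [counts[i] / duration_sec for i in range(192)]

# Example: F major -> G major (roots 5 and 7) in a 212-second song
vec = chord_change_vector([(5, 'maj'), (7, 'maj')], 212.0)
print(sum(vec))  # one change, normalized by duration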

2.3.3 Timbre Preprocessing

For timbre, I also used Mauch's model to find a meaningful way to compare timbre uniformly across all songs [8]. After collecting all song metadata, I took a random sample of 20 songs from each year, starting at 1970. The reason I forced the sampling to 20 randomly sampled songs from each year, and did not take a random sample of songs across all years at once, was to prevent bias towards any type of sound. As seen in Figure 2.2, there are significantly more songs from 2000-2011 than from before 2000. The mean year is x̄ = 2001.052, the median year is 2003, and the standard deviation of the years is σ = 7.060. A "random sample" over all songs would almost certainly include a disproportionate number of more recent songs. In order not to miss out on sounds that may be more prevalent in older songs, I required a set number of songs from each year. Next, from each randomly selected song, I selected 20 random timbre frames, in order to prevent any biases in data collection within each song. In total, there were 42 × 20 × 20 = 16,800 timbre frames collected. Next, I clustered the timbre frames using a Gaussian Mixture Model (GMM), varying the number of clusters from 10 to 100 and selecting the number of clusters with the lowest Bayes Information Criterion (BIC), a statistical measure commonly used to select the best-fitting model. The BIC was minimized at 46 timbre clusters. I then re-ran the GMM with 46 clusters and saved the mean values of each of the 12 timbre components for each cluster formed.

Figure 2.2: Number of electronic music songs in the Million Song Dataset from each year

In the same way that every song had the same 192 chord changes whose frequencies could be compared between songs, each song now had the same 46 timbre clusters but different frequencies in each song. When reading in the metadata from each song, I calculated the most likely timbre cluster each timbre frame belonged to and kept

a frequency count of all of the timbre clusters observed in the song. Finally, as with the pitch data, I divided all observed counts by the duration of the song in order to normalize each song's timbre counts.
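For reference, a sketch of the BIC sweep follows, written against scikit-learn's current GaussianMixture API for clarity (the GMM class I used at the time exposed an equivalent bic method); the frames array is a placeholder for the 16,800 sampled timbre frames.

import numpy as np
from sklearn.mixture import GaussianMixture

frames = np.random.rand(16800, 12)  # placeholder for sampled 12-D timbre frames

# Sweep candidate cluster counts and keep the model with the lowest BIC
best_gmm, best_bic = None, np.inf
for k in range(10, 101):
    gmm = GaussianMixture(n_components=k, random_state=0).fit(frames)
    bic = gmm.bic(frames)
    if bic < best_bic:
        best_gmm, best_bic = gmm, bic

# best_gmm.means_ holds each cluster's mean 12-dimensional timbre vector;
# best_gmm.predict(song_frames) assigns a song's frames to timbre clusters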

Chapter 3

Results

3.1 Methodology

After the pitch and timbre data was processed, I ran the Dirichlet Process on it. For each song, I concatenated the 192-element chord change frequency list and the 46-element timbre category frequency list, giving each song a total of 238 features. However, there is a problem with this setup: the pitch data will inherently dominate the clustering process, since it contains almost 3 times as many features as the timbre data. While there is no built-in function in scikit-learn's DPGMM to give different weights to each feature, I considered another way to remedy this discrepancy: duplicating the timbre vector a certain number of times and concatenating the copies to the feature set of each song. While this strategy runs the risk of corrupting the feature set and turning it into something that does not accurately represent each song, it is important to keep in mind that, even without duplicating the timbre vector, the feature set already consists of two separate feature sets concatenated to each other. Therefore, timbre duplication appears to be a reasonable strategy to weight pitch and timbre more evenly.

After this modification, I tweaked a few more parameters before obtaining my final results. Dividing the pitch and timbre frequencies by the duration of the song normalized every song to frequencies per second, but it also had the undesired effect of making the data too small. Timbre and pitch frequencies per second were almost always less than 10, and often hovered as low as 0.002 for nonzero values. Because all of the values were very close to each other, using common values of α in the range of 0.1 to 1000-2000 was insufficient to push the songs into different clusters; as a result, every song fell into the same cluster. Increasing the value of α by several orders of magnitude, to well over 10 million, fixed the problem, but this solution presented two issues. First, tuning α to experiment with different ways to cluster the music would be problematic, since I would have to work with an enormous range of possible values. Second, pushing α to such high values is not appropriate for the Dirichlet Process: extremely high values of α indicate a Dirichlet Process that will try to disperse the data into different clusters, and a value of α that high is, in principle, always assigning each new song to a new cluster. On the other hand, varying α between 0.1 and 1000, for example, presents a much wider range of flexibility when assigning clusters. While clustering may be possible by varying α an extreme amount with the data as it currently is, we would be using the Dirichlet Process in a way it should mathematically not be used. Therefore, multiplying all of the data by a constant value, so that we can work in the appropriate range of α, is the ideal approach. After some experimentation, I found that k = 10 was an appropriate scaling factor.
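Putting the two adjustments together, the per-song feature construction can be sketched as follows; the duplication count of 4 is an illustrative assumption, chosen so that the timbre block roughly matches the pitch block in width, not a value taken from my code.

import numpy as np

def build_feature_vector(chord_changes, timbre_freqs, copies=4, k=10):
    # chord_changes: 192 per-second chord-change frequencies
    # timbre_freqs: 46 per-second timbre-cluster frequencies
    # Tiling timbre `copies` times re-weights it against pitch (copies=4 is
    # an assumption); k=10 rescales the data into a range where values of
    # alpha between roughly 0.1 and 1000 produce meaningful clusters
    return k * np.concatenate([np.asarray(chord_changes),
                               np.tile(timbre_freqs, copies)])

features = build_feature_vector(np.zeros(192), np.zeros(46))
print(features.shape)  # (376,) = 192 + 4 * 46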

After initial runs of the Dirichlet Process, I found that there was a slight issue with some of the earlier songs. Since I had only artist genre tags, not song-specific tags, I chose songs based on whether any of the tags associated with the artist fell under any electronic music genre, including the generic term 'electronic'. There were some bands, mostly older ones from the 1960s and 1970s like Electric Light Orchestra, which had some electronic music but mostly featured rock, funk, disco, or another genre. Given that these artists featured mostly non-electronic songs, I decided to exclude them from my study and generated a blacklist of these artists. While it was infeasible to look through every single song and determine whether it was electronic or not, I was able to look over the earliest songs in each cluster. These songs were the most important to verify as electronic, because early non-electronic songs could end up forming new clusters and inadvertently create clusters with non-electronic sounds that I was not looking for.

The goal of this thesis is to identify the different groups into which EM songs cluster and to identify the most unique artists and genres. While the second task is straightforward, because it requires looking at the earliest songs in each cluster, the effectiveness of the first is difficult to gauge. While I can look at the average chord change and timbre category frequencies in each cluster, as well as other metadata, attaching semantic interpretations to what the music actually sounds like and determining whether the music is clustered properly is a very subjective process. For this reason, I ran the Dirichlet Process on the feature set with values of α = (0.05, 0.1, 0.2) and compared the clustering for each value, examining similarities and differences in the clusters formed in each scenario in the Analysis section. For each value of α, I set the upper limit of components, or clusters, to 50. The values of α I used resulted in 9, 14, and 19 significant clusters, respectively.

3.2 Findings

3.2.1 α = 0.05

When I set α to 0.05, the Dirichlet Process split the songs into 9 clusters. Below are the distributions of the years of the songs in each cluster (note that the Dirichlet Process does not number the clusters sequentially, so cluster numbers 5, 7, and 10 are skipped).

Figure 3.1: Song year distributions for α = 0.05

For each value of α, I also calculated the average frequency of each chord change category and timbre category for each cluster and plotted the results. The green lines correspond to timbre and the blue lines to pitch.

Figure 3.2: Timbre and pitch distributions for α = 0.05

A table of each cluster formed, the number of songs in that cluster, and descriptions of the pitch, timbre, and rhythmic qualities characteristic of songs in that cluster is shown below.

Cluster | Song Count | Characteristic Sounds
0 | 6481 | Minimalist, industrial, space sounds, dissonant chords
1 | 5482 | Soft, New Age, ethereal
2 | 2405 | Defined sounds; electronic and non-electronic instruments played in standard rock rhythms
3 | 360 | Very dense and complex synths, slightly darker tone
4 | 4550 | Heavily distorted rock and synthesizer
6 | 2854 | Faster-paced 80s synth rock, acid house
8 | 798 | Aggressive beats, dense house music
9 | 1464 | Ambient house, trancelike, strong beats, mysterious tone
11 | 1597 | Melancholy tones; new wave rock in the 80s, then starting in the 90s downtempo, trip-hop, nu-metal

Table 3.1: Song cluster descriptions for α = 0.05

3.2.2 α = 0.1

A total of 14 clusters were formed. (16 were formed, but 2 clusters contained only one song each; I listened to both of these songs, and since neither sounded unique, I discarded them from the clusters.) Again, the song year distributions, timbre and pitch distributions, and cluster descriptions are shown below.

Figure 3.3: Song year distributions for α = 0.1

Figure 3.4: Timbre and pitch distributions for α = 0.1

Cluster | Song Count | Characteristic Sounds
0 | 1339 | Instrumental and disco with 80s synth
1 | 2109 | Simultaneous quarter-note and sixteenth-note rhythms
2 | 4048 | Upbeat, chill; simultaneous quarter-note and eighth-note rhythms
3 | 1353 | Strong repetitive beats, ambient
4 | 2446 | Strong simultaneous beat and synths; synths defined but echoing
5 | 2672 | Calm, New Age
6 | 542 | Hi-hat cymbals, dissonant chord progressions
7 | 2725 | Aggressive punk and alternative rock
9 | 1647 | Latin; rhythmic emphasis on the first and third beats
11 | 835 | Standard medium-fast rock instruments/chords
16 | 1152 | Orchestral, especially violins
18 | 40 | "Martian/alien" sounds, no vocals
20 | 1590 | Alternating strong kick and strong high-pitched clap
28 | 528 | Roland TR-like beats; kick and clap stand out but fuzzy

Table 3.2: Song cluster descriptions for α = 0.1

3.2.3 α = 0.2

With α set to 0.2, there were a total of 22 clusters formed. 3 of the clusters consisted of 1 song each, none of which were particularly unique-sounding, so I discarded them, for a total of 19 significant clusters. Again, the song year distributions, timbre and pitch distributions, and cluster descriptions are shown below.

Figure 3.5: Song year distributions for α = 0.2

Figure 3.6: Timbre and pitch distributions for α = 0.2

Cluster | Song Count | Characteristic Sounds
0 | 4075 | Nostalgic and sad-sounding synths and string instruments
1 | 2068 | Intense, sad, cavernous (mix of industrial metal and ambient)
2 | 1546 | Jazz/funk tones
3 | 1691 | Orchestral with heavy 80s synths, atmospheric
4 | 343 | Arpeggios
5 | 304 | Electro, ambient
6 | 2405 | Alien synths, eerie
7 | 1264 | Punchy kicks and claps, 80s/90s tilt
8 | 1561 | Medium tempo, 4/4 time signature, synths with intense guitar
9 | 1796 | Disco rhythms and instruments
10 | 2158 | Standard rock with few (if any) synths added on
12 | 791 | Cavernous, minimalist, ambient (non-electronic instruments)
14 | 765 | Downtempo, classic guitar riffs, fewer synths
16 | 865 | Classic acid house sounds and beats
17 | 682 | Heavy Roland TR sounds
22 | 14 | Fast, ambient, classic orchestral
23 | 578 | Acid house with funk tones
30 | 31 | Very repetitive rhythms, one or two tones
34 | 88 | Very dense sound (strong vocals and synths)

Table 3.3: Song cluster descriptions for α = 0.2

3.3 Analysis

Each of the different values of α revealed different insights about the Dirichlet Process applied to the Million Song Dataset, mainly which artists and songs were unique and how traditional groupings of EM genres could be thought of differently. Not surprisingly, the distributions of the years of songs in most of the clusters were skewed to the left, because the distribution of all of the EM songs in the Million Song Dataset is left-skewed (see Figure 2.2). However, some of the distributions vary significantly for individual clusters, and these differences provide important insights into which types of music styles were popular at certain points in time and how unique the earliest artists and songs in the clusters were. For example, for α = 0.1, cluster 28's musical style (with sounds characteristic of the Roland TR-808 and TR-909, two programmable drum machines and synthesizers that became extremely popular in 1980s and 90s dance tracks [13]) coincides with when the instruments were first manufactured, in 1980. Not surprisingly, this cluster contained mostly songs from the 80s and 90s and declined slightly in the 2000s. However, there were a few songs in that cluster that came out before 1980. While these songs did not clearly use the Roland TR machines, they may have contained similar sounds that predated the machines and were truly novel.

First, looking at α = 0.05, we see that all of the clusters contain a significant number of songs, although clusters 3 and 8 are notably smaller. Cluster 3 contains a heavier left tail, indicating a larger number of songs from the 70s, 80s, and 90s. Inside the cluster, the genres of music varied significantly when viewed through a traditional music lens. That is, the cluster contained some songs with nearly all traditional rock instruments, others with purely synths, and others somewhere in between, all of which would normally be classified as different EM genres. Under the Dirichlet Process, however, these songs were lumped together by the common theme of dense, melodic sounds (as opposed to minimalistic, repetitive, or dissonant ones). The most prominent artists among the earlier songs are Ashra and Jon Hassell, who composed several melodic songs combining traditional instruments with synthesizers for a modern feel. The other small cluster, number 8, has a year distribution closer to that of the entire MSD and consists of denser beats. Another artist, Cabaret Voltaire, leads this cluster. Cluster 9 sticks out significantly because it contains virtually no songs before 1990 but then increases rapidly in popularity. This cluster contains songs with hypnotically repetitive rhythms, strong and ethereal synths, and an equally strong drum-like beat. Given the emergence of trance in the 1990s, and the fact that house music in the 1980s contained more minimalistic synths than house music in the 1990s, this distribution of years makes sense. Looking at the earliest artists in this cluster, one that accurately predates the later music in the cluster is Jean-Michel Jarre, a French composer and pioneer of ambient and electronic music [14]. One of his songs, Les Chants Magnétiques IV, contains very sharp and modulated synths, along with a repetitive hi-hat cymbal rhythm and more drawn-out, ethereal synths. While the song sounds ambient at its normal speed, playing it at 1.5 times the normal speed resulted in a thumping, fast-paced 16th-note rhythm that, combined with the ethereal synths and their chord progressions, sounded very similar to trance music. In fact, I found that, stylistically, trance music was comparable to house and ambient music increased in speed. Trance was a term not used extensively until the early 1990s, but ambient and house music were already mainstream by the 1980s, so it would make sense that trance evolved in this manner. However, this insight could serve as an argument that trance is not an innovative genre in and of itself, but is rather a clever combination of two older genres. Lastly, we look at the timbre category and chord change distributions for each cluster. In theory, these clusters should have significantly different peaks of chord changes and timbre categories, reflecting different pitch arrangements and instruments in each cluster.

The type 0 chord change corresponds to major → major with no note change, type 60 to minor → minor with no note change, type 120 to dominant 7th → dominant 7th with no note change, and type 180 to minor 7th → minor 7th with no note change. It makes sense that type 0, 60, 120, and 180 chord changes are frequently observed, because their prevalence implies that adjacent chords remain in the same key for the majority of the song. The timbre categories, on the other hand, are more difficult to interpret intuitively. Mauch's study addresses this issue by sampling the songs and sounds that are closest to each timbre category, then playing the sounds and attaching user-based interpretations gathered from several listeners [8]. While this strategy worked in Mauch's study, given the time and resources at my disposal, it was not practical in mine. I ended up comparing my subjective summaries of each cluster against the charts to see whether certain peaks in the timbre categories corresponded to specific tones. Strangely, for α = 0.05, the timbre and chord change data is very similar for each cluster. This problem does not occur for α = 0.1 or 0.2, where the graphs vary significantly and correspond to some of the observed differences in the music. In summary, below are the most influential artists I found in the clusters formed, and the types of music they created that were novel for their time:

• Jean-Michel Jarre: ambient and house music, complicated synthesizer arrangements
• Cabaret Voltaire: orchestral electronic music
• Paul Horn: new age
• Brian Eno: ambient music
• Manuel Göttsching (Ashra): synth-heavy ambient music
• Killing Joke: industrial metal
• John Foxx: minimalist and dark electronic music
• Fad Gadget: house and industrial music

While these conclusions were formed mainly from the data used in the MSD and the resulting clusters, I checked outside sources and biographies of these artists to see whether they were groundbreaking contributors to electronic music. Some research revealed that these artists were indeed groundbreaking for their time, so my findings are consistent with the existing literature. The difference, however, between existing accounts and mine is that, from a quantitatively computed perspective, I found some new connections (like observing that one of Jarre's works, when sped up, sounded very similar to trance music).

For larger values of α, it is worth not only looking at interesting phenomena in the clusters formed for that specific value, but also comparing the clusters formed for other values of α. Since we are increasing the value of α, more clusters will be formed, and the distinctions between each cluster will be more nuanced. With α = 0.1, the Dirichlet Process formed 16 clusters. 2 of these clusters consisted of only one song each and, upon listening, neither of these songs sounded particularly unique, so I threw those two clusters out and analyzed the remaining 14. Comparing these clusters to the ones formed with α = 0.05, I found that some of the clusters mapped over nicely, while others were more difficult to interpret. For example, cluster 3(0.1) (cluster 3 when α = 0.1) contained a similar number of songs, and a similar distribution of release years, to cluster 9(0.05). Both contain virtually no songs before the 1990s and then rise steadily in popularity through the 2000s. Both clusters also contain similar types of music: house beats and ethereal synths reminiscent of ambient or trance music. However, when I looked at the earliest

artists in cluster 3(0.1), they were different from the earliest artists in cluster 9(0.05). One particular artist, Bill Nelson, stood out for having a particularly novel song, "Birds of Tin," for the year it was released (1980). This song features a sharp and twangy synth beat that, when sped up, sounded like minimalist acid house music. While the α = 0.05 run differentiated mostly on general moods and classes of instruments (like rock vs. non-electronic vs. electronic), the α = 0.1 run picked up more nuanced instrumentation and mood differences. For example, cluster 16(0.1) contained songs that featured orchestral string instruments, especially violin. The songs themselves varied significantly according to traditional genres, from Brian Eno arrangements with classical orchestra to a remix of a song by Linkin Park, a nu-metal band, which contained violin interludes. This clustering raises an interesting point: music that sounds very different based on traditional genres can be grouped together by certain instruments or sounds. Another cluster, 28(0.1), features 90s sounds characteristic of the Roland TR-909 drum machine (which would explain why the cluster's songs increase drastically starting in the 1990s and steadily decline through the 2000s). Yet another cluster, 6(0.1), contains a particularly heavy left tail, indicating a style more popular in the 1980s, and its characteristic sound, hi-hat cymbals, is also a specialized instrument. This specialization does not match up particularly strongly with the clusters formed when α = 0.05. That is, a single cluster with α = 0.05 does not map easily to one or more clusters in the α = 0.1 run, although many of the clusters appear to share characteristics based on the qualitative descriptions in the tables. The timbre/chord change charts for each cluster appeared to at least somewhat corroborate the general characteristics I attached to each cluster. For example, the last timbre category is significantly pronounced for clusters 5 and 18, and especially so for 18. Cluster 18 was vocal-free, ethereal, space-synth sounds, so it would make sense that cluster 5, which was mainly calm New Age, also contained vocal-free, ethereal, spacey sounds. It was also interesting to note

that certain clusters like 28 contained one timbre category that completely dominated

all the others Many songs in this cluster were marked by strong and repetitive

beats reminiscent of the Roland TR synth drum machine which matches the graph

Likewise clusters 3 7 9 and 20 which appear to contain the same peak timbre

category were noted for containing strong and repetitive beats For this cluster I

added the following artists and their contributions to the general list of novel artists

• Bill Nelson: minimalist house music
• Vangelis: orchestral compositions with electronic notes
• Rick Wakeman: rock compositions with spacy-sounding synths
• Kraftwerk: synth-based pop music
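The per-cluster year histograms and timbre charts referenced above are straightforward to regenerate from the cluster assignments. A minimal sketch, assuming songs is the list of preprocessed song dicts produced by the code in Section A.2 (with 'year' and 'timbre_cat_counts' fields) and labels holds each song's cluster index from the Dirichlet Process run:

import numpy as np
import matplotlib.pyplot as plt

def plot_cluster_summary(songs, labels, cluster_id):
    # pick out the songs assigned to this cluster
    members = [s for s, l in zip(songs, labels) if l == cluster_id]

    # histogram of release years, e.g. to spot a heavy left tail like cluster 6_0.1's
    plt.figure()
    plt.hist([s['year'] for s in members], bins=range(1955, 2012))
    plt.title('Cluster {0}: release years'.format(cluster_id))

    # average duration-normalized timbre category weights across the cluster
    mean_timbre = np.mean([s['timbre_cat_counts'] for s in members], axis=0)
    plt.figure()
    plt.bar(range(len(mean_timbre)), mean_timbre)
    plt.title('Cluster {0}: mean timbre category weights'.format(cluster_id))
    plt.show()

A single dominant bar in the second plot corresponds to observations like the one overpowering timbre category in cluster 28.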

Finally, we look at α = 0.2. With this parameter value, the Dirichlet Process resulted in 22 clusters; 3 of these clusters contained only one song each, and upon listening to each of these songs, I determined they were not particularly unique and discarded them, leaving a total of 19 clusters. Unlike the previous two values of α, where the clusters were relatively easy to differentiate subjectively, this run was quite difficult to interpret. Slightly more than half of the clusters, 10 out of 19, contained under 1,000 songs, and 3 contained under 100. Some of the clusters mapped easily to clusters from the other two α values, like cluster 17_0.2, which contains Roland TR drum machine sounds and is comparable to cluster 28_0.1. However, many of the other classifications seemed more dubious. Not only did the songs within each cluster often vary significantly, but the differences between many clusters appeared nearly indistinguishable. The chord change and timbre charts also reflect this difficulty: the y-axis ranges are quite small across all years, implying that many of the timbre values averaged out because the songs in each cluster were quite different. Essentially, the observations are quite noisy and do not have features that stand out as saliently as those of cluster 28_0.1, for example. The only exceptions were clusters 30 and 34, but there were so few songs in each of these clusters that they represent only a small fraction of the dataset. Therefore, I concluded that the Dirichlet Process with α = 0.2 did an inadequate job of clustering the songs. Overall, the clusters formed when α = 0.1 were the most meaningful, picking up nuanced moods and instruments without splitting hairs and producing clusters that a minimally trained ear could not differentiate. From this analysis, the most appropriate genre classifications of the electronic music in the MSD are the clusters described in the table for α = 0.1, and the most novel artists, along with their contributions, are those summarized in the findings for α = 0.05 and α = 0.1.

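The α sweep evaluated above can also be reproduced directly in scikit-learn. A minimal sketch, assuming X is the per-song feature matrix of chord-change frequencies and timbre category counts; note that the DPGMM class used in this thesis has since been replaced in scikit-learn by BayesianGaussianMixture, which exposes the same concentration parameter:

import numpy as np
from sklearn.mixture import BayesianGaussianMixture

def dp_cluster(X, alpha, max_clusters=50):
    # truncated Dirichlet Process mixture; max_clusters is only an upper bound
    dpgmm = BayesianGaussianMixture(
        n_components=max_clusters,
        weight_concentration_prior_type='dirichlet_process',
        weight_concentration_prior=alpha,   # the concentration parameter alpha
        max_iter=500)
    dpgmm.fit(X)
    return dpgmm.predict(X)

for alpha in (0.05, 0.1, 0.2):
    labels = dp_cluster(X, alpha)
    print('alpha = {0}: {1} non-empty clusters'.format(alpha, len(np.unique(labels))))

Because the rows of X are sorted chronologically, the earliest songs carrying each label can then be read off as the candidate trendsetters for that cluster.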

Chapter 4

Conclusion

In this chapter I first address weaknesses in my experiment and strategies for addressing them; I then offer potential paths for researchers to build upon my experiment, and close with final remarks on this thesis.

4.1 Design Flaws in Experiment

While I made every effort possible to ensure the integrity of this experiment, there were various complicating factors, some beyond my control and others within my control but unrealistic to address given the time and resources I had. The largest issue was the dataset I was working with. While the MSD contained roughly 23,000 electronic music songs according to my classifications, these songs did not come close to covering all of the electronic music available. From looking through the tracks, I did see many important artists, meaning that there was some credibility to the dataset. However, there were several other artists I was surprised to see missing, and the artists that were included contained only a limited number of popular songs. Some traditionally defined genres, like dubstep, were missing entirely from the dataset, and the most recent songs came from the year 2010, which meant that the past 5 years of rapid expansion in EM were not accounted for. Building a sufficient corpus of EM data is very difficult, arguably more so than for other genres, because songs may be remixed by multiple artists, further blurring the line between original content and modifications. For this reason, I considered my thesis to be a proof of concept. Although the data I used may not be ideal, I was able to show that the Dirichlet Process could be used with some amount of success to cluster songs based on their metadata.

With respect to how I implemented the Dirichlet Process and constructed the features, my methodology could have been more extensive with additional time and resources. Interpreting the sounds in each song and establishing common threads is a difficult task, and unlike Pandora, which used trained music theory experts to analyze each song, I relied on my own ears and anecdotal knowledge of EM. Given the lack of formal literature quantitatively analyzing EM, and the resources I had, this was my best realistic option, but it was also not ideal. The second notable weakness, which was more controllable, was determining what exactly constitutes an EM song. My criterion involved iterating through every song and selecting those whose artist carried a tag that fell inside a list of predetermined EM genres. However, this strategy is not always effective, since some artists have only a small selection of EM songs and have produced much more music involving rock or other non-EM genres. To prevent these songs from appearing in the dataset, I would need to load another dataset, from a group called Last.fm, which contains user-generated tags at the song level.
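A sketch of that song-level filter, assuming the Last.fm tags have been joined onto each track dict beforehand (the lastfm_tags field name is hypothetical, target_genres is the list from Section A.1, and all_tracks is the parsed song list):

def is_em_song(track, target_genres):
    # keep a track only if its own tags, not just its artist's, look electronic
    song_tags = [t.lower() for t in track.get('lastfm_tags', [])]  # hypothetical field
    return any(genre in tag for tag in song_tags for genre in target_genres)

em_songs = [t for t in all_tracks if is_em_song(t, target_genres)]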

Another, more addressable weakness in my experiment was in graphically analyzing the timbre categories. While the average chord changes were easy to interpret on the graphs for each cluster and had straightforward semantic interpretations, the timbre categories were never formally defined. That is, while I knew the Bayes Information Criterion was lowest when there were 46 categories, I did not associate each timbre category with a sound. Mauch's study addressed this issue by randomly selecting songs with sounds that fell in each timbre category and asking users to listen to the sounds and classify what they heard. Implementing this system would be an additional way of ensuring that the clusters formed were nontrivial: I could not only eyeball the timbre measurements on each graph, as I did in this thesis, but also use them to confirm the sounds I observed for each cluster.
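For reference, the 46-category timbre lexicon came out of a BIC sweep over candidate GMM sizes. A minimal sketch of that selection step, assuming frames is the array of duration-normalized timbre frames sampled in Section A.3 (written against the current GaussianMixture API rather than the older sklearn.mixture classes used in this thesis):

from sklearn.mixture import GaussianMixture

def pick_lexicon_size(frames, max_k=60):
    best_k, best_bic = None, float('inf')
    for k in range(1, max_k + 1):
        gmm = GaussianMixture(n_components=k).fit(frames)
        bic = gmm.bic(frames)  # lower BIC balances fit against model complexity
        if bic < best_bic:
            best_k, best_bic = k, bic
    return best_k  # 46 for the frames used in this thesis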

Finally, while my feature selection involved careful preprocessing, based on other studies, that normalized measurements across all songs, there are additional ways I could have improved the feature set. For example, one study looks at more advanced ways to isolate specific timbre segments in a song, identify repeating patterns, and compare songs to each other in terms of the similarity of their timbres [15]. More advanced methods like these would allow me to analyze more quantitatively how successfully the Dirichlet Process clusters songs into distinct categories.
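One readily available quantitative check along these lines, not used in this thesis, is an internal validity measure such as the silhouette coefficient, sketched here under the assumption that X is the song feature matrix and labels the Dirichlet Process assignments:

from sklearn.metrics import silhouette_score

# values near +1 indicate well-separated clusters; values near 0 indicate
# overlapping ones, which would flag runs like alpha = 0.2 where the
# clusters blurred together
score = silhouette_score(X, labels)
print('mean silhouette coefficient: {0:.3f}'.format(score))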

4.2 Future Work

Future work in this area, quantitatively analyzing EM metadata to determine what constitutes different genres and novel artists, would involve tighter definitions and procedures, evaluations of whether the clustering was effective, and closer musical scrutiny. All of the weaknesses mentioned in the previous section, barring perhaps the songs available in the Million Song Dataset, can be addressed with extensions and modifications to the code base I created. The larger issue of building an effective corpus of music data for the MSD, and constantly updating it, might be addressed by soliciting such data from an organization like Spotify, but such an endeavor is very ambitious and beyond the scope of any individual or small-group research project without extensive funding and influence. Once these problems are resolved, the songs are accessed from the dataset, and methods for comparing songs to each other are in place, the next steps would be to further analyze the results. How do the most unique artists for their time compare to the most popular artists? Is there considerable overlap? How long does it take for a style to grow in popularity, if it ever does? And lastly, how can these findings be used to compose new genres of music and anticipate who and what will become popular in the future? All of these questions may require supplementary information sources, with respect to the popularity of songs and artists, for example, and many of these additional pieces of information can be found on the website of the MSD.
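As a first pass at the overlap question, the MSD already stores a popularity proxy, song_hotttnesss, alongside each track. A hedged sketch, where earliest_by_cluster is assumed to hold the earliest songs found in each cluster (as in Chapter 3) and all_songs their full metadata, both hypothetical structures:

import hdf5_getters

def hotness(h5_path):
    # song_hotttnesss is The Echo Nest's popularity estimate stored in the MSD
    h5 = hdf5_getters.open_h5_file_read(h5_path)
    h = hdf5_getters.get_song_hotttnesss(h5)
    h5.close()
    return h

# overlap between the per-cluster trendsetters and the 100 most popular songs
novel = set(s['track_id'] for s in earliest_by_cluster)
popular = set(s['track_id'] for s in
              sorted(all_songs, key=lambda s: s['hotness'], reverse=True)[:100])
print('overlap: {0} of 100'.format(len(novel & popular)))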

4.3 Closing Remarks

While this thesis is an ambitious endeavor and can be improved in many respects, it does show that the methods implemented yield nontrivial results and could serve as a foundation for future quantitative analysis of electronic music. As data analytics grows and groups such as Spotify amass greater amounts of information, and deeper insights into that information, this relatively new field of study will hopefully grow as well. EM is a dynamic, energizing, and incredibly expressive type of music, and understanding it from a quantitative perspective pays respect to what has, up until now, mostly been analyzed from a curious outsider's perspective: qualitatively described, but not examined as thoroughly from a mathematical angle.


Appendix A

Code

A.1 Pulling Data from the Million Song Dataset

from __future__ import division
import os
import re
import sys
import time
import glob
import collections
import numpy as np
import hdf5_getters  # not on adroit

# prevents output from showing ellipses when printed
np.set_printoptions(threshold=np.nan)

'''This code pulls the title, artist, year, duration, pitch, and timbre metadata
for every electronic song found in the MSD and writes it out sorted by year.'''

basedir = '/scratch/network/mssilver/mssilver/msd_data_full/data/' + str(sys.argv[1])
ext = '.h5'

target_genres = ['house', 'techno', 'drum and bass', 'drum n bass', "drum'n'bass",
                 'drumnbass', "drum 'n' bass", 'jungle', 'breakbeat', 'trance',
                 'dubstep', 'trap', 'downtempo', 'industrial', 'synthpop', 'idm',
                 'idm - intelligent dance music', '8-bit', 'ambient',
                 'dance and electronica', 'electronic']

# relevant metadata for all EM songs found in the MSD
all_song_data = {}
pitch_segs_data = []
count = 0
start_time = time.time()

for root, dirs, files in os.walk(basedir):
    files = glob.glob(os.path.join(root, '*' + ext))
    for f in files:
        h5 = hdf5_getters.open_h5_file_read(f)
        # if year unknown, throw out sample
        if hdf5_getters.get_year(h5) == 0:
            h5.close()
            continue
        if any(tag in str(hdf5_getters.get_artist_mbtags(h5)) for tag in target_genres):
            print 'found electronic music song at {0} seconds'.format(time.time() - start_time)
            count += 1
            print('song count: {0}'.format(count + 1))
            h5_subdict = dict()
            h5_subdict['title'] = hdf5_getters.get_title(h5).item()
            h5_subdict['artist_name'] = hdf5_getters.get_artist_name(h5).item()
            h5_subdict['year'] = hdf5_getters.get_year(h5).item()
            h5_subdict['duration'] = hdf5_getters.get_duration(h5).item()
            h5_subdict['timbre'] = hdf5_getters.get_segments_timbre(h5).tolist()
            h5_subdict['pitches'] = hdf5_getters.get_segments_pitches(h5).tolist()
            track_id = hdf5_getters.get_track_id(h5).item()
            all_song_data[track_id] = h5_subdict
            print('Song {0} finished processing. Total time elapsed: {1} seconds'.format(count, str(time.time() - start_time)))
        h5.close()

# an OrderedDict (rather than a plain dict) preserves the chronological ordering
# when the data is written out
all_song_data_sorted = collections.OrderedDict(sorted(all_song_data.items(), key=lambda k: k[1]['year']))
sortedpitchdata = '/scratch/network/mssilver/mssilver/msd_data/raw_' + re.sub('/', '', sys.argv[1]) + '.txt'
with open(sortedpitchdata, 'w') as text_file:
    text_file.write(str(all_song_data_sorted))
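Each script in this appendix is run once per sub-directory of the MSD file tree, with that sub-directory passed as the command-line argument; for example (the script file name is hypothetical):

    python pull_msd_data.py A/B

This scans .../msd_data_full/data/A/B and writes the matching songs, sorted by year, to msd_data/raw_AB.txt, the naming scheme that the sampling loop in Section A.3 expects.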

A.2 Calculating Most Likely Chords and Timbre Categories

from __future__ import division
import os
import sys
import re
import time
import json
import glob
import hdf5_getters  # not on adroit
import sklearn.mixture
import msd_utils  # not on adroit
import math
import numpy as np
import collections
import ast

# prevents output from showing ellipses when printed
np.set_printoptions(threshold=np.nan)

# column-wise mean of a list of lists (applied via map/zip below)
def mean(a):
    return sum(a) / len(a)

'''This code computes the duration-normalized frequency of chord changes and the
timbre category counts for each electronic song.'''

basedir = '/scratch/network/mssilver/mssilver/'
input_file = basedir + 'msd_data/raw_' + str(sys.argv[1]) + '.txt'
output_file = basedir + 'msd_data/preprocessed_' + str(sys.argv[1]) + '.txt'

json_contents = open(input_file, 'r').read()

all_song_data = []
time_start = time.time()
count = 0
# match each song's metadata dict in the raw text dump (pattern reconstructed)
for json_object_str in re.finditer(r"\{'title'.*?\}", json_contents):
    json_object_str = str(json_object_str.group(0))
    json_object = ast.literal_eval(json_object_str)
    json_object_new = {}

    json_object_new['title'] = json_object['title']
    json_object_new['artist_name'] = json_object['artist_name']
    json_object_new['year'] = json_object['year']
    json_object_new['duration'] = json_object['duration']

    segments_pitches_old = json_object['pitches']
    segments_timbre_old = json_object['timbre']
    segments_pitches_old_smoothed = []
    segments_timbre_old_smoothed = []
    chord_changes = [0 for i in range(0, 192)]
    smoothing_factor = 5
    for i in range(0, int(math.floor(len(segments_pitches_old) / smoothing_factor))):
        segments = segments_pitches_old[(smoothing_factor * i):(smoothing_factor * i + smoothing_factor)]
        # calculate mean frequency of each note over a block of 5 time segments
        segments_mean = map(mean, zip(*segments))
        segments_pitches_old_smoothed.append(segments_mean)
    most_likely_chords = [msd_utils.find_most_likely_chord(seg) for seg in segments_pitches_old_smoothed]
    print 'found most likely chords at {0} seconds'.format(time.time() - time_start)
    # calculate chord changes
    for i in range(0, len(most_likely_chords) - 1):
        c1 = most_likely_chords[i]
        c2 = most_likely_chords[i + 1]
        if c1[1] == c2[1]:
            note_shift = 0
        elif c1[1] < c2[1]:
            note_shift = c2[1] - c1[1]
        else:
            note_shift = 12 - c1[1] + c2[1]
        key_shift = 4 * (c1[0] - 1) + c2[0]
        # convert note_shift (0 through 11) and key_shift (1 to 16)
        # to one of 192 categories for a chord shift
        chord_shift = 12 * (key_shift - 1) + note_shift
        chord_changes[chord_shift] += 1
    json_object_new['chord_changes'] = [c / json_object['duration'] for c in chord_changes]
    print 'calculated chord changes at {0} seconds'.format(time.time() - time_start)

    for i in range(0, int(math.floor(len(segments_timbre_old) / smoothing_factor))):
        segments = segments_timbre_old[(smoothing_factor * i):(smoothing_factor * i + smoothing_factor)]
        # calculate mean of each timbre component over a block of 5 time segments
        segments_mean = map(mean, zip(*segments))
        segments_timbre_old_smoothed.append(segments_mean)
    print 'found most likely timbre categories at {0} seconds'.format(time.time() - time_start)
    timbre_cats = [msd_utils.find_most_likely_timbre_category(seg) for seg in segments_timbre_old_smoothed]
    timbre_cat_counts = [timbre_cats.count(i) for i in xrange(0, 30)]
    json_object_new['timbre_cat_counts'] = [t / json_object['duration'] for t in timbre_cat_counts]
    all_song_data.append(json_object_new)
    count += 1

print 'preprocessing finished, writing results to file at time {0}'.format(time.time() - time_start)
with open(output_file, 'w') as f:
    f.write(str(all_song_data))

print 'file merging complete at time {0}'.format(time.time() - time_start)

A.3 Code to Compute Timbre Categories

from __future__ import division
import os
import sys
import re
import time
import json
import glob
import hdf5_getters  # not on adroit
import sklearn.mixture
import msd_utils  # not on adroit
import math
import numpy as np
import collections
from string import ascii_uppercase
import ast
import matplotlib.pyplot as plt
import operator
from collections import defaultdict
import random

timbre_all = []
N = 20  # number of samples to get from each year

# number of EM songs in the MSD from each year, used as sampling weights
year_counts = {1956: 2, 1965: 4, 1968: 3, 1969: 5, 1970: 23, 1971: 25,
               1972: 26, 1973: 37, 1974: 35, 1975: 29, 1976: 28, 1977: 64, 1978: 77,
               1979: 111, 1980: 131, 1981: 171, 1982: 199, 1983: 272, 1984: 190, 1985: 189,
               1986: 200, 1987: 224, 1988: 205, 1989: 272, 1990: 358, 1991: 348,
               1992: 538, 1993: 610, 1994: 658, 1995: 764, 1996: 809, 1997: 930, 1998: 872,
               1999: 983, 2000: 1031, 2001: 1230, 2002: 1323, 2003: 1563, 2004: 1508,
               2005: 1995, 2006: 1892, 2007: 2175, 2008: 1950, 2009: 1782, 2010: 742}

time_start = time.time()
year_count = defaultdict(int)
orig_dir = '/scratch/network/mssilver/mssilver/'
orig_dir = ''  # overridden for local runs
# match each song's metadata dict in the raw text dump (pattern reconstructed)
json_pattern = re.compile(r"\{'title'.*?\}", re.DOTALL)
N = 20  # number of songs to sample from each year
k = 20  # number of frames to select from each song
for l1 in ascii_uppercase:
    for l2 in ascii_uppercase:
        edm_textfile = orig_dir + 'msd_data/raw_' + l1 + l2 + '.txt'
        json_contents = open(edm_textfile, 'r').read()
        for json_object_str in re.findall(json_pattern, json_contents):
            json_object = ast.literal_eval(json_object_str)
            year = int(json_object['year'])
            prob = 1.0 if 1.0 * N / year_counts[year] > 1.0 else 1.0 * N / year_counts[year]
            if random.random() < prob:
                print 'getting timbre frames for song in directory {0}, {1} seconds after start of program'.format(edm_textfile, time.time() - time_start)
                duration = float(json_object['duration'])
                timbre = [[t / duration for t in l] for l in json_object['timbre']]
                try:
                    indices = random.sample(xrange(0, len(timbre)), k)
                except:
                    indices = xrange(0, len(timbre))
                timbre_frames = [timbre[i] for i in indices]
                appended_timbre = [timbre_all.append(l) for l in timbre_frames]
        print 'finished file {0}, {1} seconds after start of program'.format(edm_textfile, time.time() - time_start)

with open('timbre_frames_all.txt', 'w') as f:
    f.write(str(timbre_all))

A.4 Helper Methods for Calculations

# this module is imported as msd_utils by the scripts in Sections A.2 and A.3
import os
import re
import json
import glob
import hdf5_getters
import time
import numpy as np

'''some static data used in conjunction with the helper methods'''

# each 12-element vector corresponds to the 12 pitches, starting with C natural
# and going up to B natural

CHORD_TEMPLATE_MAJOR = [[1,0,0,0,1,0,0,1,0,0,0,0], [0,1,0,0,0,1,0,0,1,0,0,0],
                        [0,0,1,0,0,0,1,0,0,1,0,0], [0,0,0,1,0,0,0,1,0,0,1,0],
                        [0,0,0,0,1,0,0,0,1,0,0,1], [1,0,0,0,0,1,0,0,0,1,0,0],
                        [0,1,0,0,0,0,1,0,0,0,1,0], [0,0,1,0,0,0,0,1,0,0,0,1],
                        [1,0,0,1,0,0,0,0,1,0,0,0], [0,1,0,0,1,0,0,0,0,1,0,0],
                        [0,0,1,0,0,1,0,0,0,0,1,0], [0,0,0,1,0,0,1,0,0,0,0,1]]
CHORD_TEMPLATE_MINOR = [[1,0,0,1,0,0,0,1,0,0,0,0], [0,1,0,0,1,0,0,0,1,0,0,0],
                        [0,0,1,0,0,1,0,0,0,1,0,0], [0,0,0,1,0,0,1,0,0,0,1,0],
                        [0,0,0,0,1,0,0,1,0,0,0,1], [1,0,0,0,0,1,0,0,1,0,0,0],
                        [0,1,0,0,0,0,1,0,0,1,0,0], [0,0,1,0,0,0,0,1,0,0,1,0],
                        [0,0,0,1,0,0,0,0,1,0,0,1], [1,0,0,0,1,0,0,0,0,1,0,0],
                        [0,1,0,0,0,1,0,0,0,0,1,0], [0,0,1,0,0,0,1,0,0,0,0,1]]
CHORD_TEMPLATE_DOM7 = [[1,0,0,0,1,0,0,1,0,0,1,0], [0,1,0,0,0,1,0,0,1,0,0,1],
                       [1,0,1,0,0,0,1,0,0,1,0,0], [0,1,0,1,0,0,0,1,0,0,1,0],
                       [0,0,1,0,1,0,0,0,1,0,0,1], [1,0,0,1,0,1,0,0,0,1,0,0],
                       [0,1,0,0,1,0,1,0,0,0,1,0], [0,0,1,0,0,1,0,1,0,0,0,1],
                       [1,0,0,1,0,0,1,0,1,0,0,0], [0,1,0,0,1,0,0,1,0,1,0,0],
                       [0,0,1,0,0,1,0,0,1,0,1,0], [0,0,0,1,0,0,1,0,0,1,0,1]]
CHORD_TEMPLATE_MIN7 = [[1,0,0,1,0,0,0,1,0,0,1,0], [0,1,0,0,1,0,0,0,1,0,0,1],
                       [1,0,1,0,0,1,0,0,0,1,0,0], [0,1,0,1,0,0,1,0,0,0,1,0],
                       [0,0,1,0,1,0,0,1,0,0,0,1], [1,0,0,1,0,1,0,0,1,0,0,0],
                       [0,1,0,0,1,0,1,0,0,1,0,0], [0,0,1,0,0,1,0,1,0,0,1,0],
                       [0,0,0,1,0,0,1,0,1,0,0,1], [1,0,0,0,1,0,0,1,0,1,0,0],
                       [0,1,0,0,0,1,0,0,1,0,1,0], [0,0,1,0,0,0,1,0,0,1,0,1]]

CHORD_TEMPLATE_MAJOR_means = [np.mean(chord) for chord in CHORD_TEMPLATE_MAJOR]
CHORD_TEMPLATE_MINOR_means = [np.mean(chord) for chord in CHORD_TEMPLATE_MINOR]
CHORD_TEMPLATE_DOM7_means = [np.mean(chord) for chord in CHORD_TEMPLATE_DOM7]
CHORD_TEMPLATE_MIN7_means = [np.mean(chord) for chord in CHORD_TEMPLATE_MIN7]

CHORD_TEMPLATE_MAJOR_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_MAJOR]
CHORD_TEMPLATE_MINOR_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_MINOR]
CHORD_TEMPLATE_DOM7_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_DOM7]
CHORD_TEMPLATE_MIN7_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_MIN7]

# mean timbre vectors of the 46 categories found by the GMM/BIC procedure
TIMBRE_CLUSTERS = [
    [1.38679881e-01, 3.95702571e-02, 2.65410235e-02, 7.38301998e-03, -1.75014636e-02, -5.51147732e-02,
     8.71851698e-03, -1.17595855e-02, 1.07227900e-02, 8.75951680e-03, 5.40391877e-03, 6.17638908e-03],
    [3.14344510e+00, 1.17405599e-01, 4.08053561e+00, -1.77934450e+00, 2.93367968e+00, -1.35597928e+00,
     -1.55129489e+00, 7.75743158e-01, 6.42796685e-01, 1.40794256e-01, 3.37716831e-01, -3.27103815e-01],
    [3.56548165e-01, 2.73288705e+00, 1.94355982e+00, 1.06892477e+00, 9.89739475e-01, -8.97330631e-02,
     8.73234495e-01, -2.00747009e-03, 3.44488367e-01, 9.93117800e-02, -2.43471766e-01, -1.90521726e-01],
    [4.22442037e-01, 4.14115783e-01, 1.43926557e-01, -1.16143322e-01, -5.95186216e-02, -2.36927188e-01,
     -6.83151409e-02, 9.86816882e-02, 2.43219098e-02, 6.93558977e-02, 6.80121418e-03, 3.97485360e-02],
    [1.94727799e-01, -1.39027782e+00, -2.39875671e-01, -2.84583677e-01, 1.92334219e-01, -2.83421048e-01,
     2.15787541e-01, 1.14840341e-01, -2.15631833e-01, -4.09496877e-02, -6.90838017e-03, -7.24394810e-03],
    [1.96565167e-01, 4.98702717e-02, -3.43697282e-01, 2.54170701e-01, 1.12441266e-02, 1.54740401e-01,
     -4.70447408e-02, 8.10868802e-02, 3.03736697e-03, 1.43974944e-03, -2.75044913e-02, 1.48634678e-02],
    [2.21364497e-01, -2.96205105e-01, 1.57754028e-01, -5.57641279e-02, -9.25625566e-02, -6.15316168e-02,
     -1.38139882e-01, -5.54936599e-02, 1.66886836e-01, 6.46238260e-02, 1.24093863e-02, -2.09274345e-02],
    [2.12823455e-01, -9.32652720e-02, -4.39611467e-01, -2.02814479e-01, 4.98638770e-02, -1.26572488e-01,
     -1.11181799e-01, 3.25075635e-02, 2.01416694e-02, -5.69216463e-02, 2.61922912e-02, 8.30817468e-02],
    [1.62304042e-01, -7.34813956e-03, -2.02552550e-01, 1.80106705e-01, -5.72110826e-02, -9.17148244e-02,
     -6.20429191e-03, -6.08892354e-02, 1.02883628e-02, 3.84878478e-02, -8.72920419e-03, 2.37291230e-02],
    [1.69023095e-01, 6.81311168e-02, -3.71039856e-02, -2.13139780e-02, -4.18752028e-03, 1.36407740e-01,
     2.58515825e-02, -4.10328777e-04, 2.93149920e-02, -1.97874734e-02, 2.01177066e-02, 4.29260690e-03],
    [4.16829358e-01, -1.28384095e+00, 8.86081556e-01, 9.13717416e-02, -3.19420208e-01, -1.82003637e-01,
     -3.19865507e-02, -1.71517045e-02, 3.47472066e-02, -3.53047665e-02, 5.58354602e-02, -5.06222122e-02],
    [3.83948137e-01, 1.06020034e-01, 4.01191058e-01, 1.49470482e-01, -9.58422411e-02, -4.94473336e-02,
     2.27589858e-02, -5.67352733e-02, 3.84666644e-02, -2.15828055e-02, -1.67817151e-02, 1.15426241e-01],
    [9.07946444e-01, 3.26120397e+00, 2.98472002e+00, -1.42615404e-01, 1.29886103e+00, -4.53380431e-01,
     1.54008478e-01, -3.55297093e-02, -2.95809181e-01, 1.57037690e-01, -7.29692046e-02, 1.15180285e-01],
    [1.60870896e+00, -2.32038235e+00, -7.96211044e-01, 1.55058968e+00, -2.19377663e+00, 5.01030526e-01,
     -1.71767279e+00, -1.36642470e+00, -2.42837527e-01, -4.14275615e-01, -7.33148530e-01, -4.56676578e-01],
    [6.42870687e-01, 1.34486839e+00, 2.16026845e-01, -2.13180345e-01, 3.10866747e-01, -3.97754955e-01,
     -3.54439151e-01, -5.95938041e-04, 4.95054274e-03, 4.67013422e-02, -1.80823854e-02, 1.25808320e-01],
    [1.16780496e+00, 2.28141229e+00, -3.29418720e+00, -1.54239912e+00, 2.12372153e-01, 2.51116768e+00,
     1.84273560e+00, -4.06183916e-01, 1.19175125e+00, -9.24407446e-01, 6.85444429e-01, -6.38729005e-01],
    [2.39097414e-01, -1.13382447e-02, 3.06327342e-01, 4.68182987e-03, -1.03107607e-01, -3.17661969e-02,
     3.46533705e-02, 1.46440386e-02, 6.88291154e-02, 1.72580481e-02, -6.23970238e-03, -6.52822380e-03],
    [1.74850329e-01, -1.86077411e-01, 2.69285838e-01, 5.22452803e-02, -3.71708289e-02, -6.42874319e-02,
     -5.01920042e-03, -1.14565540e-02, -2.61300268e-03, -6.94872458e-03, 1.20157063e-02, 2.01341977e-02],
    [1.93220674e-01, 1.62738332e-01, 1.72794061e-02, 7.89933755e-02, 1.58494767e-01, 9.04541006e-04,
     -3.33177052e-02, -1.42411500e-01, -1.90471155e-02, -2.41622739e-02, -2.57382438e-02, 2.84895062e-02],
    [3.31179197e+00, -1.56765268e-01, 4.42446188e+00, 2.05496297e+00, 5.07031622e+00, -3.52663849e-02,
     -5.68337901e+00, -1.17825301e+00, 5.41756637e-01, -3.15541339e-02, -1.58404846e+00, 7.37887234e-01],
    [2.36033237e-01, -5.01380019e-01, -7.01568834e-02, -2.14474169e-01, 5.58739133e-01, -3.45340886e-01,
     2.36469930e-01, -2.51770230e-02, -4.41670143e-01, -1.73364633e-01, 9.92353986e-03, 1.01775476e-01],
    [3.13672832e+00, 1.55128891e+00, 4.60139512e+00, 9.82477544e-01, -3.87108002e-01, -1.34239667e+00,
     -3.00065797e+00, -4.41556909e-01, -7.77546208e-01, -6.59017029e-01, -1.42596356e-01, -9.78935498e-01],
    [8.50714148e-01, 2.28658856e-01, -3.65260753e+00, 2.70626948e+00, -1.90441544e-01, 5.66625676e+00,
     1.77531510e+00, 2.39978921e+00, 1.10965660e+00, 1.58484130e+00, -1.51579214e-02, 8.64324026e-01],
    [1.14302559e+00, 1.18602811e+00, -3.88130412e+00, 8.69833825e-01, -8.23003310e-01, -4.23867795e-01,
     8.56022598e-01, -1.08015106e+00, 1.74840192e-01, -1.35493558e-02, -1.17012561e+00, 1.68572940e-01],
    [3.54117814e+00, 6.12714769e-01, 7.67585243e+00, 2.50391333e+00, 1.81374399e+00, -1.46363231e+00,
     -1.74027236e+00, -5.72924078e-01, -1.20787368e+00, -4.13954661e-01, -4.62561948e-01, 6.78297871e-01],
    [8.31843044e-01, 4.41635485e-01, 7.00724425e-02, -4.72159900e-02, 3.08326493e-01, -4.47009822e-01,
     3.27806057e-01, 6.52370380e-01, 3.28490360e-01, 1.28628172e-01, -7.78065861e-02, 6.91343399e-02],
    [4.90082031e-01, -9.53180204e-01, 1.76970476e-01, 1.57256960e-01, -5.26196238e-02, -3.19264458e-01,
     3.91808304e-01, 2.19368239e-01, -2.06483291e-01, -6.25044005e-02, -1.05547224e-01, 3.18934196e-01],
    [1.49899454e+00, -4.30708817e-01, 2.43770498e+00, 7.03149621e-01, -2.28827845e+00, 2.70195855e+00,
     -4.71484280e+00, -1.18700075e+00, -1.77431396e+00, -2.23190236e+00, 8.20855264e-01, -2.35859902e-01],
    [1.20322544e-01, -3.66300816e-01, -1.25699953e-01, -1.21914056e-01, 6.93277338e-02, -1.31034684e-01,
     -1.54955924e-03, 2.48094288e-02, -3.09576314e-02, -1.66369415e-03, 1.48904987e-04, -1.42151992e-02],
    [6.52394765e-01, -6.81024464e-01, 6.36868117e-01, 3.04950208e-01, 2.62178992e-01, -3.20457080e-01,
     -1.98576098e-01, -3.02173163e-01, 2.04399765e-01, 4.44513847e-02, -9.50111498e-03, -1.14198739e-02],
    [2.06762180e-01, -2.08101829e-01, 2.61977630e-01, -1.71672300e-01, 5.61794250e-02, 2.13660185e-01,
     3.90259585e-02, 4.78176392e-02, 1.72812607e-02, 3.44052067e-02, 6.26899067e-03, 2.48544728e-02],
    [7.39717363e-01, 4.37786285e+00, 2.54995502e+00, 1.13151212e+00, -3.58509503e-01, 2.20806129e-01,
     -2.20500355e-01, -7.22409824e-02, -2.70534083e-01, 1.07942098e-03, 2.70174668e-01, 1.87279353e-01],
    [1.25593809e+00, 6.71054880e-02, 8.70352571e-01, -4.32607959e+00, 2.30652217e+00, 5.47476105e+00,
     -6.11052479e-01, 1.07955720e+00, -2.16225471e+00, -7.95770149e-01, -7.31804973e-01, 9.68935954e-01],
    [1.17233757e-01, -1.23897829e-01, -4.88625265e-01, 1.42036530e-01, -7.23286756e-02, -6.99808763e-02,
     -1.17525019e-02, 5.70221674e-02, -7.67796123e-03, 4.17505873e-02, -2.33375716e-02, 1.94121001e-02],
    [1.67511025e+00, -2.75436700e+00, 1.45345593e+00, 1.32408871e+00, -1.66172505e+00, 1.00560074e+00,
     -8.82308160e-01, -5.95708043e-01, -7.27283590e-01, -1.03975499e+00, -1.86653334e-02, 1.39449745e+00],
    [3.20587677e+00, -2.84451104e+00, 8.54849957e+00, -4.44001235e-01, 1.04202144e+00, 7.35333682e-01,
     -2.48763292e+00, 7.38931361e-01, -1.74185596e+00, -1.07581842e+00, 2.05759299e-01, -8.20483513e-01],
    [3.31279737e+00, -5.08655734e-01, 6.61530870e+00, 1.16518280e+00, 4.74499155e+00, -2.31536191e+00,
     -1.34016130e+00, -7.15381712e-01, 2.78890594e+00, 2.04189275e+00, -3.80003033e-01, 1.16034914e+00],
    [1.79522019e+00, -8.13534697e-02, 4.37167420e-01, 2.26517020e+00, 8.85377295e-01, 1.07481514e+00,
     -7.25322296e-01, -2.19309506e+00, -7.59468916e-01, -1.37191387e+00, 2.60097913e-01, 9.34596450e-01],
    [3.50400906e-01, 8.17891485e-01, -8.63487084e-01, -7.31760701e-01, 9.70320805e-02, -3.60023996e-01,
     -2.91753495e-01, -8.03073817e-02, 6.65930095e-02, 1.60093340e-01, -1.29158086e-01, -5.18806100e-02],
    [2.25922929e-01, 2.78461593e-01, 5.39661393e-02, -2.37662670e-02, -2.70343295e-02, -1.23485570e-01,
     2.31027499e-03, 5.87465112e-05, 1.86127188e-02, 2.83074747e-02, -1.87198676e-04, 1.24761782e-02],
    [4.53615634e-01, 3.18976020e+00, -8.35029351e-01, 7.84124578e+00, -4.43906795e-01, -1.78945492e+00,
     -1.14521031e+00, 1.00044304e+00, -4.04084981e-01, -4.86030348e-01, 1.05412721e-01, 5.63666445e-02],
    [3.93714086e-01, -3.07226477e-01, -4.87366619e-01, -4.57481697e-01, -2.91133171e-04, -2.39881719e-01,
     -2.15591352e-01, -1.21332941e-01, 1.42245002e-01, 5.02984582e-02, -8.05878851e-03, 1.95534173e-01],
    [1.86913010e-01, -1.61000977e-01, 5.95612425e-01, 1.87804293e-01, 2.22064227e-01, -1.09008289e-01,
     7.83845058e-02, 5.15228647e-02, -8.18113578e-02, -2.37860551e-02, 3.41013800e-03, 3.64680417e-02],
    [3.32919314e+00, -2.14341251e+00, 7.20913997e+00, 1.76143734e+00, 1.64091808e+00, -2.66887649e+00,
     -9.26748006e-01, -2.78599285e-01, -7.39434005e-01, -3.87363085e-01, 8.00557250e-01, 1.15628886e+00],
    [4.76496444e-01, -1.19334793e-01, 3.09037235e-01, -3.45545294e-01, 1.30114716e-01, 5.06895559e-01,
     2.12176840e-01, -4.14296750e-03, 4.52439064e-02, -1.62163990e-02, 6.93683152e-02, -5.77607592e-03],
    [3.00019324e-01, 5.43432074e-02, -7.72732930e-01, 1.47263806e+00, -2.79012581e-02, -2.47864869e-01,
     -2.10011388e-01, 2.78202425e-01, 6.16957205e-02, -1.66924986e-01, -1.80102286e-01, -3.78872162e-03]]

TIMBRE_MEANS = [np.mean(t) for t in TIMBRE_CLUSTERS]
TIMBRE_STDEVS = [np.std(t) for t in TIMBRE_CLUSTERS]

'''helper methods to process raw msd data'''

def normalize_pitches(h5):
    key = int(hdf5_getters.get_key(h5))
    segments_pitches = hdf5_getters.get_segments_pitches(h5)
    segments_pitches_new = [transpose_by_key(pitch_seg, key) for pitch_seg in segments_pitches]
    return segments_pitches_new

def transpose_by_key(pitch_seg, key):
    pitch_seg_new = []
    for i in range(0, 12):
        idx = (i + key) % 12
        pitch_seg_new.append(pitch_seg[idx])
    return pitch_seg_new

'''given a time segment with distributions of the 12 pitches, find the most likely chord played'''
def find_most_likely_chord(pitch_vector):
    rho_max = 0.0
    # index each chord as (chord type, root): types are 1=major, 2=minor, 3=dom7, 4=min7
    most_likely_chord = (1, 1)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_MAJOR, CHORD_TEMPLATE_MAJOR_means, CHORD_TEMPLATE_MAJOR_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean) * (pitch_vector[i] - np.mean(pitch_vector)) / ((stdev + 0.01) * (np.std(pitch_vector) + 0.01))
        if abs(rho) > abs(rho_max):
            rho_max = rho
            most_likely_chord = (1, idx)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_MINOR, CHORD_TEMPLATE_MINOR_means, CHORD_TEMPLATE_MINOR_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean) * (pitch_vector[i] - np.mean(pitch_vector)) / ((stdev + 0.01) * (np.std(pitch_vector) + 0.01))
        if abs(rho) > abs(rho_max):
            rho_max = rho
            most_likely_chord = (2, idx)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_DOM7, CHORD_TEMPLATE_DOM7_means, CHORD_TEMPLATE_DOM7_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean) * (pitch_vector[i] - np.mean(pitch_vector)) / ((stdev + 0.01) * (np.std(pitch_vector) + 0.01))
        if abs(rho) > abs(rho_max):
            rho_max = rho
            most_likely_chord = (3, idx)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_MIN7, CHORD_TEMPLATE_MIN7_means, CHORD_TEMPLATE_MIN7_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean) * (pitch_vector[i] - np.mean(pitch_vector)) / ((stdev + 0.01) * (np.std(pitch_vector) + 0.01))
        if abs(rho) > abs(rho_max):
            rho_max = rho
            most_likely_chord = (4, idx)
    return most_likely_chord

def find_most_likely_timbre_category(timbre_vector):
    most_likely_timbre_cat = 0
    rho_max = 0.0
    for idx, (seg, mean, stdev) in enumerate(zip(TIMBRE_CLUSTERS, TIMBRE_MEANS, TIMBRE_STDEVS)):
        rho = 0.0
        for i in range(0, 12):
            rho += (seg[i] - mean) * (timbre_vector[i] - np.mean(seg)) / ((stdev + 0.01) * (np.std(timbre_vector) + 0.01))
        if abs(rho) > abs(rho_max):
            rho_max = rho
            most_likely_timbre_cat = idx
    return most_likely_timbre_cat
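As a quick sanity check of the correlation-based matcher above, a synthetic C major pitch vector should match the first major template (assuming this file is saved as msd_utils.py, as the imports in Section A.2 suggest):

import msd_utils

# energy only on C, E, and G: a pure C major triad
c_major = [1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0]
print(msd_utils.find_most_likely_chord(c_major))  # (1, 0): chord type 1 (major), root C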


Bibliography

[1] Marck Bailey. Mark J. Butler publishes scholarly work on dance music. http://www.music.northwestern.edu/about/news/2012/mark-j-butler-publishes-scholarly-work-on-dance-music.html, March 2012.

[2] Kenneth Taylor. Ishkur's guide to EDM. http://techno.org/electronic-music-guide/.

[3] Deal further strengthens Spotify's music discovery expertise. http://the.echonest.com/pressreleases/spotify-acquires-echo-nest/, March 2014.

[4] Josh Constine. Inside the Spotify - Echo Nest skunkworks. http://techcrunch.com/2014/10/19/the-sonic-mad-scientists/, October 2014.

[5] The future of music genres is here. http://blog.echonest.com/post/73516217273/the-future-of-music-genres-is-here, January 2014.

[6] About the Music Genome Project. http://www.pandora.com/about/mgp.

[7] Joan Serrà, Álvaro Corral, Marián Boguñá, Martín Haro, and Josep Ll. Arcos. Measuring the evolution of contemporary western popular music. Scientific Reports, 2, July 2012.

[8] Matthias Mauch, Robert M. MacCallum, Mark Levy, and Armand M. Leroi. The evolution of popular music: USA 1960–2010. Royal Society Open Science, 2(5), 2015.

[9] Thierry Bertin-Mahieux, Daniel P. W. Ellis, Brian Whitman, and Paul Lamere. The Million Song Dataset. In Proceedings of the 12th International Conference on Music Information Retrieval (ISMIR 2011), 2011.

[10] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay. Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12:2825–2830, 2011.

[11] Lars Buitinck, Gilles Louppe, Mathieu Blondel, Fabian Pedregosa, Andreas Mueller, Olivier Grisel, Vlad Niculae, Peter Prettenhofer, Alexandre Gramfort, Jaques Grobler, Robert Layton, Jake VanderPlas, Arnaud Joly, Brian Holt, and Gaël Varoquaux. API design for machine learning software: experiences from the scikit-learn project. In ECML PKDD Workshop: Languages for Data Mining and Machine Learning, pages 108–122, 2013.

[12] Edwin Chen. Infinite mixture models with nonparametric Bayes and the Dirichlet Process. http://blog.echen.me/2012/03/20/infinite-mixture-models-with-nonparametric-bayes-and-the-dirichlet-process/, March 2012.

[13] Graham Massey. Roland TR-808: The drum machine that changed music. http://www.bbc.com/news/entertainment-arts-26682781, March 2014.

[14] Jean-Michel Jarre and Agencja Artystyczna MTJ. Jean Michel Jarre. Disques Dreyfus, 1999.

[15] Francois Pachet, Jean-Julien Aucouturier, and Mark Sandler. The way it sounds: timbre models for analysis and retrieval of music signals. IEEE Transactions on Multimedia, 7(6):1028–1035, December 2005.


  • Abstract
  • Acknowledgements
  • Contents
  • List of Tables
  • List of Figures
  • 1 Introduction
    • 11 Background Information
    • 12 Literature Review
    • 13 The Dataset
      • 2 Mathematical Modeling
        • 21 Determining Novelty of Songs
        • 22 Feature Selection
        • 23 Collecting Data and Preprocessing Selected Features
          • 231 Collecting the Data
          • 232 Pitch Preprocessing
          • 233 Timbre Preprocessing
              • 3 Results
                • 31 Methodology
                • 32 Findings
                  • 321 =005
                  • 322 =01
                  • 323 =02
                    • 33 Analysis
                      • 4 Conclusion
                        • 41 Design Flaws in Experiment
                        • 42 Future Work
                        • 43 Closing Remarks
                          • A Code
                            • A1 Pulling Data from the Million Song Dataset
                            • A2 Calculating Most Likely Chords and Timbre Categories
                            • A3 Code to Compute Timbre Categories
                            • A4 Helper Methods for Calculations
                              • Bibliography
Page 3: Silver,Matthew final thesis

Abstract

This thesis provides a foundation of code and models to mathematically analyze

the evolution of Electronic Music (EM) over time Using chronologically ordered

data from the Million Song Dataset it utilizes a Dirichlet Process Gaussian Mixture

Model to assign the songs to clusters based on pitch and timbre data and without

any previous assumptions of the clusters beforehand By examining the characteristic

sounds of songs in each cluster the following conclusions are reached

1 Which artists and songs were most innovative for their time

2 Potential new ways in which the genealogy of and relations between EM genres

can be imagined

Finally this thesis evaluates the strengths and weaknesses of the model used and

suggests future work that can be done to improve upon it

iii

Acknowledgements

I would like to thank Professor Ramon Van Handel for advising me on my thesis

You helped me figure out how to narrow down my goals into a concrete topic and

provided useful input on how to model and frame my problem effectively I would

also like to thank the Princeton ORFE department for providing funding to download

and manage the dataset I used for this project Michael Bino and the Computational

Science and Engineering Support group (CSES) were incredibly useful for helping me

set up and run my programs on the Princeton servers Without your help I would

have had a much harder time getting my 300GB dataset of music to play nice I

would also like to thank Jeffrey Scott Dwoskin for providing the Latex template from

which I wrote this thesis And finally I would like to thank my family and friends

especially Lucas and Kathryn for providing continuous support and feedback The

work we all poured into our theses is incredible and wersquove made it through this

sometimes rocky journey in the greatest university of all

On a personal note regardless of whether you are a current Princeton under-

graduate or are just interested in my work push yourself beyond your comfort zone

and donrsquot let grades or other peoplersquos opinions get in your way Take classes and

join new groups that reflect your passions At the same time love yourself Take

care of your body and have some fun without feeling guilty And finally form great

relationships While Princetonians sometimes appear hypercompetitive and forced

they are genuinely sweet and brilliant people who you will treasure for life These

four years at Princeton have gone by in a flash and in the whirlwind of highs and

lows Irsquove gone through these are the most important lessons Irsquove learned

iv

To my parents

v

Contents

Abstract iii

Acknowledgements iv

List of Tables viii

List of Figures ix

1 Introduction 1

11 Background Information 1

12 Literature Review 3

13 The Dataset 10

2 Mathematical Modeling 12

21 Determining Novelty of Songs 12

22 Feature Selection 14

23 Collecting Data and Preprocessing Selected Features 20

231 Collecting the Data 20

232 Pitch Preprocessing 21

233 Timbre Preprocessing 25

3 Results 27

31 Methodology 27

32 Findings 29

321 α=005 29

vi

322 α=01 33

323 α=02 38

33 Analysis 46

4 Conclusion 53

41 Design Flaws in Experiment 53

42 Future Work 55

43 Closing Remarks 56

A Code 57

A1 Pulling Data from the Million Song Dataset 57

A2 Calculating Most Likely Chords and Timbre Categories 58

A3 Code to Compute Timbre Categories 60

A4 Helper Methods for Calculations 61

Bibliography 68

vii

List of Tables

31 Song cluster descriptions for α = 005 33

32 Song cluster descriptions for α = 01 38

33 Song cluster descriptions for α = 02 45

viii

List of Figures

11 A userrsquos taste profile generated by Spotify 4

12 Data processing pipeline for Mauchrsquos study illustrated with a segment

of Queenrsquos Bohemian Rhapsody 1975 8

21 scikit-learn example of GMM vs DPGMM and tuning of α 15

22 Number of Electronic Music Songs in Million Song Dataset from Each

Year 26

31 Song year distributions for α = 005 31

32 Timbre and pitch distributions for α = 005 32

33 Song year distributions for α = 01 35

34 Timbre and pitch distributions for α = 01 37

35 Song year distributions for α = 02 41

36 Timbre and pitch distributions for α = 02 44

ix

Chapter 1

Introduction

11 Background Information

Electronic Music (EM) is an increasingly popular genre of music with an immense

presence and influence on modern culture Because the genre is new as a whole and

is arguably more loosely structured than other genres - technology has enabled the

creation of a wide range of sounds and easy blending of existing and new sounds alike

- formal analysis especially mathematical analysis on the genre is fairly limited and

has only begun growing in the past few years As a fan of EM I am interested in

exploring how the genre has evolved over time More specifically my goal with this

project was to design some structure or model that could help me identify which EM

artists have contributed the most stylistically to the genre Oftentimes famous EM

artists do not create novel-sounding music but rather popularize an existing style

and the motivation of this study is to understand who has stylistically contributed

the most to the EM scene versus those who have merely popularized aspects of it

As the study progressed the manner in which I constructed my model lent to

a second goal of the thesis imagining new ways in which we can imagine EM genres

1

While there exists an extensive amount of research analyzing music trends from

a non-mathematical (cultural societal artistic) perspective the analysis of EM

from a mathematical perspective and especially with respect to any computationally

measurable trends in the genre is close to nonexistent EM has been analyzed to a

lesser extent than other common genres of music in the academic world most likely

due to existing for a shorter amount of time and being less rooted in prominant

social and cultural events In fact the first published reference work on EM did not

exist until 2012 when Professor Mark J Butler from Northwestern University edited

and published Electronica Dance and Club Music a collection of essays exploring

EM genres and culture [1] Furthermore there are very few comprehensive visual

guides that allow a user to relate every genre to each other and easily observe how

different genres converge and diverge While conducting research the best guide I

found was not a scholarly source but an online guide created by an EM enthusiast

Ishkurrsquos Guide to Electronic Music [2] This guide which includes over 100 specific

genres grouped by more general genres and represents chronological evolutions by

connecting each genre in a flowchart is the most exhaustive analysis of the EM scene

I could find However the guidersquos analysis is very qualitative While each subgenre

contains an explanation on typical rhythm and sounds and includes well-known

songs indicative of the style the guide is created by someone who used historical and

personal knowledge of EM My model which creates music genres by chronologically

ordering songs and then assigning them to clusters is a different approach towards

imagining the entire landscape of EM The results may confirm Ishkurrsquos Guidersquos

findings in which case his guide is given additional merit with mathematical evi-

dence or it may be different suggesting that there may be better ways to group EM

genres One advantage that guides such as Ishkurrsquos and historically-based scholarly

works have over my approach is that those models are history-sensitive and therefore

may group songs in a way that historically makes sense On the other hand my

2

model is history-agnostic and may not realize the historical context of songs when

clustering However I believe that there is still significant merit to my research

Instead of classifying genres of music by early genres that led to them my approach

gives the most credit to the artists and songs that were the most innovative for their

time and perhaps reveal different musical styles that are more similar to each other

than history would otherwise imply This way of thinking of music genres while

unconventional is another way of imagining EM

The practice of quantitatively analyzing music has exploded in the last decade

thanks to technological and algorithmic advances that allow data scientists to con-

structively sift through troves of music and listener information In the literature

review I will focus on two particular organizations that have contributed greatly to

the large-scale mathematical analysis of music Pandora a website that plays songs

similar to a songartistalbum inputted by a user and Echo Nest a music analytics

firm that was acquired by Spotify in 2014 and drives Spotifyrsquos Discover Weekly

feature [3] After evaluating the relevance of these sources to my thesis work I will

then look over the relevant academic research and evaluate what this research can

contribute

12 Literature Review

The analysis of quantitative music generally falls into two categories research con-

ducted by academics and academic organizations for scholarly purposes and research

conducted by companies and primarily targeted for consumers First looking at the

consumer-based research Spotify and Pandora are two of the most prominent based

groups and the two I decided to focus on Spotify is a music streaming service where

users can listen to albums and songs from a wide variety of artists or listen to weekly

3

playlists generated based on the music the user and userrsquos friends have listened to

The weekly playlist called Discover Weekly Playlist is a relatively new feature in

Spotify and is driven by music analysis algorithms created from Echo Nest Using

the Echo Nest code interface Spotify creates a ldquotaste profilerdquo for each user which

assesses attributes such as how often a user branches out to new styles of music how

closely the userrsquos music streamed follows popular Billboard music charts and so on

Spotify also looks at the artists and songs the user streamed and creates clusters

of different genres that the user likes (see figure 11) The taste profile and music

clusters can then be used to generate playlists geared to a specific user The genres

in the cluster come from a list of nearly 800 names which are derived by scraping

the Internet for trending terms in music as well as training various algorithms on a

regular basic by ldquolisteningrdquo to new songs [4][5]

Figure 11 A userrsquos taste profile generated by Spotify

4

Although Spotify and Echo Nestrsquos algorithms are very useful for mapping the land-

scape of established and emerging genres of music the methodology is limited to

pre-defined genres of music This may serve as a good starting point to compare my

final results to but my study aims to be as context-free as possible by attaching no

preconceived notions of music styles or genres instead looking at features that could

be measured in every song

While Spotifyrsquos approach to mapping music is very high-tech and based on ex-

isting genres Pandora takes a very low-tech and context-free approach to music

clustering Pandora created the Music Genome Project a multi-year undertaking

where skilled music theorists listened to a large number of songs and analyzed up to

450 characteristics in each song [6] Pandorarsquos approach is appealing to the aim of

my study since it does not take any preconceived notions of what a genre of music

is instead comparing songs on common characteristics such as pitch rhythm and

instrument patterns Unfortunately I do not have a cadre of skilled music theorists

at my disposal nor do I have 10 years to perform such calculations like the dedicated

workers at Pandora (tips the indestructible fedora) Additionally Pandorarsquos Music

Genome Project is intellectual property so at best I can only rely on the abstract

concepts of the Music Genome Project to drive my study

In the academic realm there are no existing studies analyzing quantifiable changes in

EM specifically but there exist a few studies that perform such analysis on popular

Western music in general One such study is Measuring the Evolution of Contem-

porary Western Popular Music which analyzes music from 1955-2010 spanning all

common genres Using the Million Song Dataset a free public database of songs

each containing metadata (see section 13) the study focuses on the attributes pitch

timbre and loudness Pitch is defined as the standard musical notes or frequency of

5

the sound waves Timbre is formally defined as the Mel frequency cepstral coefficients

(MFCC) of a transformed sound signal More informally it refers to the sound color

texture or tone quality and is associated with instrument types recording resources

and production techniques In other words two sounds that have the same pitch

but different tones (for example a bell and voice) are differentiated by their timbres

There are 12 MFCCs that define the timbre of a given sound Finally loudness

refers to intrinsically how loud the music sounds not loudness that a listener can

manipulate while listening to the music Loudness is the first MFCC of the timbre

of a sound [7] The study concluded that over time music has been becoming louder

and less diverse

The restriction of pitch sequences (with metrics showing less variety inpitch progressions) the homogenization of the timbral palette (with fre-quent timbres becoming more frequent) and growing average loudnesslevels (threatening a dynamic richness that has been conserved until to-day) This suggests that our perception of the new would be essentiallyrooted on identifying simpler pitch sequences fashionable timbral mix-tures and louder volumes Hence an old tune with slightly simpler chordprogressions new instrument sonorities that were in agreement with cur-rent tendencies and recorded with modern techniques that allowed forincreased loudness levels could be easily perceived as novel fashionableand groundbreaking

This study serves as a good starting point for mathematically analyzing music in

a few ways First it utilizes the Million Song Dataset which addresses the issue

of legally obtaining music metadata As mentioned in section 13 the only legal

way to obtain playable music for this study would have been to purchase all songs I

would include which is infeasible While the Million Song Dataset does not contain

the audio files in playable format it does contain audio features and metadata that

allow for in-depth analysis In addition working with the dataset takes out the

work of extracting features from raw audio files saving an extensive amount of time

and energy Second the study establishes specifics for what constitutes a trend

in music Pitch timbre and loudness are core features of music and examining the6

distributions of each among songs over time reveals a lot of information about how

the music industry and consumersrsquo tastes have evolved While these are not all of the

features contained in a song they serve as a good starting point Third the study

defines mathematical ways to capture music attributes and measure their change

over time For example pitches are transposed into the same tonal context with

binary discretized pitch descriptions based on a threshold so that each song can be

represented with vectors of pitches that are normalized and compared to other songs

While this study lays some solid groundwork for capturing and analyzing nu-

meric qualities of music it falls short of addressing my goals in a couple of ways

First it does not perform any analysis with respect to music genre While the

analysis performed in this paper could easily be applied to a list of songs in a specific

genre certain genres might have unique sounds and rhythms relative to other genres

that would be worth studying in greater detail Second the study only measures

general trends in music over time The models used to describe changes are simple

regressions that donrsquot look at more nuanced changes For example what styles of

music developed over certain periods of time How rapid were those changes Which

styles of music developed from which other styles

A more promising study led by music researcher Matthias Mauch [8] analyzes

contemporary popular Western Music from the 1960s to 2010s by comparing numer-

ical data on the pitches and timbre of a corpus of 17000 songs that appeared on the

Billboard Hot 100 Like the previously mentioned paper Measuring the Evolution

of Contemporary Western Popular Music Mauchrsquos study also creates abstractions

of pitch and timbre in order to provide a consistent and meaningful semantic inter-

pretation of musical data (see figure 12) However Mauchrsquos study takes this idea a

step further by using genre tags from Lastfm a music website and constructing a

7

hierarchy of music genres using hierarchical clustering Additionally the study takes

a crack at determining whether a particular band the Beatles was musically ground-

breaking for its time or merely playing off sounds that other bands had already used

Figure 12 Data processing pipeline for Mauchrsquos study illustrated with a segment ofQueenrsquos Bohemian Rhapsody 1975

While both Measuring the Evolution of Contemporary Western Popular Music

and Mauchrsquos study created abstractions of pitch and timbre Mauchrsquos study is more

appealing with respect to my goal because its end results align more closely with

mine Additionally the data processing pipeline offers several layers of abstraction

8

and depending on my progress I would be able to achieve at least one of the levels of

abstraction As shown in figure 12 each segment of a raw audio file is first broken

down into its 12 timbre MFCCs and pitch components Next the study constructs

ldquolexiconsrdquo or a dictionary of pitch and timbre terms that all songs can be compared

to For pitch the original data is in a N-by-12 matrix where N is the number of time

segments in the song and 12 the number of each of the notes found in an octave of

pitches Each time segment contains the relative strengths of each of the 12 pitches

However music sounds are not merely a collection of pitches but more precisely

chords Furthermore the similarity of two songs is not determined by the absolute

pitches of their chords but rather the progression of chords in the song all relative to

each other For example if all the notes in a song are transposed by one step the song

will sound different in terms of absolute pitch but the song will still be recognized

as the original because all of the relative movements from each chord to the next

are the same This phenomenon is captured in the pitch data by finding the most

likely chord played at each time segment then counting the change to the next chord

at each time step and generating a table of chord change frequencies for each song

Constructing the timbre lexcion is more complicated since there is no easy analogue

like chords for pitches to compare songs Mauchrsquos study utilizes a Gaussian Mixture

Model (GMM) by iterating over k=1 to k=N clusters where N is a large number

running the GMM on each prior assumption of k clusters and computing the Bayes

Information Criterion (BIC) for each model The lowest of the N BIC values is found

and that value of k is selected That model contains k different timbre clusters

and each cluster contains the mean timbre value for each of the 12 timbre components

For my research I decided that the pitch and timbre lexicons would be the most

realistic level of abstraction I could obtain Mauchrsquos study adds an addtional layer

to pitch and timbre by identifying the most common patterns of chord changes and

9

most common timbre rhythms and creating more general tags from these combined

terms such as ldquo stepwise changes indicating modal harmonyrdquo for a pitch topic and

ldquooh rounded mellowrdquo for a timbral topic There were two problems with using this

final layer of abstraction for my study First attaching semantic interpretations to

the pitch and timbral lexicons is a difficult task For timbre I would need to listen

to sound samples containing all of the different timbral categories I identified and

attaching user interpretations to them For the chords not only would I have to

perform the same analysis as on timbre but take careful attention to identify which

chords correspond to common sound progressions in popular music a task that I am

not qualified for an did not have the resources for this thesis to seek out Second

this final layer of abstraction was not necessary for the end goal of my paper In

fact consolidating my pitch and timbre lexicons into simpler phrases would run the

risk of pigeonholing my analysis and preventing me from discovering more nuanced

patterns in my final results Therefore I decided to focus on pitch and timbral

lexicon construction as the furthest levels of abstraction when processing songs for

my thesis Mathematical details on how I constructed the lexical and timbral lexicons

can be found in the Mathematical Modeling section of this paper

13 The Dataset

In order to successfully execute my thesis, I needed access to an extensive database of music. Until recently, acquiring a substantial corpus of music data was a difficult and costly task. It is illegal to download music audio files from video- and music-sharing sites such as YouTube, Spotify, and Pandora. Some platforms such as iTunes offer 90-second previews of songs, but using only segments of songs, and usually segments that showcase the chorus, is not a reliable way to capture the entire essence of a song. Even if I were to legally download entire audio files for free, I would run into additional issues. Obtaining a high-quality corpus of song data would be challenging: writing scripts that crawl music-sharing platforms may not capture all of the music I am looking for. And once I had the audio files, I would have to perform audio processing techniques to extract the relevant information from the songs.

Fortunately, there is an easy solution to the music data acquisition problem. The Million Song Dataset (MSD) is a collection of metadata for one million music tracks dating up to 2011. Various organizations, such as The Echo Nest, Musicbrainz, 7digital, and Last.fm, have contributed different pieces of metadata. Each song is represented as a Hierarchical Data Format file (HDF5), which can be loaded as a JSON object. The fields encompass topical features such as the song title, artist, and release date, as well as lower-level features such as the loudness, starting beat times, pitches, and timbre of several segments of the song [9]. While the MSD is the largest free and open-source music metadata dataset I could find, there is no guarantee that it adequately covers the entire spectrum of EM artists and songs. This quality limitation is important to consider throughout the study. A quick look through the songs, including the subset of data I worked with for this report, showed that there were several well-known artists and songs from the EM scene. Therefore, while the MSD may not contain all desired songs for this project, it contains an adequate number of relevant songs to produce some meaningful results. Additionally, the groundwork for modeling the similarities between songs and identifying groundbreaking ones is the same regardless of the songs included, and the following methodologies can be implemented on any similarly formatted dataset, including one with songs that might currently be missing from the MSD.
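For illustration, a single MSD track file can be opened with the dataset's hdf5_getters helper module (the same getters used in Appendix A; the filename below is the track ID of the example song discussed in the Feature Selection section):

import hdf5_getters

h5 = hdf5_getters.open_h5_file_read('TRAXLZU12903D05F94.h5')
title = hdf5_getters.get_title(h5)               # song title
year = hdf5_getters.get_year(h5)                 # 0 if the year is unknown
pitches = hdf5_getters.get_segments_pitches(h5)  # (n_segments, 12) chroma matrix
timbre = hdf5_getters.get_segments_timbre(h5)    # (n_segments, 12) MFCC-like matrix
h5.close()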


Chapter 2

Mathematical Modeling

2.1 Determining Novelty of Songs

Finding a logical and implementable mathematical model was, and continues to be, an important aspect of my research. My problem, how to mathematically determine which songs were unique for their time, requires an algorithm in which each song is introduced in chronological order, either joining an existing category or starting a new category based on its musical similarity to songs already introduced. Clustering algorithms like k-means or Gaussian Mixture Models (GMM), which optimize the partitioning of a dataset into a predetermined number of clusters, assume a fixed number of clusters. While this process would work if we knew exactly how many genres of EM existed, if we guess wrong, our end results may contain clusters that are wrongly grouped together or wrongly separated. It is much better to apply a clustering algorithm that does not make any assumptions about this number.

One particularly promising process that addresses the issue of the number of clusters is a family of algorithms known as Dirichlet Processes (DPs). DPs are useful for this particular application because (1) they assign clusters to a dataset with only an upper bound on the number of clusters, and (2) by sorting the songs in chronological order before running the algorithm and keeping track of which songs are categorized under each cluster, we can observe the earliest songs in each cluster and consequently infer which songs were responsible for creating new clusters. The DP is controlled by a concentration parameter α. The expected number of clusters formed is directly proportional to the value of α, so the higher the value of α, the more likely new clusters will be formed [10]. Regardless of the value of α, as the number of data points introduced increases, the probability of a new group being formed decreases. That is, a "rich get richer" policy is in place, and existing clusters tend to grow in size. Tweaking the tunable parameter α is an important part of the study, since it determines the flexibility given to forming a new cluster. If the value of α is too small, then the criteria for forming new clusters will be too strict, and data that should be in different clusters will be assigned to the same cluster. On the other hand, if α is too large, the algorithm will be too sensitive and assign similar songs to different clusters.
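This "rich get richer" behavior can be made precise. In the Chinese restaurant process view of the DP (a standard result, stated here for intuition), when the n-th song is introduced, it joins an existing cluster k, which already holds n_k songs, or starts a new cluster, with probabilities

P(join cluster k) = n_k / (n − 1 + α),        P(start a new cluster) = α / (n − 1 + α).

Both effects described above follow directly: raising α increases the chance of a new cluster, while as n grows the new-cluster probability shrinks and large values of n_k pull new songs into the clusters that are already large.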

The implementation of the DP was achieved using scikit-learn's library and API for the Dirichlet Process Gaussian Mixture Model (DPGMM), the formal name of the Dirichlet Process model used to cluster the data. More specifically, scikit-learn's implementation of the DPGMM uses the stick-breaking method, one of several equally valid methods to assign songs to clusters [11]. While the mathematical details for this algorithm can be found at the following citation [12], the most important aspects of the DPGMM are the arguments that the user can specify and tune. The first of these tunable parameters is the value α, the same parameter as the α discussed in the previous paragraph. As seen on the right side of Figure 2.1, properly tuning α is key to obtaining meaningful clusters. The center image has α set to 0.01, which is too small and results in all of the data being placed under one cluster. On the other hand, the bottom-right image has the same data set and α set to 100, which does a better job of clustering. On a related note, the figure also demonstrates the effectiveness of the DPGMM over the GMM. On the left side, the dataset clearly contains 2 clusters, but the GMM in the top-left image assumes 5 clusters as a prior and consequently clusters the data incorrectly, while the DPGMM manages to limit the data to 2 clusters.

The second argument that the user inputs for the DPGMM is the data that will be clustered. The scikit-learn implementation takes the data in the format of a nested list (N lists, each of length m), where N is the number of data points and m the number of features. While the format of the data structure is relatively straightforward, choosing which numbers should be in the data was a challenge I faced. Selecting the relevant features of each song to be used in the algorithm will be expounded upon in the next section, "Feature Selection."

The last argument that a user inputs for the scikit-learn DPGMM implementation indicates the upper bound for the number of clusters. The Dirichlet Process then determines the best number of clusters for the data between 1 and the upper bound. Since the DPGMM is flexible enough to find the best value, I set an arbitrary upper bound of 50 clusters and focused more on the tuning of α to modify the number of clusters formed.
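Putting these arguments together, the clustering call reduces to a few lines. The sketch below uses the scikit-learn interface available at the time of writing (DPGMM has since been replaced by BayesianGaussianMixture, whose weight_concentration_prior plays the role of α); the random matrix stands in for the real feature set:

import numpy as np
from sklearn import mixture

X = np.random.rand(1000, 238)  # stand-in for the (N, m) song feature matrix
dpgmm = mixture.DPGMM(n_components=50,        # upper bound on the number of clusters
                      alpha=0.1,              # concentration parameter
                      covariance_type='diag')
dpgmm.fit(X)
labels = dpgmm.predict(X)      # cluster assignment for each song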

2.2 Feature Selection

One of the most difficult aspects of the Dirichlet method is choosing the features to be used for clustering. In other words, when we organize the songs into clusters, we need to ensure that each cluster is distinct in a way that is statistically and intuitively logical.

Figure 2.1: scikit-learn example of GMM vs. DPGMM and tuning of α

In the Million Song Dataset [9], each song is represented as a JSON object containing several fields. These fields are candidate features to be used in the Dirichlet algorithm. Below is an example song, "Never Gonna Give You Up" by Rick Astley, and the corresponding fields:

artist_mbid: db92a151-1ac2-438b-bc43-b82e149ddd50 (the musicbrainz.org ID for this artist)
artist_mbtags: shape = (4,) (this artist received 4 tags on musicbrainz.org)
artist_mbtags_count: shape = (4,) (raw tag count of the 4 tags this artist received on musicbrainz.org)
artist_name: Rick Astley (artist name)
artist_playmeid: 1338 (the ID of that artist on the service playme.com)
artist_terms: shape = (12,) (this artist has 12 terms (tags) from The Echo Nest)
artist_terms_freq: shape = (12,) (frequency of the 12 terms from The Echo Nest (number between 0 and 1))
artist_terms_weight: shape = (12,) (weight of the 12 terms from The Echo Nest (number between 0 and 1))
audio_md5: bf53f8113508a466cd2d3fda18b06368 (hash code of the audio used for the analysis by The Echo Nest)
bars_confidence: shape = (99,) (confidence value (between 0 and 1) associated with each bar by The Echo Nest)
bars_start: shape = (99,) (start time of each bar according to The Echo Nest; this song has 99 bars)
beats_confidence: shape = (397,) (confidence value (between 0 and 1) associated with each beat by The Echo Nest)
beats_start: shape = (397,) (start time of each beat according to The Echo Nest; this song has 397 beats)
danceability: 0.0 (danceability measure of this song according to The Echo Nest (between 0 and 1; 0 => not analyzed))
duration: 211.69587 (duration of the track in seconds)
end_of_fade_in: 0.139 (time of the end of the fade-in at the beginning of the song, according to The Echo Nest)
energy: 0.0 (energy measure (not in the signal processing sense) according to The Echo Nest (between 0 and 1; 0 => not analyzed))
key: 1 (estimation of the key the song is in by The Echo Nest)
key_confidence: 0.324 (confidence of the key estimation)
loudness: -7.75 (general loudness of the track)
mode: 1 (estimation of the mode the song is in by The Echo Nest)
mode_confidence: 0.434 (confidence of the mode estimation)
release: Big Tunes - Back 2 The 80s (album name from which the track was taken; some songs/tracks can come from many albums, we give only one)
release_7digitalid: 786795 (the ID of the release (album) on the service 7digital.com)
sections_confidence: shape = (10,) (confidence value (between 0 and 1) associated with each section by The Echo Nest)
sections_start: shape = (10,) (start time of each section according to The Echo Nest; this song has 10 sections)
segments_confidence: shape = (935,) (confidence value (between 0 and 1) associated with each segment by The Echo Nest)
segments_loudness_max: shape = (935,) (max loudness during each segment)
segments_loudness_max_time: shape = (935,) (time of the max loudness during each segment)
segments_loudness_start: shape = (935,) (loudness at the beginning of each segment)
segments_pitches: shape = (935, 12) (chroma features for each segment (normalized so max is 1))
segments_start: shape = (935,) (start time of each segment (musical event or onset) according to The Echo Nest; this song has 935 segments)
segments_timbre: shape = (935, 12) (MFCC-like features for each segment)
similar_artists: shape = (100,) (a list of 100 artists (their Echo Nest IDs) similar to Rick Astley according to The Echo Nest)
song_hotttnesss: 0.864248830588 (according to The Echo Nest, when downloaded (in December 2010) this song had a 'hotttnesss' of 0.8 (on a scale of 0 to 1))
song_id: SOCWJDB12A58A776AF (The Echo Nest song ID; note that a song can be associated with many tracks (with very slight audio differences))
start_of_fade_out: 198.536 (start time of the fade-out, in seconds, at the end of the song, according to The Echo Nest)
tatums_confidence: shape = (794,) (confidence value (between 0 and 1) associated with each tatum by The Echo Nest)
tatums_start: shape = (794,) (start time of each tatum according to The Echo Nest; this song has 794 tatums)
tempo: 113.359 (tempo in BPM according to The Echo Nest)
time_signature: 4 (time signature of the song according to The Echo Nest, i.e., usual number of beats per bar)
time_signature_confidence: 0.634 (confidence of the time signature estimation)
title: Never Gonna Give You Up (song title)
track_7digitalid: 8707738 (the ID of this song on the service 7digital.com)
track_id: TRAXLZU12903D05F94 (The Echo Nest ID of this particular track, on which the analysis was done)
year: 1987 (year when this song was released, according to musicbrainz.org)

When choosing features, my main goal was to use features that would most likely yield meaningful results yet also be simple and make sense to the average person. The definition of "meaningful" results is subjective, as every music listener will have his or her own opinions as to what constitutes different types of music, but some common features most people tend to differentiate songs by are pitch, rhythm, and the types of instruments used. The following specific fields provided in each song object fall under these three terms:

Pitch

• segments_pitches: a matrix of values indicating the strength of each pitch (or note) at each discernible time interval

Rhythm

• beats_start: a vector of values indicating the start time of each beat
• time_signature: the time signature of the song
• tempo: the speed of the song in Beats Per Minute (BPM)

Instruments

• segments_timbre: a matrix of values indicating the distribution of MFCC-like features (different types of tones) for each segment

The segments_pitches feature is a clear candidate for a differentiating factor between songs, since it reveals the patterns of notes that occur. Additionally, other research papers that quantitatively examine songs, like Mauch's, look at pitch and employ a procedure that allows all songs to be compared with the same metric. Likewise, timbre is intuitively a reliable differentiating feature, since it captures the presence of different tones, that is, sounds that sound different despite having the same pitch. Therefore, segments_timbre is another feature that is considered in each song.

Finally, we look at the candidate features for rhythm. At first glance, all of these features appear to be useful, as they indicate the rhythm of a song in one way or another. However, none of these features are as useful as the pitch and timbre features. While tempo is one factor in differentiating genres of EDM, and of music in general, tempo alone is not a driving force of musical innovation. Certain genres of EDM, like drum 'n' bass and happycore, stand out for having very fast tempos, but the tempo is supplemented with a sound unique to the genre. Conceiving new arrangements of pitches, combining instruments in new ways, and inventing new types of sounds are novel; speeding up or slowing down existing sounds is not. Including tempo as a feature could actually add noise to the model, since many genres overlap in their tempos. And finally, tempo is measured indirectly when the pitch and timbre features are normalized for each song: everything is measured in units of "per second," so faster songs will have higher quantities of pitch and timbre features each second. Time signature can be dismissed from the candidate features for the same reason as tempo: many genres share the same time signature, and including it in the feature set would only add more noise. beats_start looks like a more promising feature since, like segments_pitches and segments_timbre, it consists of a vector of values. However, difficulties arise when we begin to think about how exactly we can utilize this information. Since songs vary in length, we need a way to compare songs of different durations on the same level. One approach could be to perform basic statistics on the distance between each beat, for example calculating the mean and standard deviation of this distance. However, the normalized pitch and timbre information already capture this data. Another possibility is detecting certain patterns of beats, which could differentiate the syncopated dubstep or glitch music beats from the steady pulse of electro-house. But once again, every beat is accompanied by a sound with a specific timbre and pitch, so this feature would not add any significantly new information.

2.3 Collecting Data and Preprocessing Selected Features

2.3.1 Collecting the Data

Upon deciding the features I wanted to use in my research, I first needed to collect all of the electronic songs in the Million Song Dataset. The easiest reliable way to achieve this was to iterate through each song in the database and save the information for the songs where any of the artist genre tags in artist_mbtags matched an electronic music genre. This measure is not fully accurate, because it looks at the genre of the artist, not the song; however, song-specific genre information was not as easily accessible, so this indicator was nearly as good a substitute. To generate a list of the genres that electronic songs would fall under, I manually searched through a subset of the MSD to find all genres that seemed to be related to electronic music. In the case of genres that were sometimes but not always electronic in nature, such as disco or pop, I erred on the side of caution and did not include them in the list of electronic genres. In those cases, false positives, such as primarily rock songs that happen to have the disco label attached to the artist, could inadvertently be included in the dataset. The final list of genres is as follows:

target_genres = ['house', 'techno', 'drum and bass', 'drum n bass',
    "drum'n'bass", 'drumnbass', "drum 'n' bass", 'jungle', 'breakbeat',
    'trance', 'dubstep', 'trap', 'downtempo', 'industrial', 'synthpop',
    'idm', 'idm - intelligent dance music', '8-bit', 'ambient',
    'dance and electronica', 'electronic']

2.3.2 Pitch Preprocessing

A study conducted by music researcher Matthias Mauch [8] analyzes pitch in a musically informed manner. The study first takes the raw sound data and converts it into a distribution over each pitch, where 0 is no detection of the pitch and 1 the strongest amount. Then it computes the most likely chord by comparing the 4 most common types of chords in popular music (major, minor, dominant 7th, and minor 7th) to the observed chord. The most common chords are represented as "template chords" containing 0's and 1's, where the 1's represent the notes played in the chord. For example, using the note C as the first index, the C major chord is represented as

CT_CM = (1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0).

For a given chroma frame c observed in the song, Spearman's rho coefficient is computed against every template chord:

ρ_{CT,c} = Σ_{i=1}^{12} (CT_i − C̄T)(c_i − c̄) / (σ_CT σ_c)

where C̄T is the mean of the values in the template chord, σ_CT is the standard deviation of the values in the chord, and the operations on c are analogous. Note that the summation runs over each individual pitch in the 12 pitch classes. The chord template with the highest value of ρ is selected as the chord for the time frame.
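As a concrete sketch of this matching step (my own illustration; only one of the 48 templates is shown, and the helper name most_likely_chord is hypothetical):

import numpy as np

TEMPLATES = {
    'C major': np.array([1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0]),
    # ... 47 more: major, minor, dominant 7th, minor 7th at every root
}

def most_likely_chord(chroma):
    # chroma: length-12 pitch distribution for one time frame
    def rho(t, c):
        return np.sum((t - t.mean()) * (c - c.mean())) / (t.std() * c.std())
    return max(TEMPLATES, key=lambda name: rho(TEMPLATES[name], np.asarray(chroma)))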

After this is performed for each time frame, the values are smoothed, and then the change between adjacent chords is observed. The reasoning behind this step is that by measuring the relative distance between chords rather than the chords themselves, all songs can be compared in the same manner even though they may have different key signatures. Finally, the study takes the types of chord changes and classifies them under 8 possible categories called "H-topics." These topics are more abstracted versions of the chord changes that make more sense to a human, such as "changes involving dominant 7th chords."

In my preliminary implementation of this method on an electronic dance music corpus, I made a few modifications to Mauch's study. First, I smoothed the time frames before computing the most probable chords, rather than smoothing the most probable chords. I did this to save time and to reduce volatility in the chord measurements. Using Rick Astley's "Never Gonna Give You Up" as a reference, which contains 935 time frames and lasts 212 seconds, 5 time frames is slightly under 1 second and, for preliminary testing, appeared to be a good interval for each time block. Second, as mentioned in the literature section, I did not abstract the chord changes into H-topics. This decision also stemmed from time constraints, since deriving semantic chord meaning from EDM songs would require careful research into the types of harmonies and sounds common in that genre of music. Below is a high-level visualization of the pitch metadata found in a sample song, "Firestarter" by The Prodigy, and how I converted the metadata into a chord change vector that I could then feed into the Dirichlet Process algorithm.

[Figure: converting the pitch metadata of "Firestarter" by The Prodigy into a chord change vector. The pipeline (1) starts with the raw pitch data, an N×12 matrix where N is the number of time frames and 12 the number of pitch classes; (2) averages the distribution of pitches over every 5 time frames; (3) calculates the most likely chord for each block using Spearman's rho (e.g., F♯ major = (0,1,0,0,0,0,1,0,0,0,1,0)); (4) for each pair of adjacent chords (e.g., F♯ major → G♯ major, a major-to-major change of step size 2), increments the count of that chord change in a table of the 192 possible chord changes; and (5) outputs the final 192-element vector, where chord_changes[i] is the number of times the chord change with code i occurred in the song.]

A final step I took to normalize the chord change data was to divide the counts by the length of the song, so that each song's chord changes were measured per second.

2.3.3 Timbre Preprocessing

For timbre, I also used Mauch's model to find a meaningful way to compare timbre uniformly across all songs [8]. After collecting all song metadata, I took a random sample of 20 songs from each year, starting at 1970. The reason I forced the sampling to 20 randomly sampled songs from each year, and did not take a random sample of songs from all years at once, was to prevent bias towards any one type of sound. As seen in Figure 2.2, there are significantly more songs from 2000-2011 than from before 2000. The mean year is x̄ = 2001.052, the median year is 2003, and the standard deviation of the years is σ = 7.060. A "random sample" over all songs would almost certainly include a disproportionate number of more recent songs. In order not to miss out on sounds that may be more prevalent in older songs, I required a set number of songs from each year.

Figure 2.2: Number of Electronic Music Songs in the Million Song Dataset from Each Year

Next, from each randomly selected song, I selected 20 random timbre frames in order to prevent any biases in data collection within each song. In total, there were 42 · 20 · 20 = 16,800 timbre frames collected. Next, I clustered the timbre frames using a Gaussian Mixture Model (GMM), varying the number of clusters from 10 to 100 and selecting the number of clusters with the lowest Bayes Information Criterion (BIC), a statistical measure commonly used to select the best-fitting model. The BIC was minimized at 46 timbre clusters. I then re-ran the GMM with 46 clusters and saved the mean values of each of the 12 timbre components for each cluster formed. In the same way that every song had the same 192 chord changes whose frequencies could be compared between songs, each song now had the same 46 timbre clusters but different frequencies in each song. When reading in the metadata from each song, I calculated the most likely timbre cluster each timbre frame belonged to and kept a frequency count of all of the possible timbre clusters observed in the song. Finally, as with the pitch data, I divided all observed counts by the duration of the song in order to normalize each song's timbre counts.
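This per-song counting step can be sketched as follows (illustrative code rather than the exact pipeline in Appendix A; gmm46 is assumed to be the fitted 46-component mixture from above):

import numpy as np

def timbre_histogram(frames, gmm46, duration):
    # frames: (n, 12) smoothed timbre frames for one song
    labels = gmm46.predict(np.asarray(frames))  # most likely timbre cluster per frame
    counts = np.bincount(labels, minlength=gmm46.n_components)
    return counts / float(duration)             # cluster frequencies per second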


Chapter 3

Results

3.1 Methodology

After the pitch and timbre data were processed, I ran the Dirichlet Process on the data. For each song, I concatenated the 192-element chord change frequency list and the 46-element timbre category frequency list, giving each song a total of 238 features. However, there is a problem with this setup: the pitch data will inherently dominate the clustering process, since it contains over four times as many features as timbre. While there is no built-in function in scikit-learn's DPGMM process to give different weights to each feature, I considered another possibility to remedy this discrepancy: duplicating the timbre vector a certain number of times and concatenating that to the feature set of each song. While this strategy runs the risk of corrupting the feature set and turning it into something that does not accurately represent each song, it is important to keep in mind that, even without duplicating the timbre vector, the feature set consists of two separate feature sets concatenated to each other. Therefore, timbre duplication appears to be a reasonable strategy to weight pitch and timbre more evenly.
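A sketch of this weighting scheme is below; the duplication factor of 4 is illustrative (the text leaves the exact number of copies unspecified) and brings the timbre block to 184 features against 192 for pitch:

import numpy as np

def make_feature_vector(chord_changes, timbre_counts, timbre_copies=4):
    # chord_changes: length-192 list; timbre_counts: length-46 list
    return np.concatenate([chord_changes] + [timbre_counts] * timbre_copies)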


After this modification, I tweaked a few more parameters before obtaining my final results. Dividing the pitch and timbre frequencies by the duration of the song normalized every song to frequency per second, but it also had the undesired effect of making the data too small. Timbre and pitch frequencies per second were almost always less than 1.0 and often hovered as low as 0.002 for nonzero values. Because all of the values were very close to each other, using common values of α in the range of 0.1 to 1000-2000 was insufficient to push the songs into different clusters; as a result, every song fell into the same cluster. Increasing the value of α by several orders of magnitude, to well over 10 million, fixed this, but the solution presented two problems. First, tuning α to experiment with different ways to cluster the music would be problematic, since I would have to work with an enormous range of possible values for α. Second, pushing α to such high values is not appropriate for the Dirichlet Process. Extremely high values of α indicate a Dirichlet Process that will try to disperse the data into different clusters; a value of α that high is, in principle, always assigning each new song to a new cluster. On the other hand, varying α between 0.1 and 1000, for example, presents a much wider range of flexibility when assigning clusters. While clustering may be possible by varying α an extreme amount with the data as it currently is, we would be using the Dirichlet Process in a way it should mathematically not be used. Therefore, multiplying all of the data by a constant value so that we can work in the appropriate range of α is the ideal approach. After some experimentation, I found that k = 10 was an appropriate scaling factor.

After initial runs of the Dirichlet Process, I found that there was a slight issue with some of the earlier songs. Since I had only artist genre tags, not song-specific tags, I chose songs based on whether any of the tags associated with the artist fell under any electronic music genre, including the generic term 'electronic'. There were some bands, mostly older ones from the 1960s and 1970s like Electric Light Orchestra, which had some electronic music but mostly featured rock, funk, disco, or another genre. Given that these artists featured mostly non-electronic songs, I decided to exclude them from my study and generated a blacklist of these music artists. While it was infeasible to look through every single song and determine whether it was electronic or not, I was able to look over the earliest songs in each cluster. These songs were the most important to verify as electronic, because early non-electronic songs could end up forming new clusters and inadvertently create clusters with non-electronic sounds that I was not looking for.

The goal of this thesis is to identify the different groups into which EM songs cluster and to identify the most unique artists and genres. While the second task is very simple, because it requires looking at the earliest songs in each cluster, the first is difficult to gauge the effectiveness of. While I can look at the average chord change and timbre category frequencies in each cluster, as well as other metadata, attaching more semantic interpretations to what the music actually sounds like and determining whether the music is clustered properly is a very subjective process. For this reason, I ran the Dirichlet Process on the feature set with values of α = (0.05, 0.1, 0.2) and compared the clustering in each case, examining similarities and differences in the clusters formed in each scenario in the Discussion section. For each value of α, I set the upper limit of components, or clusters, allowed to 50. The values of α I used resulted in 9, 14, and 19 clusters formed, respectively.

3.2 Findings

3.2.1 α = 0.05

When I set α to 0.05, the Dirichlet Process split the songs into 9 clusters. Below are the distributions of years of the songs in each cluster (note that the Dirichlet Process does not number the clusters exactly sequentially, so cluster numbers 5, 7, and 10 are skipped).


Figure 3.1: Song year distributions for α = 0.05

For each value of α, I also calculated the average frequency of each chord change category and timbre category for each cluster and plotted the results. The green lines correspond to timbre and the blue lines to pitch.


Figure 3.2: Timbre and pitch distributions for α = 0.05

A table of each cluster formed, the number of songs in that cluster, and descriptions of the pitch, timbre, and rhythmic qualities characteristic of songs in that cluster is shown below.

Cluster | Song Count | Characteristic Sounds
0       | 6481       | Minimalist, industrial, space sounds, dissonant chords
1       | 5482       | Soft, New Age, ethereal
2       | 2405       | Defined sounds; electronic and non-electronic instruments played in standard rock rhythms
3       | 360        | Very dense and complex synths, slightly darker tone
4       | 4550       | Heavily distorted rock and synthesizer
6       | 2854       | Faster-paced 80s synth rock, acid house
8       | 798        | Aggressive beats, dense house music
9       | 1464       | Ambient house, trancelike, strong beats, mysterious tone
11      | 1597       | Melancholy tones; new wave rock in the 80s, then starting in the 90s downtempo, trip-hop, nu-metal

Table 3.1: Song cluster descriptions for α = 0.05

3.2.2 α = 0.1

A total of 14 clusters were formed. (16 were formed, but 2 clusters contained only one song each; I listened to both of these songs, and since neither sounded unique, I discarded them from the clusters.) Again, the song year distributions, timbre and pitch distributions, and cluster descriptions are shown below.

Figure 3.3: Song year distributions for α = 0.1

Figure 3.4: Timbre and pitch distributions for α = 0.1

Cluster | Song Count | Characteristic Sounds
0       | 1339       | Instrumental and disco with 80s synth
1       | 2109       | Simultaneous quarter-note and sixteenth-note rhythms
2       | 4048       | Upbeat, chill; simultaneous quarter-note and eighth-note rhythms
3       | 1353       | Strong repetitive beats, ambient
4       | 2446       | Strong simultaneous beat and synths; synths defined but echoing
5       | 2672       | Calm, New Age
6       | 542        | Hi-hat cymbals, dissonant chord progressions
7       | 2725       | Aggressive punk and alternative rock
9       | 1647       | Latin; rhythmic emphasis on first and third beats
11      | 835        | Standard medium-fast rock instruments/chords
16      | 1152       | Orchestral, especially violins
18      | 40         | "Martian alien" sounds, no vocals
20      | 1590       | Alternating strong kick and strong high-pitched clap
28      | 528        | Roland TR-like beats; kick and clap stand out but fuzzy

Table 3.2: Song cluster descriptions for α = 0.1

3.2.3 α = 0.2

With α set to 0.2, a total of 22 clusters were formed. 3 of the clusters consisted of 1 song each, none of which were particularly unique-sounding, so I discarded them, for a total of 19 significant clusters. Again, the song year distributions, timbre and pitch distributions, and cluster descriptions are shown below.

Figure 3.5: Song year distributions for α = 0.2

Figure 3.6: Timbre and pitch distributions for α = 0.2

Cluster | Song Count | Characteristic Sounds
0       | 4075       | Nostalgic and sad-sounding synths and string instruments
1       | 2068       | Intense, sad, cavernous (mix of industrial metal and ambient)
2       | 1546       | Jazz/funk tones
3       | 1691       | Orchestral with heavy 80s synths, atmospheric
4       | 343        | Arpeggios
5       | 304        | Electro, ambient
6       | 2405       | Alien synths, eerie
7       | 1264       | Punchy kicks and claps, 80s/90s tilt
8       | 1561       | Medium tempo, 4/4 time signature, synths with intense guitar
9       | 1796       | Disco rhythms and instruments
10      | 2158       | Standard rock with few (if any) synths added on
12      | 791        | Cavernous, minimalist, ambient (non-electronic instruments)
14      | 765        | Downtempo, classic guitar riffs, fewer synths
16      | 865        | Classic acid house sounds and beats
17      | 682        | Heavy Roland TR sounds
22      | 14         | Fast, ambient, classic orchestral
23      | 578        | Acid house with funk tones
30      | 31         | Very repetitive rhythms, one or two tones
34      | 88         | Very dense sound (strong vocals and synths)

Table 3.3: Song cluster descriptions for α = 0.2

3.3 Analysis

Each of the different values of α revealed different insights about the Dirichlet Process applied to the Million Song Dataset, mainly which artists and songs were unique and how traditional groupings of EM genres could be thought of differently. Not surprisingly, the distributions of the years of songs in most of the clusters were skewed to the left, because the distribution of all of the EM songs in the Million Song Dataset is left-skewed (see Figure 2.2). However, some of the distributions vary significantly for individual clusters, and these differences provide important insights into which types of music styles were popular at certain points in time and how unique the earliest artists and songs in the clusters were. For example, for α = 0.1, cluster 28's musical style (with sounds characteristic of the Roland TR-808 and TR-909, two programmable drum machines and synthesizers that became extremely popular in 1980s and 90s dance tracks [13]) coincides with when the instruments were first manufactured, in 1980. Not surprisingly, this cluster contained mostly songs from the 80s and 90s and declined slightly in the 2000s. However, a few songs in that cluster came out before 1980. While these songs clearly did not use the Roland TR machines, they may have contained similar sounds that predated the machines and were truly novel.

First, looking at α = 0.05, we see that all of the clusters contain a significant number of songs, although clusters 3 and 8 are notably smaller. Cluster 3 has a heavier left tail, indicating a larger number of songs from the 70s, 80s, and 90s. Inside the cluster, the genres of music varied significantly from a traditional music lens. That is, the cluster contained some songs with nearly all traditional rock instruments, others with purely synths, and others somewhere in between, all of which would normally be classified as different EM genres. Under the Dirichlet Process, however, these songs were lumped together with the common theme of dense, melodic sounds (as opposed to minimalistic, repetitive, or dissonant ones). The most prominent artists among the earlier songs are Ashra and Jon Hassell, who composed several melodic songs combining traditional instruments with synthesizers for a modern feel. The other small cluster, number 8, has a more normal year distribution relative to the entire MSD distribution and also consists of denser beats. Another artist, Cabaret Voltaire, leads this cluster. Cluster 9 stands out significantly because it contains virtually no songs before 1990 but then increases rapidly in popularity. This cluster contains songs with hypnotically repetitive rhythms, strong and ethereal synths, and an equally strong drum-like beat. Given the emergence of trance in the 1990s, and the fact that house music in the 1980s contained more minimalistic synths than house music in the 1990s, this distribution of years makes sense. Looking at the earliest artists in this cluster, one that accurately predates the later music in the cluster is Jean-Michel Jarre, a French composer and pioneer in ambient and electronic music [14]. One of his songs, Les Chants Magnétiques IV, contains very sharp and modulated synths along with a repetitive hi-hat cymbal rhythm and more drawn-out, ethereal synths. While the song sounds ambient at its normal speed, playing it at 1.5 times the normal speed resulted in a thumping, fast-paced 16th-note rhythm that, combined with the ethereal synths and their chord progressions, sounded very similar to trance music. In fact, I found that, stylistically, trance music was comparable to house and ambient music increased in speed. Trance music was a term not used extensively until the early 1990s, but ambient and house music were already mainstream by the 1980s, so it would make sense that trance evolved in this manner. However, this insight could serve as an argument that trance is not an innovative genre in and of itself but is rather a clever combination of two older genres. Lastly, we look at the timbre category and chord change distributions for each cluster. In theory, these clusters should have significantly different peaks of chord changes and timbre categories, reflecting different pitch arrangements and

instruments in each cluster. The type 0 chord change corresponds to major → major with no note change, type 60 to minor → minor with no note change, type 120 to dominant 7th major → dominant 7th major with no note change, and type 180 to dominant 7th minor → dominant 7th minor with no note change. It makes sense that type 0, 60, 120, and 180 chord changes are frequently observed, because it implies that adjacent chords in a song remain in the same key for the majority of the song. The timbre categories, on the other hand, are more difficult to intuitively interpret. Mauch's study addresses this issue by sampling the songs and sounds that are closest to each timbre category, playing the sounds, and attaching user-based interpretations based on several listeners [8]. While this strategy worked in Mauch's study, given the time and resources at my disposal, it was not practical in mine. I ended up comparing my subjective summaries of each cluster against the charts to see whether certain peaks in the timbre categories corresponded to specific tones. Strangely, for α = 0.05 the timbre and chord change data are very similar for each cluster. This problem does not occur when α = 0.1 or 0.2, where the graphs vary significantly and correspond to some of the observed differences in the music. In summary, below are the most influential artists I found in the clusters formed and the types of music they created that were novel for their time:

• Jean-Michel Jarre: ambient and house music, complicated synthesizer arrangements
• Cabaret Voltaire: orchestral electronic music
• Paul Horn: new age
• Brian Eno: ambient music
• Manuel Göttsching (Ashra): synth-heavy ambient music
• Killing Joke: industrial metal
• John Foxx: minimalist and dark electronic music
• Fad Gadget: house and industrial music

While these conclusions were formed mainly from the data used in the MSD and the resulting clusters, I checked outside sources and biographies of these artists to see whether they were groundbreaking contributors to electronic music. Some research revealed that these artists were indeed groundbreaking for their time, so my findings are consistent with existing literature. The difference, however, between existing accounts and mine is that, from a quantitatively computed perspective, I found some new connections (like observing that one of Jarre's works, when sped up, sounded very similar to trance music).

For larger values of α, it is worth not only looking at interesting phenomena in the clusters formed for that specific value but also comparing those clusters to the ones formed under other values of α. Since we are increasing the value of α, more clusters will be formed, and the distinctions between each cluster will be more nuanced. With α = 0.1, the Dirichlet Process formed 16 clusters; 2 of these clusters consisted of only one song each, and upon listening, neither of these songs sounded particularly unique, so I threw those two clusters out and analyzed the remaining 14. Comparing these clusters to the ones formed with α = 0.05, I found that some of the clusters mapped over nicely, while others were more difficult to interpret. For example, cluster 3_{0.1} (cluster 3 when α = 0.1) contained a similar number of songs, and a similar distribution of release years, to cluster 9_{0.05}. Both contain virtually no songs before the 1990s and then steadily rise in popularity through the 2000s. Both clusters also contain similar types of music: house beats and ethereal synths reminiscent of ambient or trance music. However, when I looked at the earliest artists in cluster 3_{0.1}, they were different from the earliest artists in cluster 9_{0.05}. One particular artist, Bill Nelson, stood out for having a particularly novel song, "Birds of Tin," for the year it was released (1980). This song features a sharp and twangy synth beat that, when sped up, sounded like minimalist acid house music.

While the α = 0.05 grouping differentiated mostly on general moods and classes of instruments (like rock vs. non-electronic vs. electronic), the α = 0.1 grouping picked up more nuanced instrumentation and mood differences. For example, cluster 16_{0.1} contained songs that featured orchestral string instruments, especially violin. The songs themselves varied significantly according to traditional genres, from Brian Eno arrangements with classical orchestra to a remix of a song by Linkin Park, a nu-metal band, which contained violin interludes. This clustering raises an interesting point: music that sounds very different based on traditional genres can be grouped together on certain instruments or sounds. Another cluster, 28_{0.1}, features 90s sounds characteristic of the Roland TR-909 drum machine (which would explain why the cluster's songs increase drastically starting in the 1990s and steadily decline through the 2000s). Yet another cluster, 6_{0.1}, has a particularly heavy left tail, indicating a style more popular in the 1980s, and its characteristic sound, hi-hat cymbals, is also a specialized instrument. This specialization does not match up particularly strongly with the clusters formed when α = 0.05. That is, a single cluster with α = 0.05 does not map easily to one or more clusters in the α = 0.1 run, although many of the clusters appear to share characteristics based on the qualitative descriptions in the tables. The timbre/chord change charts for each cluster appeared to at least somewhat corroborate the general characteristics I attached to each cluster. For example, the last timbre category is significantly pronounced for clusters 5 and 18, and especially so for 18. Cluster 18 was vocal-free, ethereal space-synth sounds, so it would make sense that cluster 5, which was mainly calm New Age, also contained vocal-free, ethereal, space-y sounds. It was also interesting to note

that certain clusters, like 28, contained one timbre category that completely dominated all the others. Many songs in this cluster were marked by strong and repetitive beats reminiscent of the Roland TR synth drum machines, which matches the graph. Likewise, clusters 3, 7, 9, and 20, which appear to contain the same peak timbre category, were noted for containing strong and repetitive beats. For this value of α, I added the following artists and their contributions to the general list of novel artists:

• Bill Nelson: minimalist house music
• Vangelis: orchestral compositions with electronic notes
• Rick Wakeman: rock compositions with spacy-sounding synths
• Kraftwerk: synth-based pop music

Finally, we look at α = 0.2. With this parameter value, the Dirichlet Process resulted in 22 clusters. 3 of these clusters contained only one song each; upon listening to each of these songs, I determined they were not particularly unique and discarded them, for a total of 19 remaining clusters. Unlike the previous two values of α, where the clusters were relatively easy to subjectively differentiate, this case was quite difficult. Slightly more than half of the clusters, 10 out of 19, contained under 1,000 songs, and 3 contained under 100. Some of the clusters were easily mapped to clusters in the other two α values, like cluster 17_{0.2}, which contains Roland TR drum machine sounds and is comparable to cluster 28_{0.1}. However, many of the other classifications seemed more dubious. Not only did the songs within each cluster often vary significantly, but the differences between many clusters appeared nearly indistinguishable. The chord change and timbre charts also reflect the difficulty in distinguishing clusters. The y-axis values are quite small, implying that many of the timbre values averaged out because the songs in each cluster were quite different. Essentially, the observations are quite noisy and do not have features that stand out as saliently as those of cluster 28_{0.1}, for example. The only exceptions were clusters 30 and 34, but there were so few songs in each of these clusters that they represent only a small portion of the dataset. Therefore, I concluded that the Dirichlet Process with α = 0.2 did an insufficient job of adequately clustering the songs. Overall, the clusters formed when α = 0.1 were the most meaningful, picking up nuanced moods and instruments without splitting hairs and producing clusters a minimally trained ear could not differentiate. From this analysis, the most appropriate genre classifications of the electronic music in the MSD are the clusters described in the table for α = 0.1, and the most novel artists, along with their contributions, are summarized in the findings for α = 0.05 and α = 0.1.


Chapter 4

Conclusion

In this chapter, I first address weaknesses in my experiment and strategies to remedy them; I then offer potential paths for researchers to build upon my experiment and close with final words regarding this thesis.

4.1 Design Flaws in the Experiment

While I made every effort to ensure the integrity of this experiment, there were various complicating factors, some beyond my control and others within my control but unrealistic to address given the time and resources I had. The largest issue was the dataset I was working with. While the MSD contained roughly 23,000 electronic music songs according to my classifications, these songs did not come close to all of the electronic music available. From looking through the tracks, I did see many important artists, meaning that there was some credibility to the dataset. However, there were several other artists I was surprised to see missing, and the artists included contained only a limited number of popular songs. Some traditionally defined genres, like dubstep, were missing entirely from the dataset, and the most recent songs came from the year 2010, which meant that the past 5 years of rapid expansion in EM were not accounted for. Building a sufficient corpus of EM data is very difficult, arguably more so than for other genres, because songs may be remixed by multiple artists, further blurring the line between original content and modifications. For this reason, I considered my thesis to be a proof of concept. Although the data I used may not be ideal, I was able to show that the Dirichlet Process could be used with some amount of success to cluster songs based on their metadata.

With respect to how I implemented the Dirichlet Process and constructed the features, my methodology could have been more extensive with additional time and resources. Interpreting the sounds in each song and establishing common threads is a difficult task, and unlike Pandora, which used trained music theory experts to analyze each song, I relied on my own ears and anecdotal knowledge of EM. Given the lack of formal literature quantitatively analyzing EM and the resources I had, this was my best realistic option, but it was also not ideal. The second notable weakness, which was more controllable, was determining what exactly constitutes an EM song. My criteria involved iterating through every song and selecting those whose artist contained a tag that fell inside a list of predetermined EM genres. However, this strategy is not always effective, since some artists have produced only a small selection of EM songs alongside much more music involving rock or other non-EM genres. To prevent these songs from appearing in the dataset, I would need to load another dataset, from a group called Last.fm, which contains user-generated tags at the song level. Another, more addressable weakness in my experiment was graphically analyzing the timbre categories. While the average chord changes were easy to interpret on the graphs for each cluster and had easy semantic interpretations, the timbre categories were never formally defined. That is, while I knew the Bayes Information Criterion was lowest when there were 46 categories, I did not associate each timbre category with a sound. Mauch's study addressed this issue by randomly selecting songs with sounds that fell in each timbre category and asking users to listen to the sounds and classify what they heard. Implementing this system would be an additional way of ensuring that the clusters formed for each song were nontrivial: I could not only eyeball the timbre measurements on each graph, as I did in this thesis, but also use them to confirm the sounds I observed for each cluster. Finally, while my feature selection contained careful preprocessing, based on other studies, that normalized measurements between all songs, there are additional ways I could have improved the feature set. For example, one study looks at more advanced ways to isolate specific timbre segments in a song, identify repeating patterns, and compare songs to each other in terms of the similarity of their timbres [15]. More advanced methods like these would allow me to more quantitatively analyze how successfully the Dirichlet Process clusters songs into distinct categories.

4.2 Future Work

Future work in this area, quantitatively analyzing EM metadata to determine what constitutes different genres and novel artists, would involve tighter definitions and procedures, evaluations of whether the clustering was effective, and closer musical scrutiny. All of the weaknesses mentioned in the previous section, barring perhaps the set of songs available in the Million Song Dataset, can be addressed with extensions and modifications to the code base I created. The greater issue of building an effective corpus of music data for the MSD, and constantly updating it, might be addressed by soliciting such data from an organization like Spotify, but such an endeavor is very ambitious and beyond the scope of any individual or small-group research project without extensive funding and influence. Once these problems are resolved, and the dataset, the songs accessed from it, and the methods for comparing songs to each other are in place, the next steps would be to further analyze the results. How do the most unique artists for their time compare to the most popular artists? Is there considerable overlap? How long does it take for a style to grow in popularity, if it even does? And lastly, how can these findings be used to compose new genres of music and envision who and what will become popular in the future? All of these questions may require supplementary information sources, with respect to the popularity of songs and artists for example, and many of these additional pieces of information can be found on the website of the MSD.

4.3 Closing Remarks

While this thesis is an ambitious endeavor and can be improved in many respects, it does show that the methods implemented yield nontrivial results and could serve as a foundation for future quantitative analysis of electronic music. As data analytics grows and groups such as Spotify amass greater amounts of information and deeper insights from that information, this relatively new field of study will hopefully grow with it. EM is a dynamic, energizing, and incredibly expressive type of music, and understanding it from a quantitative perspective pays respect to what has, up until now, mostly been analyzed from a curious outsider's perspective: qualitatively described, but not examined as thoroughly from a mathematical angle.


Appendix A

Code

A.1 Pulling Data from the Million Song Dataset

from __future__ import division
import os
import re
import sys
import time
import glob
import collections
import numpy as np
import hdf5_getters  # not on adroit

# prevents output from showing ellipses when printed
np.set_printoptions(threshold=np.nan)

'''This code pulls the metadata of every electronic song in the MSD and
writes it to disk in chronological order.'''

basedir = '/scratch/network/mssilver/mssilver/msd_data_full/data/' + str(sys.argv[1])
ext = '.h5'

target_genres = ['house', 'techno', 'drum and bass', 'drum n bass',
                 "drum'n'bass", 'drumnbass', "drum 'n' bass", 'jungle',
                 'breakbeat', 'trance', 'dubstep', 'trap', 'downtempo',
                 'industrial', 'synthpop', 'idm',
                 'idm - intelligent dance music', '8-bit', 'ambient',
                 'dance and electronica', 'electronic']

# relevant metadata for all EM songs found in the MSD
all_song_data = {}
count = 0
start_time = time.time()

for root, dirs, files in os.walk(basedir):
    files = glob.glob(os.path.join(root, '*' + ext))
    for f in files:
        h5 = hdf5_getters.open_h5_file_read(f)
        # if year unknown, throw out sample
        if hdf5_getters.get_year(h5) == 0:
            h5.close()
            continue
        if any(tag in str(hdf5_getters.get_artist_mbtags(h5)) for tag in target_genres):
            count += 1
            print('found electronic music song at {0} seconds'.format(time.time() - start_time))
            print('song count: {0}'.format(count))
            h5_subdict = dict()
            h5_subdict['title'] = hdf5_getters.get_title(h5)
            h5_subdict['artist_name'] = hdf5_getters.get_artist_name(h5)
            h5_subdict['year'] = hdf5_getters.get_year(h5).item()
            h5_subdict['duration'] = hdf5_getters.get_duration(h5).item()
            h5_subdict['timbre'] = hdf5_getters.get_segments_timbre(h5).tolist()
            h5_subdict['pitches'] = hdf5_getters.get_segments_pitches(h5).tolist()
            track_id = hdf5_getters.get_track_id(h5)
            all_song_data[track_id] = h5_subdict
            print('Song {0} finished processing. Total time elapsed: {1} seconds'.format(count, str(time.time() - start_time)))
        h5.close()

# an OrderedDict is needed here: a plain dict would discard the chronological sort
all_song_data_sorted = collections.OrderedDict(
    sorted(all_song_data.items(), key=lambda k: k[1]['year']))
sortedpitchdata = ('/scratch/network/mssilver/mssilver/msd_data/raw_' +
                   re.sub(r'/', '', sys.argv[1]) + '.txt')
with open(sortedpitchdata, 'w') as text_file:
    text_file.write(str(all_song_data_sorted))

A.2 Calculating Most Likely Chords and Timbre Categories

from __future__ import division
import os
import sys
import re
import time
import json
import glob
import math
import ast
import collections
import numpy as np
import sklearn.mixture
import hdf5_getters  # not on adroit
import msd_utils     # not on adroit

# prevents output from showing ellipses when printed
np.set_printoptions(threshold=np.nan)

# column-wise mean of a list of lists (used with zip(*...) below)
def mean(a):
    return sum(a) / len(a)

'''This code computes the frequency of chord changes and the timbre category
counts for each electronic song.'''

basedir = '/scratch/network/mssilver/mssilver/'
input_file = basedir + 'msd_data/raw_' + str(sys.argv[1]) + '.txt'
output_file = basedir + 'msd_data/preprocessed_' + str(sys.argv[1]) + '.txt'

json_contents = open(input_file, 'r').read()

all_song_data = []
time_start = time.time()
count = 0
for json_object_match in re.finditer(r"{'title'.*?}", json_contents):
    json_object_str = str(json_object_match.group(0))
    json_object = ast.literal_eval(json_object_str)
    json_object_new = {}

    json_object_new['title'] = json_object['title']
    json_object_new['artist_name'] = json_object['artist_name']
    json_object_new['year'] = json_object['year']
    json_object_new['duration'] = json_object['duration']

    segments_pitches_old = json_object['pitches']
    segments_timbre_old = json_object['timbre']
    segments_pitches_old_smoothed = []
    segments_timbre_old_smoothed = []
    chord_changes = [0 for i in range(0, 192)]
    smoothing_factor = 5
    for i in range(0, int(math.floor(len(segments_pitches_old) / smoothing_factor))):
        segments = segments_pitches_old[(smoothing_factor*i):(smoothing_factor*i + smoothing_factor)]
        # calculate mean strength of each note over a block of 5 time segments
        segments_mean = map(mean, zip(*segments))
        segments_pitches_old_smoothed.append(segments_mean)
    most_likely_chords = [msd_utils.find_most_likely_chord(seg) for seg in segments_pitches_old_smoothed]
    print 'found most likely chords at {0} seconds'.format(time.time() - time_start)
    # calculate chord changes
    for i in range(0, len(most_likely_chords) - 1):
        c1 = most_likely_chords[i]
        c2 = most_likely_chords[i+1]
        if c1[1] == c2[1]:
            note_shift = 0
        elif c1[1] < c2[1]:
            note_shift = c2[1] - c1[1]
        else:
            note_shift = 12 - c1[1] + c2[1]
        key_shift = 4*(c1[0] - 1) + c2[0]
        # convert note_shift (0 through 11) and key_shift (1 to 16)
        # to one of 192 categories for a chord shift
        chord_shift = 12*(key_shift - 1) + note_shift
        chord_changes[chord_shift] += 1
    json_object_new['chord_changes'] = [c / json_object['duration'] for c in chord_changes]
    print 'calculated chord changes at {0} seconds'.format(time.time() - time_start)

    for i in range(0, int(math.floor(len(segments_timbre_old) / smoothing_factor))):
        segments = segments_timbre_old[(smoothing_factor*i):(smoothing_factor*i + smoothing_factor)]
        # calculate mean of each timbre component over a block of 5 time segments
        segments_mean = map(mean, zip(*segments))
        segments_timbre_old_smoothed.append(segments_mean)
    print 'found most likely timbre categories at {0} seconds'.format(time.time() - time_start)
    timbre_cats = [msd_utils.find_most_likely_timbre_category(seg) for seg in segments_timbre_old_smoothed]
    # occurrence count of each of the first 30 timbre categories
    timbre_cat_counts = [timbre_cats.count(i) for i in xrange(0, 30)]
    json_object_new['timbre_cat_counts'] = [t / json_object['duration'] for t in timbre_cat_counts]
    all_song_data.append(json_object_new)
    count += 1

print 'preprocessing finished, writing results to file at time {0}'.format(time.time() - time_start)
with open(output_file, 'w') as f:
    f.write(str(all_song_data))

print 'file merging complete at time {0}'.format(time.time() - time_start)
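As a sanity check of the indexing above, the following standalone sketch recomputes the chord-shift category for a single transition. It assumes the (chord_type, root) convention returned by msd_utils.find_most_likely_chord, with chord_type between 1 and 4 and root between 0 and 11:

def chord_shift_index(c1, c2):
    # distance in semitones from the root of c1 up to the root of c2
    if c1[1] == c2[1]:
        note_shift = 0
    elif c1[1] < c2[1]:
        note_shift = c2[1] - c1[1]
    else:
        note_shift = 12 - c1[1] + c2[1]
    key_shift = 4*(c1[0] - 1) + c2[0]        # 1 to 16: ordered pair of chord types
    return 12*(key_shift - 1) + note_shift   # 0 to 191

# e.g. a C major chord followed by a G major chord: both chords are major
# (key_shift = 1) and the root rises 7 semitones, giving category 7
print chord_shift_index((1, 0), (1, 7))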

A.3 Code to Compute Timbre Categories

from __future__ import division
import os
import sys
import re
import time
import json
import glob
import math
import ast
import operator
import random
import collections
from collections import defaultdict
from string import ascii_uppercase
import numpy as np
import matplotlib.pyplot as plt
import sklearn.mixture
import hdf5_getters  # not on adroit
import msd_utils     # not on adroit

timbre_all = []
N = 20  # number of samples to get from each year

# number of EM songs in the MSD from each year (see figure 2.2)
year_counts = {1956: 2, 1965: 4, 1968: 3, 1969: 5, 1970: 23, 1971: 25,
               1972: 26, 1973: 37, 1974: 35, 1975: 29, 1976: 28, 1977: 64,
               1978: 77, 1979: 111, 1980: 131, 1981: 171, 1982: 199, 1983: 272,
               1984: 190, 1985: 189, 1986: 200, 1987: 224, 1988: 205, 1989: 272,
               1990: 358, 1991: 348, 1992: 538, 1993: 610, 1994: 658, 1995: 764,
               1996: 809, 1997: 930, 1998: 872, 1999: 983, 2000: 1031, 2001: 1230,
               2002: 1323, 2003: 1563, 2004: 1508, 2005: 1995, 2006: 1892,
               2007: 2175, 2008: 1950, 2009: 1782, 2010: 742}

time_start = time.time()
year_count = defaultdict(int)
orig_dir = '/scratch/network/mssilver/mssilver/'
orig_dir = ''
json_pattern = re.compile(r"{'title'.*?}", re.DOTALL)
N = 20  # number of songs to sample from each year
k = 20  # number of frames to select from each song
for l1 in ascii_uppercase:
    for l2 in ascii_uppercase:
        edm_textfile = orig_dir + 'msd_data/raw_' + l1 + l2 + '.txt'
        json_contents = open(edm_textfile, 'r').read()
        for json_object_str in re.findall(json_pattern, json_contents):
            json_object = ast.literal_eval(json_object_str)
            year = int(json_object['year'])
            # keep each song with probability min(1, N / year_counts[year])
            prob = 1.0 if 1.0*N/year_counts[year] > 1.0 else 1.0*N/year_counts[year]
            if random.random() < prob:
                print 'getting timbre frames for song in directory {0}, {1} seconds after start of program'.format(edm_textfile, time.time() - time_start)
                duration = float(json_object['duration'])
                timbre = [[t/duration for t in l] for l in json_object['timbre']]
                # sample k timbre frames; fall back to all frames for short songs
                try:
                    indices = random.sample(xrange(0, len(timbre)), k)
                except:
                    indices = xrange(0, len(timbre))
                timbre_frames = [timbre[i] for i in indices]
                appended_timbre = [timbre_all.append(l) for l in timbre_frames]
        print 'finished file {0}, {1} seconds after start of program'.format(edm_textfile, time.time() - time_start)

with open('timbre_frames_all.txt', 'w') as f:
    f.write(str(timbre_all))
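The file of timbre frames written above is the input to the clustering step. Below is a minimal sketch of that step using the scikit-learn interface discussed in Chapter 2; the α value is one of the settings examined in Chapter 3, and the 50-component cap follows Chapter 2. (In newer scikit-learn releases, DPGMM has been replaced by BayesianGaussianMixture.)

import ast
import sklearn.mixture

# cluster the sampled timbre frames with a Dirichlet Process GMM;
# alpha = 0.1 and an upper bound of 50 components mirror the values in Chapter 2
frames = ast.literal_eval(open('timbre_frames_all.txt').read())
dpgmm = sklearn.mixture.DPGMM(n_components=50, alpha=0.1, covariance_type='diag', n_iter=100)
dpgmm.fit(frames)
labels = dpgmm.predict(frames)  # cluster assignment of each timbre frame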

A.4 Helper Methods for Calculations

import os
import re
import json
import glob
import time
import numpy as np
import hdf5_getters

''' some static data used in conjunction with the helper methods '''

# each 12-element vector corresponds to the 12 pitches, starting with C
# natural and going up to B natural

CHORD_TEMPLATE_MAJOR = [[1,0,0,0,1,0,0,1,0,0,0,0], [0,1,0,0,0,1,0,0,1,0,0,0],
                        [0,0,1,0,0,0,1,0,0,1,0,0], [0,0,0,1,0,0,0,1,0,0,1,0],
                        [0,0,0,0,1,0,0,0,1,0,0,1], [1,0,0,0,0,1,0,0,0,1,0,0],
                        [0,1,0,0,0,0,1,0,0,0,1,0], [0,0,1,0,0,0,0,1,0,0,0,1],
                        [1,0,0,1,0,0,0,0,1,0,0,0], [0,1,0,0,1,0,0,0,0,1,0,0],
                        [0,0,1,0,0,1,0,0,0,0,1,0], [0,0,0,1,0,0,1,0,0,0,0,1]]
CHORD_TEMPLATE_MINOR = [[1,0,0,1,0,0,0,1,0,0,0,0], [0,1,0,0,1,0,0,0,1,0,0,0],
                        [0,0,1,0,0,1,0,0,0,1,0,0], [0,0,0,1,0,0,1,0,0,0,1,0],
                        [0,0,0,0,1,0,0,1,0,0,0,1], [1,0,0,0,0,1,0,0,1,0,0,0],
                        [0,1,0,0,0,0,1,0,0,1,0,0], [0,0,1,0,0,0,0,1,0,0,1,0],
                        [0,0,0,1,0,0,0,0,1,0,0,1], [1,0,0,0,1,0,0,0,0,1,0,0],
                        [0,1,0,0,0,1,0,0,0,0,1,0], [0,0,1,0,0,0,1,0,0,0,0,1]]
CHORD_TEMPLATE_DOM7 = [[1,0,0,0,1,0,0,1,0,0,1,0], [0,1,0,0,0,1,0,0,1,0,0,1],
                       [1,0,1,0,0,0,1,0,0,1,0,0], [0,1,0,1,0,0,0,1,0,0,1,0],
                       [0,0,1,0,1,0,0,0,1,0,0,1], [1,0,0,1,0,1,0,0,0,1,0,0],
                       [0,1,0,0,1,0,1,0,0,0,1,0], [0,0,1,0,0,1,0,1,0,0,0,1],
                       [1,0,0,1,0,0,1,0,1,0,0,0], [0,1,0,0,1,0,0,1,0,1,0,0],
                       [0,0,1,0,0,1,0,0,1,0,1,0], [0,0,0,1,0,0,1,0,0,1,0,1]]
CHORD_TEMPLATE_MIN7 = [[1,0,0,1,0,0,0,1,0,0,1,0], [0,1,0,0,1,0,0,0,1,0,0,1],
                       [1,0,1,0,0,1,0,0,0,1,0,0], [0,1,0,1,0,0,1,0,0,0,1,0],
                       [0,0,1,0,1,0,0,1,0,0,0,1], [1,0,0,1,0,1,0,0,1,0,0,0],
                       [0,1,0,0,1,0,1,0,0,1,0,0], [0,0,1,0,0,1,0,1,0,0,1,0],
                       [0,0,0,1,0,0,1,0,1,0,0,1], [1,0,0,0,1,0,0,1,0,1,0,0],
                       [0,1,0,0,0,1,0,0,1,0,1,0], [0,0,1,0,0,0,1,0,0,1,0,1]]

CHORD_TEMPLATE_MAJOR_means = [np.mean(chord) for chord in CHORD_TEMPLATE_MAJOR]
CHORD_TEMPLATE_MINOR_means = [np.mean(chord) for chord in CHORD_TEMPLATE_MINOR]
CHORD_TEMPLATE_DOM7_means = [np.mean(chord) for chord in CHORD_TEMPLATE_DOM7]
CHORD_TEMPLATE_MIN7_means = [np.mean(chord) for chord in CHORD_TEMPLATE_MIN7]

CHORD_TEMPLATE_MAJOR_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_MAJOR]
CHORD_TEMPLATE_MINOR_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_MINOR]
CHORD_TEMPLATE_DOM7_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_DOM7]
CHORD_TEMPLATE_MIN7_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_MIN7]

# mean timbre vector of each of the 46 timbre categories, against which each
# smoothed timbre frame is classified
TIMBRE_CLUSTERS = [
    [1.38679881e-01, 3.95702571e-02, 2.65410235e-02, 7.38301998e-03,
     -1.75014636e-02, -5.51147732e-02, 8.71851698e-03, -1.17595855e-02,
     1.07227900e-02, 8.75951680e-03, 5.40391877e-03, 6.17638908e-03],
    [3.14344510e+00, 1.17405599e-01, 4.08053561e+00, -1.77934450e+00,
     2.93367968e+00, -1.35597928e+00, -1.55129489e+00, 7.75743158e-01,
     6.42796685e-01, 1.40794256e-01, 3.37716831e-01, -3.27103815e-01],
    [3.56548165e-01, 2.73288705e+00, 1.94355982e+00, 1.06892477e+00,
     9.89739475e-01, -8.97330631e-02, 8.73234495e-01, -2.00747009e-03,
     3.44488367e-01, 9.93117800e-02, -2.43471766e-01, -1.90521726e-01],
    [4.22442037e-01, 4.14115783e-01, 1.43926557e-01, -1.16143322e-01,
     -5.95186216e-02, -2.36927188e-01, -6.83151409e-02, 9.86816882e-02,
     2.43219098e-02, 6.93558977e-02, 6.80121418e-03, 3.97485360e-02],
    [1.94727799e-01, -1.39027782e+00, -2.39875671e-01, -2.84583677e-01,
     1.92334219e-01, -2.83421048e-01, 2.15787541e-01, 1.14840341e-01,
     -2.15631833e-01, -4.09496877e-02, -6.90838017e-03, -7.24394810e-03],
    [1.96565167e-01, 4.98702717e-02, -3.43697282e-01, 2.54170701e-01,
     1.12441266e-02, 1.54740401e-01, -4.70447408e-02, 8.10868802e-02,
     3.03736697e-03, 1.43974944e-03, -2.75044913e-02, 1.48634678e-02],
    [2.21364497e-01, -2.96205105e-01, 1.57754028e-01, -5.57641279e-02,
     -9.25625566e-02, -6.15316168e-02, -1.38139882e-01, -5.54936599e-02,
     1.66886836e-01, 6.46238260e-02, 1.24093863e-02, -2.09274345e-02],
    [2.12823455e-01, -9.32652720e-02, -4.39611467e-01, -2.02814479e-01,
     4.98638770e-02, -1.26572488e-01, -1.11181799e-01, 3.25075635e-02,
     2.01416694e-02, -5.69216463e-02, 2.61922912e-02, 8.30817468e-02],
    [1.62304042e-01, -7.34813956e-03, -2.02552550e-01, 1.80106705e-01,
     -5.72110826e-02, -9.17148244e-02, -6.20429191e-03, -6.08892354e-02,
     1.02883628e-02, 3.84878478e-02, -8.72920419e-03, 2.37291230e-02],
    [1.69023095e-01, 6.81311168e-02, -3.71039856e-02, -2.13139780e-02,
     -4.18752028e-03, 1.36407740e-01, 2.58515825e-02, -4.10328777e-04,
     2.93149920e-02, -1.97874734e-02, 2.01177066e-02, 4.29260690e-03],
    [4.16829358e-01, -1.28384095e+00, 8.86081556e-01, 9.13717416e-02,
     -3.19420208e-01, -1.82003637e-01, -3.19865507e-02, -1.71517045e-02,
     3.47472066e-02, -3.53047665e-02, 5.58354602e-02, -5.06222122e-02],
    [3.83948137e-01, 1.06020034e-01, 4.01191058e-01, 1.49470482e-01,
     -9.58422411e-02, -4.94473336e-02, 2.27589858e-02, -5.67352733e-02,
     3.84666644e-02, -2.15828055e-02, -1.67817151e-02, 1.15426241e-01],
    [9.07946444e-01, 3.26120397e+00, 2.98472002e+00, -1.42615404e-01,
     1.29886103e+00, -4.53380431e-01, 1.54008478e-01, -3.55297093e-02,
     -2.95809181e-01, 1.57037690e-01, -7.29692046e-02, 1.15180285e-01],
    [1.60870896e+00, -2.32038235e+00, -7.96211044e-01, 1.55058968e+00,
     -2.19377663e+00, 5.01030526e-01, -1.71767279e+00, -1.36642470e+00,
     -2.42837527e-01, -4.14275615e-01, -7.33148530e-01, -4.56676578e-01],
    [6.42870687e-01, 1.34486839e+00, 2.16026845e-01, -2.13180345e-01,
     3.10866747e-01, -3.97754955e-01, -3.54439151e-01, -5.95938041e-04,
     4.95054274e-03, 4.67013422e-02, -1.80823854e-02, 1.25808320e-01],
    [1.16780496e+00, 2.28141229e+00, -3.29418720e+00, -1.54239912e+00,
     2.12372153e-01, 2.51116768e+00, 1.84273560e+00, -4.06183916e-01,
     1.19175125e+00, -9.24407446e-01, 6.85444429e-01, -6.38729005e-01],
    [2.39097414e-01, -1.13382447e-02, 3.06327342e-01, 4.68182987e-03,
     -1.03107607e-01, -3.17661969e-02, 3.46533705e-02, 1.46440386e-02,
     6.88291154e-02, 1.72580481e-02, -6.23970238e-03, -6.52822380e-03],
    [1.74850329e-01, -1.86077411e-01, 2.69285838e-01, 5.22452803e-02,
     -3.71708289e-02, -6.42874319e-02, -5.01920042e-03, -1.14565540e-02,
     -2.61300268e-03, -6.94872458e-03, 1.20157063e-02, 2.01341977e-02],
    [1.93220674e-01, 1.62738332e-01, 1.72794061e-02, 7.89933755e-02,
     1.58494767e-01, 9.04541006e-04, -3.33177052e-02, -1.42411500e-01,
     -1.90471155e-02, -2.41622739e-02, -2.57382438e-02, 2.84895062e-02],
    [3.31179197e+00, -1.56765268e-01, 4.42446188e+00, 2.05496297e+00,
     5.07031622e+00, -3.52663849e-02, -5.68337901e+00, -1.17825301e+00,
     5.41756637e-01, -3.15541339e-02, -1.58404846e+00, 7.37887234e-01],
    [2.36033237e-01, -5.01380019e-01, -7.01568834e-02, -2.14474169e-01,
     5.58739133e-01, -3.45340886e-01, 2.36469930e-01, -2.51770230e-02,
     -4.41670143e-01, -1.73364633e-01, 9.92353986e-03, 1.01775476e-01],
    [3.13672832e+00, 1.55128891e+00, 4.60139512e+00, 9.82477544e-01,
     -3.87108002e-01, -1.34239667e+00, -3.00065797e+00, -4.41556909e-01,
     -7.77546208e-01, -6.59017029e-01, -1.42596356e-01, -9.78935498e-01],
    [8.50714148e-01, 2.28658856e-01, -3.65260753e+00, 2.70626948e+00,
     -1.90441544e-01, 5.66625676e+00, 1.77531510e+00, 2.39978921e+00,
     1.10965660e+00, 1.58484130e+00, -1.51579214e-02, 8.64324026e-01],
    [1.14302559e+00, 1.18602811e+00, -3.88130412e+00, 8.69833825e-01,
     -8.23003310e-01, -4.23867795e-01, 8.56022598e-01, -1.08015106e+00,
     1.74840192e-01, -1.35493558e-02, -1.17012561e+00, 1.68572940e-01],
    [3.54117814e+00, 6.12714769e-01, 7.67585243e+00, 2.50391333e+00,
     1.81374399e+00, -1.46363231e+00, -1.74027236e+00, -5.72924078e-01,
     -1.20787368e+00, -4.13954661e-01, -4.62561948e-01, 6.78297871e-01],
    [8.31843044e-01, 4.41635485e-01, 7.00724425e-02, -4.72159900e-02,
     3.08326493e-01, -4.47009822e-01, 3.27806057e-01, 6.52370380e-01,
     3.28490360e-01, 1.28628172e-01, -7.78065861e-02, 6.91343399e-02],
    [4.90082031e-01, -9.53180204e-01, 1.76970476e-01, 1.57256960e-01,
     -5.26196238e-02, -3.19264458e-01, 3.91808304e-01, 2.19368239e-01,
     -2.06483291e-01, -6.25044005e-02, -1.05547224e-01, 3.18934196e-01],
    [1.49899454e+00, -4.30708817e-01, 2.43770498e+00, 7.03149621e-01,
     -2.28827845e+00, 2.70195855e+00, -4.71484280e+00, -1.18700075e+00,
     -1.77431396e+00, -2.23190236e+00, 8.20855264e-01, -2.35859902e-01],
    [1.20322544e-01, -3.66300816e-01, -1.25699953e-01, -1.21914056e-01,
     6.93277338e-02, -1.31034684e-01, -1.54955924e-03, 2.48094288e-02,
     -3.09576314e-02, -1.66369415e-03, 1.48904987e-04, -1.42151992e-02],
    [6.52394765e-01, -6.81024464e-01, 6.36868117e-01, 3.04950208e-01,
     2.62178992e-01, -3.20457080e-01, -1.98576098e-01, -3.02173163e-01,
     2.04399765e-01, 4.44513847e-02, -9.50111498e-03, -1.14198739e-02],
    [2.06762180e-01, -2.08101829e-01, 2.61977630e-01, -1.71672300e-01,
     5.61794250e-02, 2.13660185e-01, 3.90259585e-02, 4.78176392e-02,
     1.72812607e-02, 3.44052067e-02, 6.26899067e-03, 2.48544728e-02],
    [7.39717363e-01, 4.37786285e+00, 2.54995502e+00, 1.13151212e+00,
     -3.58509503e-01, 2.20806129e-01, -2.20500355e-01, -7.22409824e-02,
     -2.70534083e-01, 1.07942098e-03, 2.70174668e-01, 1.87279353e-01],
    [1.25593809e+00, 6.71054880e-02, 8.70352571e-01, -4.32607959e+00,
     2.30652217e+00, 5.47476105e+00, -6.11052479e-01, 1.07955720e+00,
     -2.16225471e+00, -7.95770149e-01, -7.31804973e-01, 9.68935954e-01],
    [1.17233757e-01, -1.23897829e-01, -4.88625265e-01, 1.42036530e-01,
     -7.23286756e-02, -6.99808763e-02, -1.17525019e-02, 5.70221674e-02,
     -7.67796123e-03, 4.17505873e-02, -2.33375716e-02, 1.94121001e-02],
    [1.67511025e+00, -2.75436700e+00, 1.45345593e+00, 1.32408871e+00,
     -1.66172505e+00, 1.00560074e+00, -8.82308160e-01, -5.95708043e-01,
     -7.27283590e-01, -1.03975499e+00, -1.86653334e-02, 1.39449745e+00],
    [3.20587677e+00, -2.84451104e+00, 8.54849957e+00, -4.44001235e-01,
     1.04202144e+00, 7.35333682e-01, -2.48763292e+00, 7.38931361e-01,
     -1.74185596e+00, -1.07581842e+00, 2.05759299e-01, -8.20483513e-01],
    [3.31279737e+00, -5.08655734e-01, 6.61530870e+00, 1.16518280e+00,
     4.74499155e+00, -2.31536191e+00, -1.34016130e+00, -7.15381712e-01,
     2.78890594e+00, 2.04189275e+00, -3.80003033e-01, 1.16034914e+00],
    [1.79522019e+00, -8.13534697e-02, 4.37167420e-01, 2.26517020e+00,
     8.85377295e-01, 1.07481514e+00, -7.25322296e-01, -2.19309506e+00,
     -7.59468916e-01, -1.37191387e+00, 2.60097913e-01, 9.34596450e-01],
    [3.50400906e-01, 8.17891485e-01, -8.63487084e-01, -7.31760701e-01,
     9.70320805e-02, -3.60023996e-01, -2.91753495e-01, -8.03073817e-02,
     6.65930095e-02, 1.60093340e-01, -1.29158086e-01, -5.18806100e-02],
    [2.25922929e-01, 2.78461593e-01, 5.39661393e-02, -2.37662670e-02,
     -2.70343295e-02, -1.23485570e-01, 2.31027499e-03, 5.87465112e-05,
     1.86127188e-02, 2.83074747e-02, -1.87198676e-04, 1.24761782e-02],
    [4.53615634e-01, 3.18976020e+00, -8.35029351e-01, 7.84124578e+00,
     -4.43906795e-01, -1.78945492e+00, -1.14521031e+00, 1.00044304e+00,
     -4.04084981e-01, -4.86030348e-01, 1.05412721e-01, 5.63666445e-02],
    [3.93714086e-01, -3.07226477e-01, -4.87366619e-01, -4.57481697e-01,
     -2.91133171e-04, -2.39881719e-01, -2.15591352e-01, -1.21332941e-01,
     1.42245002e-01, 5.02984582e-02, -8.05878851e-03, 1.95534173e-01],
    [1.86913010e-01, -1.61000977e-01, 5.95612425e-01, 1.87804293e-01,
     2.22064227e-01, -1.09008289e-01, 7.83845058e-02, 5.15228647e-02,
     -8.18113578e-02, -2.37860551e-02, 3.41013800e-03, 3.64680417e-02],
    [3.32919314e+00, -2.14341251e+00, 7.20913997e+00, 1.76143734e+00,
     1.64091808e+00, -2.66887649e+00, -9.26748006e-01, -2.78599285e-01,
     -7.39434005e-01, -3.87363085e-01, 8.00557250e-01, 1.15628886e+00],
    [4.76496444e-01, -1.19334793e-01, 3.09037235e-01, -3.45545294e-01,
     1.30114716e-01, 5.06895559e-01, 2.12176840e-01, -4.14296750e-03,
     4.52439064e-02, -1.62163990e-02, 6.93683152e-02, -5.77607592e-03],
    [3.00019324e-01, 5.43432074e-02, -7.72732930e-01, 1.47263806e+00,
     -2.79012581e-02, -2.47864869e-01, -2.10011388e-01, 2.78202425e-01,
     6.16957205e-02, -1.66924986e-01, -1.80102286e-01, -3.78872162e-03]]

TIMBRE_MEANS = [np.mean(t) for t in TIMBRE_CLUSTERS]
TIMBRE_STDEVS = [np.std(t) for t in TIMBRE_CLUSTERS]

'''helper methods to process raw MSD data'''

def normalize_pitches(h5):
    key = int(hdf5_getters.get_key(h5))
    segments_pitches = hdf5_getters.get_segments_pitches(h5)
    segments_pitches_new = [transpose_by_key(pitch_seg, key) for pitch_seg in segments_pitches]
    return segments_pitches_new

def transpose_by_key(pitch_seg, key):
    pitch_seg_new = []
    for i in range(0, 12):
        idx = (i + key) % 12
        pitch_seg_new.append(pitch_seg[idx])
    return pitch_seg_new

''' given a time segment with distributions of the 12 pitches, find the most
likely chord played '''
def find_most_likely_chord(pitch_vector):
    rho_max = 0.0
    # chords are indexed as (chord_type, root), where chord_type is
    # 1 = major, 2 = minor, 3 = dominant 7th, 4 = minor 7th
    most_likely_chord = (1, 1)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_MAJOR,
            CHORD_TEMPLATE_MAJOR_means, CHORD_TEMPLATE_MAJOR_stdevs)):
        # correlation between the pitch vector and the chord template
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean) * (pitch_vector[i] - np.mean(pitch_vector)) / \
                   ((stdev + 0.01) * (np.std(pitch_vector) + 0.01))
        if abs(rho) > abs(rho_max):
            rho_max = rho
            most_likely_chord = (1, idx)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_MINOR,
            CHORD_TEMPLATE_MINOR_means, CHORD_TEMPLATE_MINOR_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean) * (pitch_vector[i] - np.mean(pitch_vector)) / \
                   ((stdev + 0.01) * (np.std(pitch_vector) + 0.01))
        if abs(rho) > abs(rho_max):
            rho_max = rho
            most_likely_chord = (2, idx)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_DOM7,
            CHORD_TEMPLATE_DOM7_means, CHORD_TEMPLATE_DOM7_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean) * (pitch_vector[i] - np.mean(pitch_vector)) / \
                   ((stdev + 0.01) * (np.std(pitch_vector) + 0.01))
        if abs(rho) > abs(rho_max):
            rho_max = rho
            most_likely_chord = (3, idx)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_MIN7,
            CHORD_TEMPLATE_MIN7_means, CHORD_TEMPLATE_MIN7_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean) * (pitch_vector[i] - np.mean(pitch_vector)) / \
                   ((stdev + 0.01) * (np.std(pitch_vector) + 0.01))
        if abs(rho) > abs(rho_max):
            rho_max = rho
            most_likely_chord = (4, idx)
    return most_likely_chord

def find_most_likely_timbre_category(timbre_vector):
    most_likely_timbre_cat = 0
    rho_max = 0.0
    for idx, (seg, mean, stdev) in enumerate(zip(TIMBRE_CLUSTERS, TIMBRE_MEANS, TIMBRE_STDEVS)):
        # correlation between the timbre vector and the cluster center,
        # mirroring the template correlation in find_most_likely_chord
        rho = 0.0
        for i in range(0, 12):
            rho += (seg[i] - mean) * (timbre_vector[i] - np.mean(timbre_vector)) / \
                   ((stdev + 0.01) * (np.std(timbre_vector) + 0.01))
        if abs(rho) > abs(rho_max):
            rho_max = rho
            most_likely_timbre_cat = idx
    return most_likely_timbre_cat
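A short usage sketch of the two classifiers, on a hypothetical chroma frame dominated by C, E, and G (indices 0, 4, and 7):

import msd_utils

# hypothetical chroma frame with strong C, E and G components
frame = [0.9, 0.1, 0.1, 0.1, 0.8, 0.1, 0.1, 0.85, 0.1, 0.1, 0.1, 0.1]
# expected output: (1, 0), i.e. a C major chord (chord_type 1, root 0)
print msd_utils.find_most_likely_chord(frame)

# a smoothed timbre frame is classified the same way, against the 46 cluster centers
print msd_utils.find_most_likely_timbre_category(msd_utils.TIMBRE_CLUSTERS[3])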


Bibliography

[1] Marck Bailey. Mark J. Butler publishes scholarly work on dance music. http://www.music.northwestern.edu/about/news/2012/mark-j-butler-publishes-scholarly-work-on-dance-music.html, March 2012.

[2] Kenneth Taylor. Ishkur's guide to EDM. http://techno.org/electronic-music-guide.

[3] Deal further strengthens Spotify's music discovery expertise. http://the.echonest.com/pressreleases/spotify-acquires-echo-nest, March 2014.

[4] Josh Constine. Inside the Spotify - Echo Nest skunkworks. http://techcrunch.com/2014/10/19/the-sonic-mad-scientists, October 2014.

[5] The future of music genres is here. http://blog.echonest.com/post/73516217273/the-future-of-music-genres-is-here, January 2014.

[6] About the Music Genome Project. http://www.pandora.com/about/mgp.

[7] Joan Serrà, Álvaro Corral, Marián Boguñá, Martín Haro, and Josep Ll. Arcos. Measuring the evolution of contemporary western popular music. Sci. Rep., 2, July 2012.

[8] Matthias Mauch, Robert M. MacCallum, Mark Levy, and Armand M. Leroi. The evolution of popular music: USA 1960–2010. Royal Society Open Science, 2(5), 2015.

[9] Thierry Bertin-Mahieux, Daniel P. W. Ellis, Brian Whitman, and Paul Lamere. The Million Song Dataset. In Proceedings of the 12th International Conference on Music Information Retrieval (ISMIR 2011), 2011.

[10] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay. Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12:2825–2830, 2011.

[11] Lars Buitinck, Gilles Louppe, Mathieu Blondel, Fabian Pedregosa, Andreas Mueller, Olivier Grisel, Vlad Niculae, Peter Prettenhofer, Alexandre Gramfort, Jaques Grobler, Robert Layton, Jake VanderPlas, Arnaud Joly, Brian Holt, and Gaël Varoquaux. API design for machine learning software: experiences from the scikit-learn project. In ECML PKDD Workshop: Languages for Data Mining and Machine Learning, pages 108–122, 2013.

[12] Edwin Chen. Infinite mixture models with nonparametric Bayes and the Dirichlet process. http://blog.echen.me/2012/03/20/infinite-mixture-models-with-nonparametric-bayes-and-the-dirichlet-process, March 2012.

[13] Graham Massey. Roland TR-808: The drum machine that changed music. http://www.bbc.com/news/entertainment-arts-26682781, March 2014.

[14] Jean-Michel Jarre and Agencja Artystyczna MTJ. Jean Michel Jarre. Disques Dreyfus, 1999.

[15] Francois Pachet, Jean-Julien Aucouturier, and Mark Sandler. The way it sounds: Timbre models for analysis and retrieval of music signals. IEEE Transactions on Multimedia, 7(6):1028–1035, December 2005.


Page 4: Silver,Matthew final thesis

Acknowledgements

I would like to thank Professor Ramon Van Handel for advising me on my thesis

You helped me figure out how to narrow down my goals into a concrete topic and

provided useful input on how to model and frame my problem effectively I would

also like to thank the Princeton ORFE department for providing funding to download

and manage the dataset I used for this project Michael Bino and the Computational

Science and Engineering Support group (CSES) were incredibly useful for helping me

set up and run my programs on the Princeton servers Without your help I would

have had a much harder time getting my 300GB dataset of music to play nice I

would also like to thank Jeffrey Scott Dwoskin for providing the Latex template from

which I wrote this thesis And finally I would like to thank my family and friends

especially Lucas and Kathryn for providing continuous support and feedback The

work we all poured into our theses is incredible and wersquove made it through this

sometimes rocky journey in the greatest university of all

On a personal note regardless of whether you are a current Princeton under-

graduate or are just interested in my work push yourself beyond your comfort zone

and donrsquot let grades or other peoplersquos opinions get in your way Take classes and

join new groups that reflect your passions At the same time love yourself Take

care of your body and have some fun without feeling guilty And finally form great

relationships While Princetonians sometimes appear hypercompetitive and forced

they are genuinely sweet and brilliant people who you will treasure for life These

four years at Princeton have gone by in a flash and in the whirlwind of highs and

lows Irsquove gone through these are the most important lessons Irsquove learned

iv

To my parents

v

Contents

Abstract iii

Acknowledgements iv

List of Tables viii

List of Figures ix

1 Introduction 1

11 Background Information 1

12 Literature Review 3

13 The Dataset 10

2 Mathematical Modeling 12

21 Determining Novelty of Songs 12

22 Feature Selection 14

23 Collecting Data and Preprocessing Selected Features 20

231 Collecting the Data 20

232 Pitch Preprocessing 21

233 Timbre Preprocessing 25

3 Results 27

31 Methodology 27

32 Findings 29

321 α=005 29

vi

322 α=01 33

323 α=02 38

33 Analysis 46

4 Conclusion 53

41 Design Flaws in Experiment 53

42 Future Work 55

43 Closing Remarks 56

A Code 57

A1 Pulling Data from the Million Song Dataset 57

A2 Calculating Most Likely Chords and Timbre Categories 58

A3 Code to Compute Timbre Categories 60

A4 Helper Methods for Calculations 61

Bibliography 68

vii

List of Tables

31 Song cluster descriptions for α = 005 33

32 Song cluster descriptions for α = 01 38

33 Song cluster descriptions for α = 02 45

viii

List of Figures

11 A userrsquos taste profile generated by Spotify 4

12 Data processing pipeline for Mauchrsquos study illustrated with a segment

of Queenrsquos Bohemian Rhapsody 1975 8

21 scikit-learn example of GMM vs DPGMM and tuning of α 15

22 Number of Electronic Music Songs in Million Song Dataset from Each

Year 26

31 Song year distributions for α = 005 31

32 Timbre and pitch distributions for α = 005 32

33 Song year distributions for α = 01 35

34 Timbre and pitch distributions for α = 01 37

35 Song year distributions for α = 02 41

36 Timbre and pitch distributions for α = 02 44

ix

Chapter 1

Introduction

11 Background Information

Electronic Music (EM) is an increasingly popular genre of music with an immense

presence and influence on modern culture Because the genre is new as a whole and

is arguably more loosely structured than other genres - technology has enabled the

creation of a wide range of sounds and easy blending of existing and new sounds alike

- formal analysis especially mathematical analysis on the genre is fairly limited and

has only begun growing in the past few years As a fan of EM I am interested in

exploring how the genre has evolved over time More specifically my goal with this

project was to design some structure or model that could help me identify which EM

artists have contributed the most stylistically to the genre Oftentimes famous EM

artists do not create novel-sounding music but rather popularize an existing style

and the motivation of this study is to understand who has stylistically contributed

the most to the EM scene versus those who have merely popularized aspects of it

As the study progressed the manner in which I constructed my model lent to

a second goal of the thesis imagining new ways in which we can imagine EM genres

1

While there exists an extensive amount of research analyzing music trends from

a non-mathematical (cultural societal artistic) perspective the analysis of EM

from a mathematical perspective and especially with respect to any computationally

measurable trends in the genre is close to nonexistent EM has been analyzed to a

lesser extent than other common genres of music in the academic world most likely

due to existing for a shorter amount of time and being less rooted in prominant

social and cultural events In fact the first published reference work on EM did not

exist until 2012 when Professor Mark J Butler from Northwestern University edited

and published Electronica Dance and Club Music a collection of essays exploring

EM genres and culture [1] Furthermore there are very few comprehensive visual

guides that allow a user to relate every genre to each other and easily observe how

different genres converge and diverge While conducting research the best guide I

found was not a scholarly source but an online guide created by an EM enthusiast

Ishkurrsquos Guide to Electronic Music [2] This guide which includes over 100 specific

genres grouped by more general genres and represents chronological evolutions by

connecting each genre in a flowchart is the most exhaustive analysis of the EM scene

I could find However the guidersquos analysis is very qualitative While each subgenre

contains an explanation on typical rhythm and sounds and includes well-known

songs indicative of the style the guide is created by someone who used historical and

personal knowledge of EM My model which creates music genres by chronologically

ordering songs and then assigning them to clusters is a different approach towards

imagining the entire landscape of EM The results may confirm Ishkurrsquos Guidersquos

findings in which case his guide is given additional merit with mathematical evi-

dence or it may be different suggesting that there may be better ways to group EM

genres One advantage that guides such as Ishkurrsquos and historically-based scholarly

works have over my approach is that those models are history-sensitive and therefore

may group songs in a way that historically makes sense On the other hand my

2

model is history-agnostic and may not realize the historical context of songs when

clustering However I believe that there is still significant merit to my research

Instead of classifying genres of music by early genres that led to them my approach

gives the most credit to the artists and songs that were the most innovative for their

time and perhaps reveal different musical styles that are more similar to each other

than history would otherwise imply This way of thinking of music genres while

unconventional is another way of imagining EM

The practice of quantitatively analyzing music has exploded in the last decade

thanks to technological and algorithmic advances that allow data scientists to con-

structively sift through troves of music and listener information In the literature

review I will focus on two particular organizations that have contributed greatly to

the large-scale mathematical analysis of music Pandora a website that plays songs

similar to a songartistalbum inputted by a user and Echo Nest a music analytics

firm that was acquired by Spotify in 2014 and drives Spotifyrsquos Discover Weekly

feature [3] After evaluating the relevance of these sources to my thesis work I will

then look over the relevant academic research and evaluate what this research can

contribute

12 Literature Review

The analysis of quantitative music generally falls into two categories research con-

ducted by academics and academic organizations for scholarly purposes and research

conducted by companies and primarily targeted for consumers First looking at the

consumer-based research Spotify and Pandora are two of the most prominent based

groups and the two I decided to focus on Spotify is a music streaming service where

users can listen to albums and songs from a wide variety of artists or listen to weekly

3

playlists generated based on the music the user and userrsquos friends have listened to

The weekly playlist called Discover Weekly Playlist is a relatively new feature in

Spotify and is driven by music analysis algorithms created from Echo Nest Using

the Echo Nest code interface Spotify creates a ldquotaste profilerdquo for each user which

assesses attributes such as how often a user branches out to new styles of music how

closely the userrsquos music streamed follows popular Billboard music charts and so on

Spotify also looks at the artists and songs the user streamed and creates clusters

of different genres that the user likes (see figure 11) The taste profile and music

clusters can then be used to generate playlists geared to a specific user The genres

in the cluster come from a list of nearly 800 names which are derived by scraping

the Internet for trending terms in music as well as training various algorithms on a

regular basic by ldquolisteningrdquo to new songs [4][5]

Figure 11 A userrsquos taste profile generated by Spotify

4

Although Spotify and Echo Nestrsquos algorithms are very useful for mapping the land-

scape of established and emerging genres of music the methodology is limited to

pre-defined genres of music This may serve as a good starting point to compare my

final results to but my study aims to be as context-free as possible by attaching no

preconceived notions of music styles or genres instead looking at features that could

be measured in every song

While Spotifyrsquos approach to mapping music is very high-tech and based on ex-

isting genres Pandora takes a very low-tech and context-free approach to music

clustering Pandora created the Music Genome Project a multi-year undertaking

where skilled music theorists listened to a large number of songs and analyzed up to

450 characteristics in each song [6] Pandorarsquos approach is appealing to the aim of

my study since it does not take any preconceived notions of what a genre of music

is instead comparing songs on common characteristics such as pitch rhythm and

instrument patterns Unfortunately I do not have a cadre of skilled music theorists

at my disposal nor do I have 10 years to perform such calculations like the dedicated

workers at Pandora (tips the indestructible fedora) Additionally Pandorarsquos Music

Genome Project is intellectual property so at best I can only rely on the abstract

concepts of the Music Genome Project to drive my study

In the academic realm there are no existing studies analyzing quantifiable changes in

EM specifically but there exist a few studies that perform such analysis on popular

Western music in general One such study is Measuring the Evolution of Contem-

porary Western Popular Music which analyzes music from 1955-2010 spanning all

common genres Using the Million Song Dataset a free public database of songs

each containing metadata (see section 13) the study focuses on the attributes pitch

timbre and loudness Pitch is defined as the standard musical notes or frequency of

5

the sound waves Timbre is formally defined as the Mel frequency cepstral coefficients

(MFCC) of a transformed sound signal More informally it refers to the sound color

texture or tone quality and is associated with instrument types recording resources

and production techniques In other words two sounds that have the same pitch

but different tones (for example a bell and voice) are differentiated by their timbres

There are 12 MFCCs that define the timbre of a given sound Finally loudness

refers to intrinsically how loud the music sounds not loudness that a listener can

manipulate while listening to the music Loudness is the first MFCC of the timbre

of a sound [7] The study concluded that over time music has been becoming louder

and less diverse

The restriction of pitch sequences (with metrics showing less variety inpitch progressions) the homogenization of the timbral palette (with fre-quent timbres becoming more frequent) and growing average loudnesslevels (threatening a dynamic richness that has been conserved until to-day) This suggests that our perception of the new would be essentiallyrooted on identifying simpler pitch sequences fashionable timbral mix-tures and louder volumes Hence an old tune with slightly simpler chordprogressions new instrument sonorities that were in agreement with cur-rent tendencies and recorded with modern techniques that allowed forincreased loudness levels could be easily perceived as novel fashionableand groundbreaking

This study serves as a good starting point for mathematically analyzing music in

a few ways First it utilizes the Million Song Dataset which addresses the issue

of legally obtaining music metadata As mentioned in section 13 the only legal

way to obtain playable music for this study would have been to purchase all songs I

would include which is infeasible While the Million Song Dataset does not contain

the audio files in playable format it does contain audio features and metadata that

allow for in-depth analysis In addition working with the dataset takes out the

work of extracting features from raw audio files saving an extensive amount of time

and energy Second the study establishes specifics for what constitutes a trend

in music Pitch timbre and loudness are core features of music and examining the6

distributions of each among songs over time reveals a lot of information about how

the music industry and consumersrsquo tastes have evolved While these are not all of the

features contained in a song they serve as a good starting point Third the study

defines mathematical ways to capture music attributes and measure their change

over time For example pitches are transposed into the same tonal context with

binary discretized pitch descriptions based on a threshold so that each song can be

represented with vectors of pitches that are normalized and compared to other songs

While this study lays some solid groundwork for capturing and analyzing nu-

meric qualities of music it falls short of addressing my goals in a couple of ways

First it does not perform any analysis with respect to music genre While the

analysis performed in this paper could easily be applied to a list of songs in a specific

genre certain genres might have unique sounds and rhythms relative to other genres

that would be worth studying in greater detail Second the study only measures

general trends in music over time The models used to describe changes are simple

regressions that donrsquot look at more nuanced changes For example what styles of

music developed over certain periods of time How rapid were those changes Which

styles of music developed from which other styles

A more promising study led by music researcher Matthias Mauch [8] analyzes

contemporary popular Western Music from the 1960s to 2010s by comparing numer-

ical data on the pitches and timbre of a corpus of 17000 songs that appeared on the

Billboard Hot 100 Like the previously mentioned paper Measuring the Evolution

of Contemporary Western Popular Music Mauchrsquos study also creates abstractions

of pitch and timbre in order to provide a consistent and meaningful semantic inter-

pretation of musical data (see figure 12) However Mauchrsquos study takes this idea a

step further by using genre tags from Lastfm a music website and constructing a

7

hierarchy of music genres using hierarchical clustering Additionally the study takes

a crack at determining whether a particular band the Beatles was musically ground-

breaking for its time or merely playing off sounds that other bands had already used

Figure 12 Data processing pipeline for Mauchrsquos study illustrated with a segment ofQueenrsquos Bohemian Rhapsody 1975

While both Measuring the Evolution of Contemporary Western Popular Music

and Mauchrsquos study created abstractions of pitch and timbre Mauchrsquos study is more

appealing with respect to my goal because its end results align more closely with

mine Additionally the data processing pipeline offers several layers of abstraction

8

and depending on my progress I would be able to achieve at least one of the levels of

abstraction As shown in figure 12 each segment of a raw audio file is first broken

down into its 12 timbre MFCCs and pitch components Next the study constructs

ldquolexiconsrdquo or a dictionary of pitch and timbre terms that all songs can be compared

to For pitch the original data is in a N-by-12 matrix where N is the number of time

segments in the song and 12 the number of each of the notes found in an octave of

pitches Each time segment contains the relative strengths of each of the 12 pitches

However music sounds are not merely a collection of pitches but more precisely

chords Furthermore the similarity of two songs is not determined by the absolute

pitches of their chords but rather the progression of chords in the song all relative to

each other For example if all the notes in a song are transposed by one step the song

will sound different in terms of absolute pitch but the song will still be recognized

as the original because all of the relative movements from each chord to the next

are the same This phenomenon is captured in the pitch data by finding the most

likely chord played at each time segment then counting the change to the next chord

at each time step and generating a table of chord change frequencies for each song

Constructing the timbre lexcion is more complicated since there is no easy analogue

like chords for pitches to compare songs Mauchrsquos study utilizes a Gaussian Mixture

Model (GMM) by iterating over k=1 to k=N clusters where N is a large number

running the GMM on each prior assumption of k clusters and computing the Bayes

Information Criterion (BIC) for each model The lowest of the N BIC values is found

and that value of k is selected That model contains k different timbre clusters

and each cluster contains the mean timbre value for each of the 12 timbre components

For my research I decided that the pitch and timbre lexicons would be the most

realistic level of abstraction I could obtain Mauchrsquos study adds an addtional layer

to pitch and timbre by identifying the most common patterns of chord changes and

9

most common timbre rhythms and creating more general tags from these combined

terms such as ldquo stepwise changes indicating modal harmonyrdquo for a pitch topic and

ldquooh rounded mellowrdquo for a timbral topic There were two problems with using this

final layer of abstraction for my study First attaching semantic interpretations to

the pitch and timbral lexicons is a difficult task For timbre I would need to listen

to sound samples containing all of the different timbral categories I identified and

attaching user interpretations to them For the chords not only would I have to

perform the same analysis as on timbre but take careful attention to identify which

chords correspond to common sound progressions in popular music a task that I am

not qualified for an did not have the resources for this thesis to seek out Second

this final layer of abstraction was not necessary for the end goal of my paper In

fact consolidating my pitch and timbre lexicons into simpler phrases would run the

risk of pigeonholing my analysis and preventing me from discovering more nuanced

patterns in my final results Therefore I decided to focus on pitch and timbral

lexicon construction as the furthest levels of abstraction when processing songs for

my thesis Mathematical details on how I constructed the lexical and timbral lexicons

can be found in the Mathematical Modeling section of this paper

13 The Dataset

In order to successfully execute my thesis I need access to an extensive database of

music Until recently acquiring a substantial corpus of music data was a difficult and

costly task It is illegal to download music audio files from video and music-sharing

sites such as YouTube Spotify and Pandora Some platforms such as iTunes offer

90-second previews of songs but using only segments of songs and usually segments

that showcase the chorus of the song are not reliable measures to capture the entire

essence of a song Even if I were to legally download entire audio files for free I would

10

run into additional issues Obtaining a high-quality corpora of song data would be

challenging writing scripts that crawl music sharing platforms may not capture all of

the music I am looking for And once I have the audio files I would have to perform

audio processing techniques to extract the relevant information from the songs

Fortunately there is an easy solution to the music data acquisition problem

The Million Song Dataset (MSD) is a collection of metadata for one million music

tracks dating up to 2011 Various organizations such as The Echo Nest Musicbrainz

7digital and Lastfm have contributed different pieces of metadata Each song is

represented as a Hierarchical Data Format file (HDF5) which can be loaded as a

JSON object The fields encompass topical features such as the song title artist

and release date as well as lower-level features such as the loudness starting beat

time pitches and timbre of several segments of the song [9] While the MSD is

the largest free and open source music metadata dataset I could find there is no

guarantee that it adequately covers the entire spectrum of EM artists and songs

This quality limitation is important to consider throughout the study A quick look

through the songs including the subset of data I worked with for this report showed

that there were several well-known artists and songs in the EM scene Therefore

while the MSD may not contain all desired songs for this project it contains an

adequate number of relevant songs to produce some meaningful results Additionally

laying the groundwork for modeling the similarities between songs and identifying

groundbreaking ones is the same regardless of the songs included and the following

methodologies can be implemented on any similarly-formatted dataset including one

with songs that might currently be missing in the MSD

11

Chapter 2

Mathematical Modeling

21 Determining Novelty of Songs

Finding an logical and implementable mathematical model was and continues to be

an important aspect of my research My problem how to mathematically determine

which songs were unique for their time requires an algorithm in which each song is

introduced in chronological order either joining an existing category or starting a

new category based on its musical similarity to songs already introduced Clustering

algorithms like k-means or Gaussian Mixture Models (GMM) which have a prede-

termined number of clusters and optimize the partitioning of a dataset into those

cluster assume a fixed number of clusters While this process would work if we knew

exactly how many genres of EM existed if we guess wrong our end results may end

up with clusters that are wrongly grouped together or separated It is much better to

apply a clustering algorithm that does not make any assumptions about this number

One particularly promising process that addresses the issue of tthe number of

clusters is a family of algorithms known as Dirichlet Proccesses (DPs) DPs are

useful for this particular application because (1) they assign clusters to a dataset

12

with only an upper bound on the number of clusters and (2) by sorting the songs

in chronological order before running the algorithm and keeping track of which

songs are categorized under each cluster we can observe the earliest songs in each

cluster and consequentially infer which songs were responsible for creating new

clusters The arguments for the DP The DP is controlled by a parameter α which

is the concentration parameter The expected number of clusters formed is directly

proportional to the value of α so the higher the value of α the more likely new

clusters will be formed [10] Regardless of the value of α as the number of data

points introduced increases the probability of a new group being formed decreases

That is a ldquorich get richerrdquo policy is in place and existing clusters tend to grow in

size Tweaking the value of the tunable parameter α is an important part of the

study since it determines the flexibility given to forming a new cluster If the value

of α is too small then the criteria for forming clusters will be too strict and data

that should be in different clusters will be assigned to the same cluster On the other

hand if α is too large the algorithm will be too sensitive and assign similar songs to

different clusters

The implementation of the DP was achieved using scikit-learnrsquos library and API for

Dirichlet Process Gaussian Mixture Model (DPGMM) The DPGMM is the formal

name of the Dirichlet Process model used to cluster the data More specifically

scikit-learnrsquos implementation of the DPGMM uses the Stick Breaking method

one of several equally valid methods to assign songs to clusters [11] While the

mathematical details for this algorithm can be found at the following citation [12]

the most important aspects of the DPGMM are the arguments that the user can

specify and tune The first of these tunable parameters is the value α which is the

same parameter as the α discussed in the previous paragraph As seen in Figure 21

on the right side properly tuning α is key to obtaining meaningful clusters The

13

center image has α set to 001 which is too small and results in all of the data being

formed under one cluster On the other hand the bottom-right image has the same

data set and α set to 100 which does a better job of clustering On a related note

the figure also demonstrates the effectiveness of the DPGMM over the GMM On the

left side clearly the dataset contains 2 clusters but the GMM on the top-left image

assumes 5 clusters as a prior and consequentially clusters the data incorrectly while

the DPGMM manages to limit the data to 2 clusters

The second argument that the user inputs for the DPGMM is the data that

will be clustered The scikit-learn implementation takes the data in the format

of a nested list (N lists each of length m) where N is the number of data points

and m the number of features While the format of the data structure is relatively

straightforward choosing which numbers should be in the data was a challenge I

faced Selecting the relevant features of each song to be used in the algorithm will

be expounded upon in the next section ldquoFeature Selectionrdquo

The last argument that a user inputs for the scikit-learn DGPMM implementa-

tion is an argument indicating the upper bound for the number of clusters The

Dirichlet Process then determines the best number of clusters for the data between

1 and the upper bound Since the DPGMM is flexible enough to find the best value

I set an arbitrary upper bound of 50 clusters and focused more on the tuning of α to

modify the number of clusters formed

22 Feature Selection

One of the most difficult aspects of the Dirichlet method is choosing the features

to be used for clustering In other words when we organize the songs into clusters

14

Figure 21 scikit-learn example of GMM vs DPGMM and tuning of α

we need to ensure that each cluster is distinct in a way that is statistically and

intuitively logical In the Million Song Dataset [9] each song is represented as a

JSON object containing several fields These fields are candidate features to be used

in the Dirichlet algorithm Below is an example song ldquoNever Gonna Give You Uprdquo

by Rick Astley and the corresponding features

artist_mbid db92a151-1ac2-438b-bc43-b82e149ddd50 (the musicbrainzorg ID

for this artists is db9)

artist_mbtags shape = (4) (this artist received 4 tags on musicbrainzorg)

artist_mbtags_count shape = (4)

(raw tag count of the 4 tags this artist received on musicbrainzorg)

artist_name Rick Astley (artist name)

artist_playmeid 1338 (the ID of that artist on the service playmecom)

artist_terms shape = (12) (this artist has 12 terms (tags) from The Echo Nest)

artist_terms_freq shape = (12) (frequency of the 12 terms from The Echo Nest

(number between 0 and 1))

artist_terms_weight shape = (12) (weight of the 12 terms from The Echo Nest

(number between 0 and 1))

15

audio_md5 bf53f8113508a466cd2d3fda18b06368 (hash code of the audio used for

the analysis by The Echo Nest)

bars_confidence shape = (99) (confidence value (between 0 and 1) associated

with each bar by The Echo Nest)

bars_start shape = (99) (start time of each bar according to The Echo Nest this

song has 99 bars)

beats_confidence shape = (397))

confidence value (between 0 and 1) associated with each beat by The Echo Nest

beats_start shape = (397) (start time of each beat according to The Echo Nest

this song has 397 beats)

danceability 00 (danceability measure of this song according to The Echo Nest

(between 0 and 1 0 =gt not analyzed))

duration 21169587 (duration of the track in seconds)

end_of_fade_in 0139 (time of the end of the fade in at the beginning of the

song according to The Echo Nest)

energy 00 (energy measure (not in the signal processing sense) according to The

Echo Nest (between 0 and 1 0 = not analyzed))

key 1 (estimation of the key the song is in by The Echo Nest)

key_confidence 0324 (confidence of the key estimation)

loudness -775 (general loudness of the track)

mode 1 (estimation of the mode the song is in by The Echo Nest)

mode_confidence 0434 (confidence of the mode estimation)

release Big Tunes - Back 2 The 80s (album name from which the track was taken

some songs tracks can come from many albums we give only one)

release_7digitalid 786795 (the ID of the release (album) on the service 7digi-

talcom)

sections_confidence shape = (10) (confidence value (between 0 and 1) associated

16

with each section by The Echo Nest)

sections_start shape = (10) (start time of each section according to The Echo

Nest this song has 10 sections)

segments_confidence shape = (935) (confidence value (between 0 and 1) asso-

ciated with each segment by The Echo Nest)

segments_loudness_max shape = (935) (max loudness during each segment)

segments_loudness_max_time shape = (935) (time of the max loudness

during each segment)

segments_loudness_start shape = (935) (loudness at the beginning of each

segment)

segments_pitches shape = (935 12) (chroma features for each segment (normal-

ized so max is 1))

segments_start shape = (935) (start time of each segment ( musical event or

onset) according to The Echo Nest this song has 935 segments)

segments_timbre shape = (935 12) (MFCC-like features for each segment)

similar_artists shape = (100) (a list of 100 artists (their Echo Nest ID) similar

to Rick Astley according to The Echo Nest)

song_hotttnesss 0864248830588 (according to The Echo Nest when downloaded

(in December 2010) this song had a rsquohotttnesssrsquo of 08 (on a scale of 0 and 1))

song_id SOCWJDB12A58A776AF (The Echo Nest song ID note that a song can

be associated with many tracks (with very slight audio differences))

start_of _fade _out 198536 (start time of the fade out in seconds at the end

of the song according to The Echo Nest)

tatums_confidence shape = (794) (confidence value (between 0 and 1) associated

with each tatum by The Echo Nest)

tatums_start shape = (794) (start time of each tatum according to The Echo

Nest this song has 794 tatums)

17

tempo 113359 (tempo in BPM according to The Echo Nest)

time_signature 4 (time signature of the song according to The Echo Nest ie

usual number of beats per bar)

time_signature_confidence 0634 (confidence of the time signature estimation)

title Never Gonna Give You Up (song title)

track_7digitalid 8707738 (the ID of this song on the service 7digitalcom)

track_id TRAXLZU12903D05F94 (The Echo Nest ID of this particular track

on which the analysis was done) year 1987 (year when this song was released

according to musicbrainzorg)

When choosing features my main goal was to use features that would most

likely yield meaningful results yet also be simple and make sense to the average

person The definition of ldquomeaningfulrdquo results is arbitrary as every music listener

will have his or her opinions to what constitutes different types of music but some

common features most people tend to differentiate songs by are pitch rhythm and

the types of instruments used The following specific fields provided in each song

object fall under these three terms

Pitch

bull segments_pitches a matrix of values indicating the strength of each pitch (or

note) at each discernible time interval

Rhythm

bull beats_start a vector of values indicating the start time of each beat

bull time_signature the time signature of the song

bull tempo the speed of the song in Beats Per Minute (BPM)

Instruments

18

bull segments_timbre a matrix of values indicating the distribution of MFCC-like

features (different types of tones) for each segments

The segments_pitches feature is a clear candidate for a differentiating factor between songs, since it reveals the patterns of notes that occur. Additionally, other research papers that quantitatively examine songs, like Mauch's, look at pitch and employ a procedure that allows all songs to be compared with the same metric. Likewise, timbre is intuitively a reliable differentiating feature, since it captures the amounts of different tones: sounds that sound different despite having the same pitch. Therefore segments_timbre is another feature that is considered for each song.

Finally, we look at the candidate features for rhythm. At first glance, all of these features appear to be useful, as they indicate the rhythm of a song in one way or another. However, none of these features are as useful as the pitch and timbre features. While tempo is one factor in differentiating genres of EDM, and music in general, tempo alone is not a driving force of musical innovation. Certain genres of EDM, like drum 'n' bass and happycore, stand out for having very fast tempos, but the tempo is supplemented with a sound unique to the genre. Conceiving new arrangements of pitches, combining instruments in new ways, and inventing new types of sounds are novel; speeding up or slowing down existing sounds is not. Including tempo as a feature could actually add noise to the model, since many genres overlap in their tempos. And finally, tempo is measured indirectly when the pitch and timbre features are normalized for each song: everything is measured in units of "per second," so faster songs will have higher quantities of pitch and timbre features each second. Time signature can be dismissed from the candidate features for the same reason as tempo: many genres share the same time signature, and including it in the feature set would only add more noise. beats_start looks like a more promising feature since, like segments_pitches and segments_timbre, it consists of a vector of values. However, difficulties arise when we begin to think about how exactly we can utilize this information. Since songs vary in length, we need a way to compare songs of different durations on the same level. One approach could be to perform basic statistics on the distance between each beat, for example calculating the mean and standard deviation of this distance. However, the normalized pitch and timbre information already captures this data. Another possibility is detecting certain patterns of beats, which could differentiate the syncopated dubstep or glitch beats from the steady pulse of electro-house. But once again, every beat is accompanied by a sound with a specific timbre and pitch, so this feature would not add any significantly new information.

2.3 Collecting Data and Preprocessing Selected Features

2.3.1 Collecting the Data

Upon deciding the features I wanted to use in my research, I first needed to collect all of the electronic songs in the Million Song Dataset. The easiest reliable way to achieve this was to iterate through each song in the database and save the information for the songs where any of the artist genre tags in artist_mbtags matched an electronic music genre. While this measure was not fully accurate, because it looks at the genre of the artist, not the song, specific genre information for each song was not as easily accessible, so this indicator was nearly as good a substitute. To generate a list of the genres that electronic songs would fall under, I manually searched through a subset of the MSD to find all genres that seemed to be related to electronic music. In the case of genres that were sometimes, but not always, electronic in nature, such as disco or pop, I erred on the side of caution and did not include them in the list of electronic genres. In these cases, false positives, such as primarily rock songs that happen to have the disco label attached to the artist, could inadvertently be included in the dataset. The final list of genres is as follows, with a sketch of the tag-matching filter after it:

target_genres = ['house', 'techno', 'drum and bass', 'drum n bass',
    "drum'n'bass", 'drumnbass', "drum 'n' bass", 'jungle', 'breakbeat',
    'trance', 'dubstep', 'trap', 'downtempo', 'industrial', 'synthpop',
    'idm', 'idm - intelligent dance music', '8-bit', 'ambient',
    'dance and electronica', 'electronic']
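As a minimal sketch of this filter (the full version appears in Appendix A.1), the check for each track reduces to a tag match against the list above. hdf5_getters is the accessor module distributed with the MSD; the function name here is an illustrative assumption:

import hdf5_getters

def is_electronic(h5_path):
    # Keep a track if any of its artist's MusicBrainz tags
    # matches one of the target EM genres.
    h5 = hdf5_getters.open_h5_file_read(h5_path)
    try:
        tags = str(hdf5_getters.get_artist_mbtags(h5))
        return any(genre in tags for genre in target_genres)
    finally:
        h5.close()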

2.3.2 Pitch Preprocessing

A study conducted by music researcher Matthias Mauch [8] analyzes pitch in a musically informed manner. The study first takes the raw sound data and converts it into a distribution over each pitch, where 0 is no detection of the pitch and 1 the strongest amount. Then it computes the most likely chord by comparing the 4 most common types of chords in popular music (major, minor, dominant 7th, and minor 7th) to the observed chord. The most common chords are represented as "template chords" and contain 0's and 1's, where the 1's represent the notes played in the chord. For example, using the note C as the first index, the C major chord is represented as

CT_{CM} = (1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0)

For a given chroma frame c observed in the song, Spearman's rho coefficient is computed against every template chord:

\rho_{CT,c} = \sum_{i=1}^{12} \frac{(CT_i - \overline{CT})(c_i - \bar{c})}{\sigma_{CT}\,\sigma_c}

where \overline{CT} is the mean of the values in the template chord, \sigma_{CT} is the standard deviation of the values in the chord, and the operations on c are analogous. Note that the summation is over each of the 12 pitch classes. The chord template with the highest value of \rho is selected as the chord for the time frame.

After this is performed for each time frame, the values are smoothed, and then the change between adjacent chords is observed. The reasoning behind this step is that, by measuring the relative distance between chords rather than the chords themselves, all songs can be compared in the same manner even though they may have different key signatures. Finally, the study takes the types of chord changes and classifies them under 8 possible categories called "H-topics." These topics are more abstracted versions of the chord changes that make more sense to a human, such as "changes involving dominant 7th chords."
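As an illustrative sketch of the template-matching step, the following uses scipy.stats.spearmanr in place of the hand-computed correlation in Appendix A.4, and spells out only the 12 major templates (the full method also uses minor, dominant 7th, and minor 7th templates, 48 chords in total); the names here are assumptions:

import numpy as np
from scipy.stats import spearmanr

# C major template, index 0 = C; rolling it by r semitones gives the
# major triad rooted r semitones above C.
C_MAJOR = np.array([1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0])
TEMPLATES = {('major', root): np.roll(C_MAJOR, root) for root in range(12)}

def most_likely_chord(chroma_frame):
    # Return the template with the highest Spearman's rho against
    # one 12-element chroma frame.
    best, best_rho = None, -np.inf
    for name, template in TEMPLATES.items():
        rho, _ = spearmanr(template, chroma_frame)
        if rho > best_rho:
            best, best_rho = name, rho
    return best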

In my preliminary implementation of this method on an electronic dance music corpus, I made a few modifications to Mauch's study. First, I smoothed the time frames before computing the most probable chords, rather than smoothing the most probable chords. I did this to save time and to reduce volatility in the chord measurements. Using Rick Astley's "Never Gonna Give You Up" as a reference, which contains 935 time frames and lasts 212 seconds, 5 time frames is slightly under 1 second, and for preliminary testing this appeared to be a good interval for each time block. Second, as mentioned in the literature section, I did not abstract the chord changes into H-topics. This decision also stemmed from time constraints, since deriving semantic chord meaning from EDM songs would require careful research into the types of harmonies and sounds common in that genre of music. Below I have included a high-level visualization of the pitch metadata found in a sample song, "Firestarter" by The Prodigy, and how I converted the metadata into a chord change vector that I could then feed into the Dirichlet Process algorithm.

[Figure: from raw pitch metadata to a chord change vector, illustrated on "Firestarter" by The Prodigy. The figure walks through four steps: (1) start with the raw pitch data, an N x 12 matrix, where N is the number of time frames in the song and 12 the number of pitch classes (the first 5 time frames of the song are shown); (2) average the distribution of pitches over every block of 5 time frames; (3) calculate the most likely chord for each block using Spearman's rho (here F major); (4) for each pair of adjacent chords (here F major to G major, a major-to-major step of size 2, chord shift code 6), increment the corresponding count in a table of chord change frequencies (192 possible chord changes), i.e. chord_changes[6] += 1. The result is a 192-element vector where chord_changes[i] is the number of times the chord change with code i occurred in the song.]

A final step I took to normalize the chord change data was to divide the counts by the length of the song, so that each song's number of chord changes was measured per second; the encoding and normalization are sketched below.
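This is a minimal sketch mirroring the arithmetic in Appendix A.2: a chord is represented as a (quality, root) pair, with qualities numbered 1 through 4 for major, minor, dominant 7th, and minor 7th; the helper names are illustrative:

def chord_change_code(c1, c2):
    q1, r1 = c1
    q2, r2 = c2
    note_shift = (r2 - r1) % 12               # 0..11 semitones upward
    key_shift = 4 * (q1 - 1) + q2             # 1..16 quality pairs
    return 12 * (key_shift - 1) + note_shift  # 0..191

def chord_change_vector(chords, duration):
    # Count each of the 192 possible chord changes, then normalize
    # to changes per second by dividing by the song's duration.
    counts = [0] * 192
    for c1, c2 in zip(chords, chords[1:]):
        counts[chord_change_code(c1, c2)] += 1
    return [c / float(duration) for c in counts]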

2.3.3 Timbre Preprocessing

For timbre, I also used Mauch's model [8] to find a meaningful way to compare timbre uniformly across all songs. After collecting all song metadata, I took a random sample of 20 songs from each year, starting at 1970. The reason I forced the sampling to 20 randomly sampled songs from each year, and did not take a random sample of songs from all years at once, was to prevent bias towards any type of sound. As seen in Figure 2.2, there are significantly more songs from 2000-2011 than from before 2000: the mean year is x̄ = 2001.052, the median year is 2003, and the standard deviation of the years is σ = 7.060. A "random sample" over all songs would almost certainly include a disproportionate amount of more recent songs. In order not to miss sounds that may be more prevalent in older songs, I required a set number of songs from each year. Next, from each randomly selected song, I selected 20 random timbre frames, in order to prevent any biases in data collection within each song. In total, there were 42 · 20 · 20 = 16,800 timbre frames collected. Next, I clustered the timbre frames using a Gaussian Mixture Model (GMM), varying the number of clusters from 10 to 100 and selecting the number of clusters with the lowest Bayesian Information Criterion (BIC), a statistical measure commonly used to select the best-fitting model. The BIC was minimized at 46 timbre clusters. I then re-ran the GMM with 46 clusters and saved the mean values of each of the 12 timbre dimensions for each cluster formed.

Figure 2.2: Number of Electronic Music Songs in the Million Song Dataset from Each Year

In the same way that every song had the same 192 chord changes, whose frequencies could be compared between songs, each song now had the same 46 timbre clusters, but different frequencies in each song. When reading in the metadata from each song, I calculated the most likely timbre cluster that each timbre frame belonged to and kept a frequency count of all of the possible timbre clusters observed in a song. Finally, as with the pitch data, I divided all observed counts by the duration of the song in order to normalize each song's timbre counts. A sketch of the clustering and counting steps follows.
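The following is a minimal sketch of the fitting and counting steps, written against the modern scikit-learn API (sklearn.mixture.GaussianMixture; the thesis code in Appendix A.3 used an older interface), with frames assumed to be the 16,800 x 12 array of sampled timbre frames:

import numpy as np
from sklearn.mixture import GaussianMixture

def fit_timbre_clusters(frames, k_values=range(10, 101)):
    # Fit a GMM for each candidate number of clusters and keep the
    # one with the lowest BIC (minimized at 46 in this study).
    best_gmm, best_bic = None, np.inf
    for k in k_values:
        gmm = GaussianMixture(n_components=k, random_state=0).fit(frames)
        bic = gmm.bic(frames)
        if bic < best_bic:
            best_gmm, best_bic = gmm, bic
    return best_gmm

def timbre_counts(gmm, song_frames, duration):
    # Most likely cluster for each frame, counted and normalized
    # to a per-second frequency.
    labels = gmm.predict(song_frames)
    counts = np.bincount(labels, minlength=gmm.n_components)
    return counts / float(duration)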


Chapter 3

Results

3.1 Methodology

After the pitch and timbre data was processed, I ran the Dirichlet Process on the data. For each song, I concatenated the 192-element chord change frequency list and the 46-element timbre category frequency list, giving each song a total of 238 features. However, there is a problem with this setup: the pitch data will inherently dominate the clustering process, since it contains more than four times as many features as timbre. While there is no built-in function in scikit-learn's DPGMM implementation to give different weights to each feature, I considered another possibility to remedy this discrepancy: duplicating the timbre vector a certain number of times and concatenating that to the feature set of each song. While this strategy runs the risk of corrupting the feature set and turning it into something that does not accurately represent each song, it is important to keep in mind that, even without duplicating the timbre vector, the feature set already consists of two separate feature sets concatenated to each other. Therefore, timbre duplication appears to be a reasonable strategy to weight pitch and timbre more evenly; a sketch of this construction follows.
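This is a minimal sketch, assuming the per-second chord change and timbre count vectors from the preprocessing steps; the duplication factor of 4 is an illustrative assumption (the thesis does not pin down the exact number of copies), bringing timbre to 4 · 46 = 184 features against 192 for pitch:

import numpy as np

def build_features(chord_changes, timbre_counts, timbre_copies=4):
    # chord_changes: 192 per-second chord change frequencies
    # timbre_counts: 46 per-second timbre cluster frequencies
    return np.concatenate([chord_changes] + [timbre_counts] * timbre_copies)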


After this modification, I tweaked a few more parameters before obtaining my final results. Dividing the pitch and timbre frequencies by the duration of the song normalized every song to a frequency per second, but it also had the undesired effect of making the data too small: timbre and pitch frequencies per second were almost always less than 1.0, and many times hovered as low as 0.002 for nonzero values. Because all of the values were very close to each other, using common values of α in the range of 0.1 up to 1000 or 2000 was insufficient to push the songs into different clusters; as a result, every song fell into the same cluster. Increasing the value of α by several orders of magnitude, to well over 10 million, fixed this, but the solution presented two problems. First, tuning α to experiment with different ways to cluster the music would be problematic, since I would have to work with an enormous range of possible values for α. Second, pushing α to such high values is not appropriate for the Dirichlet Process: extremely high values of α indicate a Dirichlet Process that will try to disperse the data into different clusters, but a value of α that high is in principle always assigning each new song to a new cluster. On the other hand, varying α between 0.1 and 1000, for example, presents a much wider range of flexibility when assigning clusters. While clustering may be possible by varying α an extreme amount with the data as it currently is, we would be using the Dirichlet Process in a way it should mathematically not be used. Therefore, multiplying all of the data by a constant value, so that we can work in the appropriate range of α, is the ideal approach. After some experimentation, I found that k = 10 was an appropriate scaling factor.

After initial runs of the Dirichlet Process, I found that there was a slight issue with some of the earlier songs. Since I had only artist genre tags, not song-specific tags, I chose songs based on whether any of the tags associated with the artist fell under any electronic music genre, including the generic term 'electronic'. There were some bands, mostly older ones from the 1960s and 1970s like Electric Light Orchestra, which had some electronic music but mostly featured rock, funk, disco, or another genre. Given that these artists featured mostly non-electronic songs, I decided to exclude them from my study and generated a blacklist of these artists. While it was infeasible to look through every single song and determine whether it was electronic or not, I was able to look over the earliest songs in each cluster. These songs were the most important to verify as electronic, because early non-electronic songs could end up forming new clusters and inadvertently create clusters with non-electronic sounds that I was not looking for.

The goal of this thesis is to identify different groups into which EM songs cluster and to identify the most unique artists and genres. While the second task is very simple, because it only requires looking at the earliest songs in each cluster, the first is difficult to gauge the effectiveness of. While I can look at the average chord change and timbre category frequencies in each cluster, as well as other metadata, attaching semantic interpretations to what the music actually sounds like, and determining whether the music is clustered properly, is a very subjective process. For this reason, I ran the Dirichlet Process on the feature set with values of α = (0.05, 0.1, 0.2) and compared the clusterings, examining similarities and differences in the clusters formed in each scenario in the Analysis section. For each value of α, I set the upper limit on the number of components, or clusters, allowed to 50. The values of α I used resulted in 9, 14, and 19 clusters formed, respectively; a sketch of the clustering call follows.
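This is a minimal sketch under the modern scikit-learn API; the thesis used the older sklearn.mixture.DPGMM class, which has since been replaced by BayesianGaussianMixture with a Dirichlet Process prior, so the call below is an approximation rather than the exact interface used. X stands for the scaled song-by-feature matrix:

import numpy as np
from sklearn.mixture import BayesianGaussianMixture

def cluster_songs(X, alpha, max_clusters=50):
    # max_clusters is only an upper bound; the Dirichlet Process prior
    # leaves unneeded components empty rather than forcing 50 clusters.
    dpgmm = BayesianGaussianMixture(
        n_components=max_clusters,
        weight_concentration_prior_type='dirichlet_process',
        weight_concentration_prior=alpha,
        random_state=0,
    ).fit(X)
    return dpgmm.predict(X)

# e.g. one run per concentration value:
# labels = {alpha: cluster_songs(X, alpha) for alpha in (0.05, 0.1, 0.2)}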

3.2 Findings

3.2.1 α = 0.05

When I set α to 0.05, the Dirichlet Process split the songs into 9 clusters. Below are the distributions of years of the songs in each cluster (note that the Dirichlet Process does not number the clusters exactly sequentially, so cluster numbers 5, 7, and 10 are skipped).


Figure 3.1: Song year distributions for α = 0.05

For each value of α, I also calculated the average frequency of each chord change category and timbre category for each cluster and plotted the results. The green lines correspond to timbre and the blue lines to pitch.


Figure 3.2: Timbre and pitch distributions for α = 0.05

A table of each cluster formed, the number of songs in that cluster, and descriptions of the pitch, timbre, and rhythmic qualities characteristic of songs in that cluster is shown below.


Cluster | Song Count | Characteristic Sounds
0       | 6481       | Minimalist, industrial, space sounds, dissonant chords
1       | 5482       | Soft, New Age, ethereal
2       | 2405       | Defined sounds; electronic and non-electronic instruments played in standard rock rhythms
3       | 360        | Very dense and complex synths, slightly darker tone
4       | 4550       | Heavily distorted rock and synthesizer
6       | 2854       | Faster-paced 80s synth rock, acid house
8       | 798        | Aggressive beats, dense house music
9       | 1464       | Ambient house, trancelike, strong beats, mysterious tone
11      | 1597       | Melancholy tones; new wave rock in the 80s, then, starting in the 90s, downtempo, trip-hop, nu-metal

Table 3.1: Song cluster descriptions for α = 0.05

3.2.2 α = 0.1

A total of 14 clusters were formed (16 were formed, but 2 clusters contained only one song each; I listened to both of these songs, and since they did not sound unique, I discarded them from the clusters). Again, the song year distributions, timbre and pitch distributions, and cluster descriptions are shown below.


Figure 3.3: Song year distributions for α = 0.1


Figure 3.4: Timbre and pitch distributions for α = 0.1


Cluster | Song Count | Characteristic Sounds
0       | 1339       | Instrumental and disco with 80s synth
1       | 2109       | Simultaneous quarter-note and sixteenth-note rhythms
2       | 4048       | Upbeat, chill; simultaneous quarter-note and eighth-note rhythms
3       | 1353       | Strong repetitive beats, ambient
4       | 2446       | Strong simultaneous beat and synths; synths defined but with echo
5       | 2672       | Calm, New Age
6       | 542        | Hi-hat cymbals, dissonant chord progressions
7       | 2725       | Aggressive punk and alternative rock
9       | 1647       | Latin, rhythmic emphasis on first and third beats
11      | 835        | Standard medium-fast rock instruments/chords
16      | 1152       | Orchestral, especially violins
18      | 40         | "Martian alien" sounds, no vocals
20      | 1590       | Alternating strong kick and strong high-pitched clap
28      | 528        | Roland TR-like beats; kick and clap stand out but fuzzy

Table 3.2: Song cluster descriptions for α = 0.1

3.2.3 α = 0.2

With α set to 0.2, there were a total of 22 clusters formed. 3 of the clusters consisted of 1 song each, none of which were particularly unique-sounding, so I discarded them, for a total of 19 significant clusters. Again, the song year distributions, timbre and pitch distributions, and cluster descriptions are shown below.


Figure 3.5: Song year distributions for α = 0.2


Figure 3.6: Timbre and pitch distributions for α = 0.2


Cluster | Song Count | Characteristic Sounds
0       | 4075       | Nostalgic and sad-sounding synths and string instruments
1       | 2068       | Intense, sad, cavernous (mix of industrial metal and ambient)
2       | 1546       | Jazz/funk tones
3       | 1691       | Orchestral with heavy 80s synths, atmospheric
4       | 343        | Arpeggios
5       | 304        | Electro, ambient
6       | 2405       | Alien synths, eerie
7       | 1264       | Punchy kicks and claps, 80s/90s tilt
8       | 1561       | Medium tempo, 4/4 time signature, synths with intense guitar
9       | 1796       | Disco rhythms and instruments
10      | 2158       | Standard rock with few (if any) synths added on
12      | 791        | Cavernous, minimalist, ambient (non-electronic instruments)
14      | 765        | Downtempo, classic guitar riffs, fewer synths
16      | 865        | Classic acid house sounds and beats
17      | 682        | Heavy Roland TR sounds
22      | 14         | Fast, ambient, classic orchestral
23      | 578        | Acid house with funk tones
30      | 31         | Very repetitive rhythms, one or two tones
34      | 88         | Very dense sound (strong vocals and synths)

Table 3.3: Song cluster descriptions for α = 0.2


3.3 Analysis

Each of the different values of α revealed different insights about the Dirichlet Process applied to the Million Song Dataset, mainly which artists and songs were unique and how traditional groupings of EM genres could be thought of differently. Not surprisingly, the distributions of the years of songs in most of the clusters were skewed to the left, because the distribution of all of the EM songs in the Million Song Dataset is left-skewed (see Figure 2.2). However, some of the distributions vary significantly for individual clusters, and these differences provide important insights into which types of music styles were popular at certain points in time and how unique the earliest artists and songs in the clusters were. For example, for α = 0.1, cluster 28's musical style (with sounds characteristic of the Roland TR-808 and TR-909, two programmable drum machines and synthesizers that became extremely popular in 1980s and 90s dance tracks [13]) coincides with when the instruments were first manufactured, in 1980. Not surprisingly, this cluster contained mostly songs from the 80s and 90s and declined slightly in the 2000s. However, there were a few songs in that cluster that came out before 1980. While these songs did not clearly use the Roland TR machines, they may have contained similar sounds that predated the machines and were truly novel.

First, looking at α = 0.05, we see that all of the clusters contain a significant number of songs, although clusters 3 and 8 are notably smaller. Cluster 3 contains a heavier left tail, indicating a larger number of songs from the 70s, 80s, and 90s. Inside the cluster, the genres of music varied significantly from a traditional music lens. That is, the cluster contained some songs with nearly all traditional rock instruments, others with purely synths, and others somewhere in between, all of which would normally be classified as different EM genres. However, under the Dirichlet Process these songs were lumped together around the common theme of dense, melodic sounds (as opposed to minimalistic, repetitive, or dissonant ones). The most prominent artists among the earlier songs are Ashra and John Hassell, who composed several melodic songs combining traditional instruments with synthesizers for a modern feel. The other small cluster, number 8, contains a more normal year distribution relative to the entire MSD distribution and also consists of denser beats. Another artist, Cabaret Voltaire, leads this cluster. Cluster 9 sticks out significantly because it contains virtually no songs before 1990 but then increases rapidly in popularity. This cluster contains songs with hypnotically repetitive rhythms, strong and ethereal synths, and an equally strong drum-like beat. Given the emergence of trance in the 1990s, and the fact that house music in the 1980s contained more minimalistic synths than house music in the 1990s, this distribution of years makes sense. Looking at the earliest artists in this cluster, one that accurately predates the later music in the cluster is Jean-Michel Jarre. A French composer pioneering in ambient and electronic music [14], one of his songs, Les Chants Magnétiques IV, contains very sharp and modulated synths along with a repetitive hi-hat cymbal rhythm and more drawn-out and ethereal synths. While the song sounds ambient at its normal speed, playing the song at 1.5 times its normal speed resulted in a thumping, fast-paced 16th-note rhythm that, combined with the ethereal synths and their chord progressions, sounded very similar to trance music. In fact, I found that, stylistically, trance music was comparable to house and ambient music increased in speed. Trance music was a term not used extensively until the early 1990s, but ambient and house music were already mainstream by the 1980s, so it would make sense that trance evolved in this manner. However, this insight could serve as an argument that trance is not an innovative genre in and of itself but is rather a clever combination of two older genres.

Lastly, we look at the timbre category and chord change distributions for each cluster. In theory, these clusters should have significantly different peaks of chord changes and timbre categories, reflecting different pitch arrangements and instruments in each cluster. The type 0 chord change corresponds to major → major with no note change, type 60 to minor → minor with no note change, type 120 to dominant 7th major → dominant 7th major with no note change, and type 180 to dominant 7th minor → dominant 7th minor with no note change. It makes sense that type 0, 60, 120, and 180 chord changes are frequently observed, because it implies that chords occurring next to each other in a song tend to remain in the same key for the majority of the song (the sketch below checks these codes against the encoding from Section 2.3.2).
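As a quick, illustrative check (assuming the quality numbering used in the earlier encoding sketch: 1 = major, 2 = minor, 3 = dominant 7th, 4 = minor 7th), the four "same quality, no note change" codes fall out of the arithmetic directly:

# Same-quality chord changes with a note shift of 0 semitones should
# map to codes 0, 60, 120, and 180.
for quality, expected in [(1, 0), (2, 60), (3, 120), (4, 180)]:
    key_shift = 4 * (quality - 1) + quality
    assert 12 * (key_shift - 1) + 0 == expected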

The timbre categories, on the other hand, are more difficult to intuitively interpret. Mauch's study addresses this issue by sampling

songs and sounds that are closest to each timbre category, then playing the sounds and attaching user-based interpretations based on several listeners [8]. While this strategy worked in Mauch's study, given the time and resources at my disposal it was not practical in mine. I ended up writing subjective summaries of each cluster and comparing them with the charts to see whether certain peaks in the timbre categories corresponded to specific tones. Strangely, for α = 0.05 the timbre and chord change data are very similar for each cluster. This problem does not occur for α = 0.1 or 0.2, where the graphs vary significantly and correspond to some of the observed differences in the music. In summary, below are the most influential artists I found in the clusters formed and the types of music they created that were novel for their time:

• Jean-Michel Jarre: ambient and house music, complicated synthesizer arrangements

• Cabaret Voltaire: orchestral electronic music

• Paul Horn: new age

• Brian Eno: ambient music

• Manuel Göttsching (Ashra): synth-heavy ambient music

• Killing Joke: industrial metal

• John Foxx: minimalist and dark electronic music

• Fad Gadget: house and industrial music

While these conclusions were formed mainly from the data in the MSD and the resulting clusters, I checked outside sources and biographies of these artists to see whether they were groundbreaking contributors to electronic music. Some research revealed that these artists were indeed groundbreaking for their time, so my findings are consistent with the existing literature. The difference, however, between existing accounts and mine is that, from a quantitatively computed perspective, I found some new connections (like observing that one of Jarre's works, when sped up, sounded very similar to trance music).

For larger values of α, it is worth looking not only at interesting phenomena in the clusters formed for that specific value, but also at how those clusters compare to the ones formed for other values of α. Since we are increasing the value of α, more clusters will be formed, and the distinctions between each cluster will be more nuanced. With α = 0.1, the Dirichlet Process formed 16 clusters; 2 of these clusters consisted of only one song each, and upon listening, neither of these songs sounded particularly unique, so I threw those two clusters out and analyzed the remaining 14. Comparing these clusters to the ones formed with α = 0.05, I found that some of the clusters mapped over nicely, while others were more difficult to interpret. For example, cluster 3_{0.1} (cluster 3 when α = 0.1) contained a similar number of songs, and a similar distribution of release years, to cluster 9_{0.05}. Both contain virtually no songs before the 1990s and then steadily rise in popularity through the 2000s. Both clusters also contain similar types of music: house beats and ethereal synths reminiscent of ambient or trance music. However, when I looked at the earliest artists in cluster 3_{0.1}, they were different from the earliest artists in cluster 9_{0.05}. One particular artist, Bill Nelson, stood out for having a particularly novel song, "Birds of Tin," for the year it was released (1980). This song features a sharp and twangy synth beat that, when sped up, sounded like minimalist acid house music.

While the α = 0.05 grouping differentiated mostly on general moods and classes of instruments (like rock vs. non-electronic vs. electronic), the α = 0.1 grouping picked up more nuanced instrumentation and mood differences. For example, cluster 16_{0.1} contained songs that featured orchestral string instruments, especially violin. The songs themselves varied significantly according to traditional genres, from Brian Eno arrangements with classical orchestra to a remix of a song by Linkin Park, a nu-metal band, which contained violin interludes. This clustering raises an interesting point: music that sounds very different based on traditional genres could be grouped together based on certain instruments or sounds. Another cluster, 28_{0.1}, features 90s sounds characteristic of the Roland TR-909 drum machine (which would explain why the cluster's songs increase drastically starting in the 1990s and steadily decline through the 2000s). Yet another cluster, 6_{0.1}, contains a particularly heavy left tail, indicating a style more popular in the 1980s, and its characteristic sound, hi-hat cymbals, is also a specialized instrument. This specialization does not match up particularly strongly with the clusters when α = 0.05. That is, a single cluster with α = 0.05 does not easily map to one or more clusters in the α = 0.1 run, although many of the clusters appear to share characteristics based on the qualitative descriptions in the tables. The timbre and chord change charts for each cluster appeared to at least somewhat corroborate the general characteristics I attached to each cluster. For example, the last timbre category is significantly pronounced for clusters 5 and 18, and especially so for 18. Cluster 18 was vocal-free, ethereal, space-synth sounds, so it would make sense that cluster 5, which was mainly calm New Age, also contained vocal-free, ethereal, and space-y sounds. It was also interesting to note that certain clusters, like 28, contained one timbre category that completely dominated all the others. Many songs in this cluster were marked by strong and repetitive beats reminiscent of the Roland TR drum machines, which matches the graph. Likewise, clusters 3, 7, 9, and 20, which appear to contain the same peak timbre category, were noted for containing strong and repetitive beats. For this run, I added the following artists and their contributions to the general list of novel artists:

• Bill Nelson: minimalist house music

• Vangelis: orchestral compositions with electronic notes

• Rick Wakeman: rock compositions with spacey-sounding synths

• Kraftwerk: synth-based pop music

Finally, we look at α = 0.2. With this parameter value, the Dirichlet Process resulted in 22 clusters; 3 of these clusters contained only one song each, and upon listening to each of these songs, I determined they were not particularly unique and discarded them, for a total of 19 remaining clusters. Unlike the previous two values of α, where the clusters were relatively easy to subjectively differentiate, this one was quite difficult. Slightly more than half of the clusters, 10 out of 19, contained under 1,000 songs, and 3 contained under 100. Some of the clusters were easily mapped to clusters for the other two α values, like cluster 17_{0.2}, which contains Roland TR drum machine sounds and is comparable to cluster 28_{0.1}. However, many of the other classifications seemed more dubious. Not only did the songs within each cluster often vary significantly, but the differences between many clusters appeared nearly indistinguishable. The chord change and timbre charts also reflect the difficulty in distinguishing the clusters: the y-axis scales are quite small, implying that many of the timbre values averaged out because the songs in each cluster were quite different. Essentially, the observations are quite noisy and do not have features that stand out as saliently as those of cluster 28_{0.1}, for example. The only exceptions were clusters 30 and 34, but there were so few songs in each of these clusters that they represent only a small portion of the dataset. Therefore, I concluded that the Dirichlet Process with α = 0.2 did an insufficient job of adequately clustering the songs. Overall, the clusters formed when α = 0.1 were the most meaningful in terms of picking up nuanced moods and instruments without splitting hairs and producing clusters that a minimally trained ear could not differentiate. From this analysis, the most appropriate genre classifications of the electronic music in the MSD are the clusters described in the table for α = 0.1, and the most novel artists, along with their contributions, are summarized in the findings for α = 0.05 and α = 0.1.


Chapter 4

Conclusion

In this chapter, I first address weaknesses in my experiment and strategies for addressing them; I then offer potential paths for researchers to build upon my experiment, and finally I offer closing words regarding this thesis.

4.1 Design Flaws in Experiment

While I made every effort possible to ensure the integrity of this experiment, there were various complicating factors, some beyond my control and others within my control but unrealistic to address given the time and resources I had. The largest issue was the dataset I was working with. While the MSD contained roughly 23,000 electronic music songs according to my classification, these songs did not come close to covering all of the electronic music available. From looking through the tracks, I did see many important artists, meaning that there was some credibility to the dataset. However, there were several other artists I was surprised to see missing, and the artists included contained only a limited number of popular songs. Some traditionally defined genres, like dubstep, were missing entirely from the dataset, and the most recent songs came from the year 2010, which meant that the past 5 years of rapid expansion in EM were not accounted for. Building a sufficient corpus of EM data is very difficult, arguably more so than for other genres, because songs may be remixed by multiple artists, further blurring the line between original content and modifications. For this reason, I considered my thesis to be a proof of concept: although the data I used may not be ideal, I was able to show that the Dirichlet Process could be used with some amount of success to cluster songs based on their metadata.

With respect to how I implemented the Dirichlet Process and constructed the features, my methodology could have been more extensive given additional time and resources. Interpreting the sounds in each song and establishing common threads is a difficult task, and unlike Pandora, which used trained music theory experts to analyze each song, I relied on my own ears and anecdotal knowledge of EM. Given the lack of formal literature quantitatively analyzing EM, and the resources I had, this was my best realistic option, but it was also not ideal. The second notable weakness, which was more controllable, was determining what exactly constitutes an EM song. My criterion involved iterating through every song and selecting those whose artist carried a tag that fell inside a list of predetermined EM genres. However, this strategy is not always effective, since some artists have only a small selection of EM songs and have produced much more music in rock or other non-EM genres. To prevent these songs from appearing in the dataset, I would need to load another dataset, from a group called Last.fm, which contains user-generated tags at the song level. Another, more addressable, weakness in my experiment was in graphically analyzing the timbre categories. While the average chord changes were easy to interpret on the graphs for each cluster and had easy semantic interpretations, the timbre categories were never formally defined. That is, while I knew the Bayesian Information Criterion was lowest when there were 46 categories, I did not associate each timbre category with a sound. Mauch's study addressed this issue by randomly selecting songs with sounds that fell in each timbre category and asking users to listen to the sounds and classify what they heard. Implementing this system would be an additional way of ensuring that the clusters formed were nontrivial: I could not only eyeball the timbre measurements on each graph, like I did in this thesis, but also use them to confirm the sounds I observed for each cluster. Finally, while my feature selection involved careful preprocessing, based on other studies, that normalized measurements between all songs, there are additional ways I could have improved the feature set. For example, one study looks at more advanced ways to isolate specific timbre segments in a song, identify repeating patterns, and compare songs to each other in terms of the similarity of their timbres [15]. More advanced methods like these would allow me to more quantitatively analyze how successfully the Dirichlet Process clusters songs into distinct categories.

4.2 Future Work

Future work in this area, quantitatively analyzing EM metadata to determine what constitutes different genres and novel artists, would involve tighter definitions and procedures, evaluations of whether the clustering was effective, and closer musical scrutiny. All of the weaknesses mentioned in the previous section, barring perhaps the set of songs available in the Million Song Dataset, can be addressed with extensions and modifications to the code base I created. Addressing the greater issue of building an effective corpus of music data for the MSD, and constantly updating it, might be accomplished by soliciting such data from an organization like Spotify, but such an endeavor is very ambitious and beyond the scope of any individual or small-group research project without extensive funding and influence. Once these problems are resolved, and the dataset, the songs accessed from it, and methods for comparing songs to each other are in place, the next steps would be to further analyze the results. How do the most unique artists for their time compare to the most popular artists? Is there considerable overlap? How long does it take for a style to grow in popularity, if it even does? And lastly, how can these findings be used to compose new genres of music and envision who and what will become popular in the future? All of these questions may require supplementary information sources, with respect to the popularity of songs and artists, for example, and many of these additional pieces of information can be found on the website of the MSD.

4.3 Closing Remarks

While this thesis is an ambitious endeavor and can be improved in many respects, it does show that the methods implemented yield nontrivial results and could serve as a foundation for future quantitative analysis of electronic music. As data analytics grows even more, and groups such as Spotify amass greater amounts of information and deeper insights from that information, this relatively new field of study will hopefully grow. EM is a dynamic, energizing, and incredibly expressive type of music, and understanding it from a quantitative perspective pays respect to what has, up until now, been mostly analyzed from a curious outsider's perspective: qualitatively described, but not examined as thoroughly from a mathematical angle.


Appendix A

Code

A.1 Pulling Data from the Million Song Dataset

from __future__ import division
import os
import sys
import re
import time
import glob
import numpy as np
import hdf5_getters  # not on adroit

# prevents output from showing ellipses when printed
np.set_printoptions(threshold=np.nan)

'''This code collects the metadata of every electronic song in the MSD;
the chord change computation and the Dirichlet Process run on its output.'''

basedir = '/scratch/network/mssilver/mssilver/msd_data_full/data/' + str(sys.argv[1])
ext = '.h5'

target_genres = ['house', 'techno', 'drum and bass', 'drum n bass',
                 "drum'n'bass", 'drumnbass', "drum 'n' bass", 'jungle',
                 'breakbeat', 'trance', 'dubstep', 'trap', 'downtempo',
                 'industrial', 'synthpop', 'idm',
                 'idm - intelligent dance music', '8-bit', 'ambient',
                 'dance and electronica', 'electronic']

# relevant metadata for all EM songs found in the MSD
all_song_data = {}
pitch_segs_data = []
count = 0
start_time = time.time()

for root, dirs, files in os.walk(basedir):
    files = glob.glob(os.path.join(root, '*' + ext))
    for f in files:
        h5 = hdf5_getters.open_h5_file_read(f)
        # if year unknown, throw out sample
        if hdf5_getters.get_year(h5) == 0:
            h5.close()
            continue
        if any(tag in str(hdf5_getters.get_artist_mbtags(h5)) for tag in target_genres):
            print 'found electronic music song at {0} seconds'.format(time.time() - start_time)
            count += 1
            print('song count: {0}'.format(count + 1))
            h5_subdict = dict()
            h5_subdict['title'] = hdf5_getters.get_title(h5).item()
            h5_subdict['artist_name'] = hdf5_getters.get_artist_name(h5).item()
            h5_subdict['year'] = hdf5_getters.get_year(h5).item()
            h5_subdict['duration'] = hdf5_getters.get_duration(h5).item()
            h5_subdict['timbre'] = hdf5_getters.get_segments_timbre(h5).tolist()
            h5_subdict['pitches'] = hdf5_getters.get_segments_pitches(h5).tolist()
            track_id = hdf5_getters.get_track_id(h5).item()
            all_song_data[track_id] = h5_subdict
            print('Song {0} finished processing. Total time elapsed: {1} seconds'.format(count, str(time.time() - start_time)))
        h5.close()

# sort songs chronologically and dump them to a text file
all_song_data_sorted = dict(sorted(all_song_data.items(), key=lambda k: k[1]['year']))
sortedpitchdata = '/scratch/network/mssilver/mssilver/msd_data/raw_' + re.sub('/', '', str(sys.argv[1])) + '.txt'
with open(sortedpitchdata, 'w') as text_file:
    text_file.write(str(all_song_data_sorted))

A.2 Calculating Most Likely Chords and Timbre Categories

from __future__ import division
import os
import sys
import re
import time
import json
import glob
import hdf5_getters  # not on adroit
import sklearn.mixture
import msd_utils  # not on adroit
import math
import numpy as np
import collections
import ast

# prevents output from showing ellipses when printed
np.set_printoptions(threshold=np.nan)

# column-wise mean of a list of lists (used with zip below)
def mean(a):
    return sum(a) / len(a)

'''This code computes the frequency of chord changes in each electronic song
and runs the dirichlet process on it.'''

basedir = '/scratch/network/mssilver/mssilver/'
input_file = basedir + 'msd_data/raw_' + str(sys.argv[1]) + '.txt'
output_file = basedir + 'msd_data/preprocessed_' + str(sys.argv[1]) + '.txt'

json_contents = open(input_file, 'r').read()

all_song_data = []
time_start = time.time()
count = 0
for json_object_str in re.finditer(r"\{'title'.*?\}", json_contents):
    json_object_str = str(json_object_str.group(0))
    json_object = ast.literal_eval(json_object_str)
    json_object_new = {}

    json_object_new['title'] = json_object['title']
    json_object_new['artist_name'] = json_object['artist_name']
    json_object_new['year'] = json_object['year']
    json_object_new['duration'] = json_object['duration']

    segments_pitches_old = json_object['pitches']
    segments_timbre_old = json_object['timbre']
    segments_pitches_old_smoothed = []
    segments_timbre_old_smoothed = []
    chord_changes = [0 for i in range(0, 192)]
    smoothing_factor = 5
    for i in range(0, int(math.floor(len(segments_pitches_old) / smoothing_factor))):
        segments = segments_pitches_old[(smoothing_factor * i):(smoothing_factor * i + smoothing_factor)]
        # calculate mean strength of each note over a block of 5 time segments
        segments_mean = map(mean, zip(*segments))
        segments_pitches_old_smoothed.append(segments_mean)
    most_likely_chords = [msd_utils.find_most_likely_chord(seg) for seg in segments_pitches_old_smoothed]
    print 'found most likely chords at {0} seconds'.format(time.time() - time_start)
    # calculate chord changes
    for i in range(0, len(most_likely_chords) - 1):
        c1 = most_likely_chords[i]
        c2 = most_likely_chords[i + 1]
        if (c1[1] == c2[1]):
            note_shift = 0
        elif (c1[1] < c2[1]):
            note_shift = c2[1] - c1[1]
        else:
            note_shift = 12 - c1[1] + c2[1]
        key_shift = 4 * (c1[0] - 1) + c2[0]
        # convert note_shift (0 through 11) and key_shift (1 to 16)
        # to one of 192 categories for a chord shift
        chord_shift = 12 * (key_shift - 1) + note_shift
        chord_changes[chord_shift] += 1
    json_object_new['chord_changes'] = [c / json_object['duration'] for c in chord_changes]
    print 'calculated chord changes at {0} seconds'.format(time.time() - time_start)

    for i in range(0, int(math.floor(len(segments_timbre_old) / smoothing_factor))):
        segments = segments_timbre_old[(smoothing_factor * i):(smoothing_factor * i + smoothing_factor)]
        # calculate mean of each timbre coefficient over a block of 5 time segments
        segments_mean = map(mean, zip(*segments))
        segments_timbre_old_smoothed.append(segments_mean)
    print 'found most likely timbre categories at {0} seconds'.format(time.time() - time_start)
    timbre_cats = [msd_utils.find_most_likely_timbre_category(seg) for seg in segments_timbre_old_smoothed]
    # one count per timbre cluster (46 in total)
    timbre_cat_counts = [timbre_cats.count(i) for i in xrange(0, len(msd_utils.TIMBRE_CLUSTERS))]
    json_object_new['timbre_cat_counts'] = [t / json_object['duration'] for t in timbre_cat_counts]
    all_song_data.append(json_object_new)
    count += 1

print 'preprocessing finished, writing results to file at time {0}'.format(time.time() - time_start)
with open(output_file, 'w') as f:
    f.write(str(all_song_data))

print 'file merging complete at time {0}'.format(time.time() - time_start)

A.3 Code to Compute Timbre Categories

from __future__ import division
import os
import sys
import re
import time
import json
import glob
import hdf5_getters  # not on adroit
import sklearn.mixture
import msd_utils  # not on adroit
import math
import numpy as np
import collections
from string import ascii_uppercase
import ast
import matplotlib.pyplot as plt
import operator
from collections import defaultdict
import random

timbre_all = []
N = 20  # number of samples to get from each year
year_counts = {1956: 2, 1965: 4, 1968: 3, 1969: 5, 1970: 23, 1971: 25,
               1972: 26, 1973: 37, 1974: 35, 1975: 29, 1976: 28, 1977: 64,
               1978: 77, 1979: 111, 1980: 131, 1981: 171, 1982: 199,
               1983: 272, 1984: 190, 1985: 189, 1986: 200, 1987: 224,
               1988: 205, 1989: 272, 1990: 358, 1991: 348, 1992: 538,
               1993: 610, 1994: 658, 1995: 764, 1996: 809, 1997: 930,
               1998: 872, 1999: 983, 2000: 1031, 2001: 1230, 2002: 1323,
               2003: 1563, 2004: 1508, 2005: 1995, 2006: 1892, 2007: 2175,
               2008: 1950, 2009: 1782, 2010: 742}

time_start = time.time()
year_count = defaultdict(int)
orig_dir = '/scratch/network/mssilver/mssilver/'
orig_dir = ''  # overridden when running locally
json_pattern = re.compile(r"\{'title'.*?\}", re.DOTALL)
N = 20  # number of songs to sample from each year
k = 20  # number of frames to select from each song
for l1 in ascii_uppercase:
    for l2 in ascii_uppercase:
        edm_textfile = orig_dir + 'msd_data/raw_' + l1 + l2 + '.txt'
        json_contents = open(edm_textfile, 'r').read()
        for json_object_str in re.findall(json_pattern, json_contents):
            json_object = ast.literal_eval(json_object_str)
            year = int(json_object['year'])
            # inclusion probability that yields roughly N songs per year
            prob = 1.0 if 1.0 * N / year_counts[year] > 1.0 else 1.0 * N / year_counts[year]
            if random.random() < prob:
                print 'getting timbre frames for song in directory {0}, {1} seconds after start of program'.format(edm_textfile, time.time() - time_start)
                duration = float(json_object['duration'])
                timbre = [[t / duration for t in l] for l in json_object['timbre']]
                try:
                    indices = random.sample(xrange(0, len(timbre)), k)
                except:
                    indices = xrange(0, len(timbre))
                timbre_frames = [timbre[i] for i in indices]
                appended_timbre = [timbre_all.append(l) for l in timbre_frames]
        print 'finished file {0}, {1} seconds after start of program'.format(edm_textfile, time.time() - time_start)

with(open('timbre_frames_all.txt', 'w')) as f:
    f.write(str(timbre_all))

A4 Helper Methods for Calculations

1 import os2 import re3 import json4 import glob5 import hdf5_getters6 import time7 import numpy as np8

9 rsquorsquorsquo some static data used in conjunction with the helper methods rsquorsquorsquo10

11 each 12-element vector corresponds to the 12 pitches starting with Cnatural and going up to B natural

12

61

13 CHORD_TEMPLATE_MAJOR = [[100010010000][010001001000]14 [001000100100][000100010010]15 [000010001001][100001000100]16 [010000100010][001000010001]17 [100100001000][010010000100]18 [001001000010][000100100001]]19 CHORD_TEMPLATE_MINOR =[[100100010000][010010001000]20 [001001000100][000100100010]21 [000010010001][100001001000]22 [010000100100][001000010010]23 [000100001001][100010000100]24 [010001000010][001000100001]]25 CHORD_TEMPLATE_DOM7 = [[100010010010][010001001001]26 [101000100100][010100010010]27 [001010001001][100101000100]28 [010010100010][001001010001]29 [100100101000][010010010100]30 [001001001010][000100100101]]31 CHORD_TEMPLATE_MIN7 = [[100100010010][010010001001]32 [101001000100][010100100010]33 [001010010001][100101001000]34 [010010100100][001001010010]35 [000100101001][100010010100]36 [010001001010][001000100101]]37

38 CHORD_TEMPLATE_MAJOR_means = [npmean(chord) for chord inCHORD_TEMPLATE_MAJOR]

39 CHORD_TEMPLATE_MINOR_means = [npmean(chord) for chord inCHORD_TEMPLATE_MINOR]

40 CHORD_TEMPLATE_DOM7_means = [npmean(chord) for chord in CHORD_TEMPLATE_DOM7]

41 CHORD_TEMPLATE_MIN7_means = [npmean(chord) for chord in CHORD_TEMPLATE_MIN7]

42

43 CHORD_TEMPLATE_MAJOR_stdevs = [npstd(chord) for chord inCHORD_TEMPLATE_MAJOR]

44 CHORD_TEMPLATE_MINOR_stdevs = [npstd(chord) for chord inCHORD_TEMPLATE_MINOR]

45 CHORD_TEMPLATE_DOM7_stdevs = [npstd(chord) for chord in CHORD_TEMPLATE_DOM7]

46 CHORD_TEMPLATE_MIN7_stdevs = [npstd(chord) for chord in CHORD_TEMPLATE_MIN7]

47

48 TIMBRE_CLUSTERS = [[ 138679881e-01 395702571e-02 265410235e-0249 738301998e-03 -175014636e-02 -551147732e-0250 871851698e-03 -117595855e-02 107227900e-0251 875951680e-03 540391877e-03 617638908e-03]52 [ 314344510e+00 117405599e-01 408053561e+0053 -177934450e+00 293367968e+00 -135597928e+0054 -155129489e+00 775743158e-01 642796685e-0155 140794256e-01 337716831e-01 -327103815e-01]56 [ 356548165e-01 273288705e+00 194355982e+0057 106892477e+00 989739475e-01 -897330631e-0258 873234495e-01 -200747009e-03 344488367e-0159 993117800e-02 -243471766e-01 -190521726e-01]60 [ 422442037e-01 414115783e-01 143926557e-0161 -116143322e-01 -595186216e-02 -236927188e-0162 -683151409e-02 986816882e-02 243219098e-0263 693558977e-02 680121418e-03 397485360e-02]64 [ 194727799e-01 -139027782e+00 -239875671e-01

62

65 -284583677e-01 192334219e-01 -283421048e-0166 215787541e-01 114840341e-01 -215631833e-0167 -409496877e-02 -690838017e-03 -724394810e-03]68 [ 196565167e-01 498702717e-02 -343697282e-0169 254170701e-01 112441266e-02 154740401e-0170 -470447408e-02 810868802e-02 303736697e-0371 143974944e-03 -275044913e-02 148634678e-02]72 [ 221364497e-01 -296205105e-01 157754028e-0173 -557641279e-02 -925625566e-02 -615316168e-0274 -138139882e-01 -554936599e-02 166886836e-0175 646238260e-02 124093863e-02 -209274345e-02]76 [ 212823455e-01 -932652720e-02 -439611467e-0177 -202814479e-01 498638770e-02 -126572488e-0178 -111181799e-01 325075635e-02 201416694e-0279 -569216463e-02 261922912e-02 830817468e-02]80 [ 162304042e-01 -734813956e-03 -202552550e-0181 180106705e-01 -572110826e-02 -917148244e-0282 -620429191e-03 -608892354e-02 102883628e-0283 384878478e-02 -872920419e-03 237291230e-02]84 [ 169023095e-01 681311168e-02 -371039856e-0285 -213139780e-02 -418752028e-03 136407740e-0186 258515825e-02 -410328777e-04 293149920e-0287 -197874734e-02 201177066e-02 429260690e-03]88 [ 416829358e-01 -128384095e+00 886081556e-0189 913717416e-02 -319420208e-01 -182003637e-0190 -319865507e-02 -171517045e-02 347472066e-0291 -353047665e-02 558354602e-02 -506222122e-02]92 [ 383948137e-01 106020034e-01 401191058e-0193 149470482e-01 -958422411e-02 -494473336e-0294 227589858e-02 -567352733e-02 384666644e-0295 -215828055e-02 -167817151e-02 115426241e-01]96 [ 907946444e-01 326120397e+00 298472002e+0097 -142615404e-01 129886103e+00 -453380431e-0198 154008478e-01 -355297093e-02 -295809181e-0199 157037690e-01 -729692046e-02 115180285e-01]

    [1.60870896e+00, -2.32038235e+00, -7.96211044e-01, 1.55058968e+00, -2.19377663e+00, 5.01030526e-01,
     -1.71767279e+00, -1.36642470e+00, -2.42837527e-01, -4.14275615e-01, -7.33148530e-01, -4.56676578e-01],
    [6.42870687e-01, 1.34486839e+00, 2.16026845e-01, -2.13180345e-01, 3.10866747e-01, -3.97754955e-01,
     -3.54439151e-01, -5.95938041e-04, 4.95054274e-03, 4.67013422e-02, -1.80823854e-02, 1.25808320e-01],
    [1.16780496e+00, 2.28141229e+00, -3.29418720e+00, -1.54239912e+00, 2.12372153e-01, 2.51116768e+00,
     1.84273560e+00, -4.06183916e-01, 1.19175125e+00, -9.24407446e-01, 6.85444429e-01, -6.38729005e-01],
    [2.39097414e-01, -1.13382447e-02, 3.06327342e-01, 4.68182987e-03, -1.03107607e-01, -3.17661969e-02,
     3.46533705e-02, 1.46440386e-02, 6.88291154e-02, 1.72580481e-02, -6.23970238e-03, -6.52822380e-03],
    [1.74850329e-01, -1.86077411e-01, 2.69285838e-01, 5.22452803e-02, -3.71708289e-02, -6.42874319e-02,
     -5.01920042e-03, -1.14565540e-02, -2.61300268e-03, -6.94872458e-03, 1.20157063e-02, 2.01341977e-02],
    [1.93220674e-01, 1.62738332e-01, 1.72794061e-02, 7.89933755e-02, 1.58494767e-01, 9.04541006e-04,
     -3.33177052e-02, -1.42411500e-01, -1.90471155e-02, -2.41622739e-02, -2.57382438e-02, 2.84895062e-02],
    [3.31179197e+00, -1.56765268e-01, 4.42446188e+00, 2.05496297e+00, 5.07031622e+00, -3.52663849e-02,
     -5.68337901e+00, -1.17825301e+00, 5.41756637e-01, -3.15541339e-02, -1.58404846e+00, 7.37887234e-01],
    [2.36033237e-01, -5.01380019e-01, -7.01568834e-02, -2.14474169e-01, 5.58739133e-01, -3.45340886e-01,
     2.36469930e-01, -2.51770230e-02, -4.41670143e-01, -1.73364633e-01, 9.92353986e-03, 1.01775476e-01],
    [3.13672832e+00, 1.55128891e+00, 4.60139512e+00, 9.82477544e-01, -3.87108002e-01, -1.34239667e+00,
     -3.00065797e+00, -4.41556909e-01, -7.77546208e-01, -6.59017029e-01, -1.42596356e-01, -9.78935498e-01],
    [8.50714148e-01, 2.28658856e-01, -3.65260753e+00, 2.70626948e+00, -1.90441544e-01, 5.66625676e+00,
     1.77531510e+00, 2.39978921e+00, 1.10965660e+00, 1.58484130e+00, -1.51579214e-02, 8.64324026e-01],
    [1.14302559e+00, 1.18602811e+00, -3.88130412e+00, 8.69833825e-01, -8.23003310e-01, -4.23867795e-01,
     8.56022598e-01, -1.08015106e+00, 1.74840192e-01, -1.35493558e-02, -1.17012561e+00, 1.68572940e-01],
    [3.54117814e+00, 6.12714769e-01, 7.67585243e+00, 2.50391333e+00, 1.81374399e+00, -1.46363231e+00,
     -1.74027236e+00, -5.72924078e-01, -1.20787368e+00, -4.13954661e-01, -4.62561948e-01, 6.78297871e-01],
    [8.31843044e-01, 4.41635485e-01, 7.00724425e-02, -4.72159900e-02, 3.08326493e-01, -4.47009822e-01,
     3.27806057e-01, 6.52370380e-01, 3.28490360e-01, 1.28628172e-01, -7.78065861e-02, 6.91343399e-02],
    [4.90082031e-01, -9.53180204e-01, 1.76970476e-01, 1.57256960e-01, -5.26196238e-02, -3.19264458e-01,
     3.91808304e-01, 2.19368239e-01, -2.06483291e-01, -6.25044005e-02, -1.05547224e-01, 3.18934196e-01],
    [1.49899454e+00, -4.30708817e-01, 2.43770498e+00, 7.03149621e-01, -2.28827845e+00, 2.70195855e+00,
     -4.71484280e+00, -1.18700075e+00, -1.77431396e+00, -2.23190236e+00, 8.20855264e-01, -2.35859902e-01],
    [1.20322544e-01, -3.66300816e-01, -1.25699953e-01, -1.21914056e-01, 6.93277338e-02, -1.31034684e-01,
     -1.54955924e-03, 2.48094288e-02, -3.09576314e-02, -1.66369415e-03, 1.48904987e-04, -1.42151992e-02],
    [6.52394765e-01, -6.81024464e-01, 6.36868117e-01, 3.04950208e-01, 2.62178992e-01, -3.20457080e-01,
     -1.98576098e-01, -3.02173163e-01, 2.04399765e-01, 4.44513847e-02, -9.50111498e-03, -1.14198739e-02],
    [2.06762180e-01, -2.08101829e-01, 2.61977630e-01, -1.71672300e-01, 5.61794250e-02, 2.13660185e-01,
     3.90259585e-02, 4.78176392e-02, 1.72812607e-02, 3.44052067e-02, 6.26899067e-03, 2.48544728e-02],
    [7.39717363e-01, 4.37786285e+00, 2.54995502e+00, 1.13151212e+00, -3.58509503e-01, 2.20806129e-01,
     -2.20500355e-01, -7.22409824e-02, -2.70534083e-01, 1.07942098e-03, 2.70174668e-01, 1.87279353e-01],
    [1.25593809e+00, 6.71054880e-02, 8.70352571e-01, -4.32607959e+00, 2.30652217e+00, 5.47476105e+00,
     -6.11052479e-01, 1.07955720e+00, -2.16225471e+00, -7.95770149e-01, -7.31804973e-01, 9.68935954e-01],
    [1.17233757e-01, -1.23897829e-01, -4.88625265e-01, 1.42036530e-01, -7.23286756e-02, -6.99808763e-02,
     -1.17525019e-02, 5.70221674e-02, -7.67796123e-03, 4.17505873e-02, -2.33375716e-02, 1.94121001e-02],
    [1.67511025e+00, -2.75436700e+00, 1.45345593e+00, 1.32408871e+00, -1.66172505e+00, 1.00560074e+00,
     -8.82308160e-01, -5.95708043e-01, -7.27283590e-01, -1.03975499e+00, -1.86653334e-02, 1.39449745e+00],
    [3.20587677e+00, -2.84451104e+00, 8.54849957e+00, -4.44001235e-01, 1.04202144e+00, 7.35333682e-01,
     -2.48763292e+00, 7.38931361e-01, -1.74185596e+00, -1.07581842e+00, 2.05759299e-01, -8.20483513e-01],
    [3.31279737e+00, -5.08655734e-01, 6.61530870e+00, 1.16518280e+00, 4.74499155e+00, -2.31536191e+00,
     -1.34016130e+00, -7.15381712e-01, 2.78890594e+00, 2.04189275e+00, -3.80003033e-01, 1.16034914e+00],
    [1.79522019e+00, -8.13534697e-03, 4.37167420e-01, 2.26517020e+00, 8.85377295e-01, 1.07481514e+00,
     -7.25322296e-01, -2.19309506e+00, -7.59468916e-01, -1.37191387e+00, 2.60097913e-01, 9.34596450e-01],
    [3.50400906e-01, 8.17891485e-01, -8.63487084e-01, -7.31760701e-01, 9.70320805e-03, -3.60023996e-01,
     -2.91753495e-01, -8.03073817e-03, 6.65930095e-03, 1.60093340e-01, -1.29158086e-01, -5.18806100e-03],
    [2.25922929e-01, 2.78461593e-01, 5.39661393e-03, -2.37662670e-02, -2.70343295e-02, -1.23485570e-01,
     2.31027499e-03, 5.87465112e-05, 1.86127188e-02, 2.83074747e-02, -1.87198676e-04, 1.24761782e-02],
    [4.53615634e-01, 3.18976020e+00, -8.35029351e-01, 7.84124578e+00, -4.43906795e-01, -1.78945492e+00,
     -1.14521031e+00, 1.00044304e+00, -4.04084981e-01, -4.86030348e-01, 1.05412721e-01, 5.63666445e-02],
    [3.93714086e-01, -3.07226477e-01, -4.87366619e-01, -4.57481697e-01, -2.91133171e-04, -2.39881719e-01,
     -2.15591352e-01, -1.21332941e-01, 1.42245002e-01, 5.02984582e-02, -8.05878851e-03, 1.95534173e-01],
    [1.86913010e-01, -1.61000977e-01, 5.95612425e-01, 1.87804293e-01, 2.22064227e-01, -1.09008289e-01,
     7.83845058e-02, 5.15228647e-02, -8.18113578e-03, -2.37860551e-02, 3.41013800e-03, 3.64680417e-02],
    [3.32919314e+00, -2.14341251e+00, 7.20913997e+00, 1.76143734e+00, 1.64091808e+00, -2.66887649e+00,
     -9.26748006e-01, -2.78599285e-01, -7.39434005e-01, -3.87363085e-01, 8.00557250e-01, 1.15628886e+00],
    [4.76496444e-01, -1.19334793e-01, 3.09037235e-01, -3.45545294e-01, 1.30114716e-01, 5.06895559e-01,
     2.12176840e-01, -4.14296750e-03, 4.52439064e-03, -1.62163990e-02, 6.93683152e-03, -5.77607592e-03],
    [3.00019324e-01, 5.43432074e-03, -7.72732930e-01, 1.47263806e+00, -2.79012581e-03, -2.47864869e-01,
     -2.10011388e-01, 2.78202425e-01, 6.16957205e-02, -1.66924986e-01, -1.80102286e-01, -3.78872162e-03]]

TIMBRE_MEANS = [np.mean(t) for t in TIMBRE_CLUSTERS]
TIMBRE_STDEVS = [np.std(t) for t in TIMBRE_CLUSTERS]

'''helper methods to process raw msd data'''

# assumes numpy (np) and the MSD's hdf5_getters module are imported at the top of this listing
def normalize_pitches(h5):
    key = int(hdf5_getters.get_key(h5))
    segments_pitches = hdf5_getters.get_segments_pitches(h5)
    segments_pitches_new = [transpose_by_key(pitch_seg, key)
                            for pitch_seg in segments_pitches]
    return segments_pitches_new


def transpose_by_key(pitch_seg, key):
    # rotate the 12 pitch classes by the song's key so all songs share one tonal context
    pitch_seg_new = []
    for i in range(0, 12):
        idx = (i + key) % 12
        pitch_seg_new.append(pitch_seg[idx])
    return pitch_seg_new

''' given a time segment with distributions of the 12 pitches, find the most likely chord played '''

def find_most_likely_chord(pitch_vector):
    rho_max = 0.0
    # index each chord as (family, root): 1 = major, 2 = minor, 3 = dominant 7, 4 = minor 7
    most_likely_chord = (1, 1)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_MAJOR,
            CHORD_TEMPLATE_MAJOR_means, CHORD_TEMPLATE_MAJOR_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean) * (pitch_vector[i] - np.mean(pitch_vector)) / \
                ((stdev + 0.01) * (np.std(pitch_vector) + 0.01))
        if abs(rho) > abs(rho_max):
            rho_max = rho
            most_likely_chord = (1, idx)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_MINOR,
            CHORD_TEMPLATE_MINOR_means, CHORD_TEMPLATE_MINOR_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean) * (pitch_vector[i] - np.mean(pitch_vector)) / \
                ((stdev + 0.01) * (np.std(pitch_vector) + 0.01))
        if abs(rho) > abs(rho_max):
            rho_max = rho
            most_likely_chord = (2, idx)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_DOM7,
            CHORD_TEMPLATE_DOM7_means, CHORD_TEMPLATE_DOM7_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean) * (pitch_vector[i] - np.mean(pitch_vector)) / \
                ((stdev + 0.01) * (np.std(pitch_vector) + 0.01))
        if abs(rho) > abs(rho_max):
            rho_max = rho
            most_likely_chord = (3, idx)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_MIN7,
            CHORD_TEMPLATE_MIN7_means, CHORD_TEMPLATE_MIN7_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean) * (pitch_vector[i] - np.mean(pitch_vector)) / \
                ((stdev + 0.01) * (np.std(pitch_vector) + 0.01))
        if abs(rho) > abs(rho_max):
            rho_max = rho
            most_likely_chord = (4, idx)
    return most_likely_chord

def find_most_likely_timbre_category(timbre_vector):
    most_likely_timbre_cat = 0
    rho_max = 0.0
    for idx, (seg, mean, stdev) in enumerate(zip(TIMBRE_CLUSTERS, TIMBRE_MEANS,
            TIMBRE_STDEVS)):
        rho = 0.0
        for i in range(0, 12):
            # center the observed frame by its own mean, mirroring the chord
            # correlation above (the original listing centered by np.mean(seg),
            # which appears to be a typo for a correlation)
            rho += (seg[i] - mean) * (timbre_vector[i] - np.mean(timbre_vector)) / \
                ((stdev + 0.01) * (np.std(timbre_vector) + 0.01))
        if abs(rho) > abs(rho_max):
            rho_max = rho
            most_likely_timbre_cat = idx
    return most_likely_timbre_cat


Bibliography

[1] Marck Bailey. Mark J. Butler publishes scholarly work on dance music. http://www.music.northwestern.edu/about/news/2012/mark-j-butler-publishes-scholarly-work-on-dance-music.html, Mar. 2012.

[2] Kenneth Taylor. Ishkur's guide to EDM. http://techno.org/electronic-music-guide/.

[3] Deal further strengthens Spotify's music discovery expertise. http://the.echonest.com/pressreleases/spotify-acquires-echo-nest/, Mar. 2014.

[4] Josh Constine. Inside the Spotify - Echo Nest skunkworks. http://techcrunch.com/2014/10/19/the-sonic-mad-scientists/, Oct. 2014.

[5] The future of music genres is here. http://blog.echonest.com/post/73516217273/the-future-of-music-genres-is-here, Jan. 2014.

[6] About the Music Genome Project. http://www.pandora.com/about/mgp.

[7] Joan Serrà, Álvaro Corral, Marián Boguñá, Martín Haro, and Josep Ll. Arcos. Measuring the evolution of contemporary western popular music. Sci. Rep., 2, Jul. 2012.

[8] Matthias Mauch, Robert M. MacCallum, Mark Levy, and Armand M. Leroi. The evolution of popular music: USA 1960-2010. Royal Society Open Science, 2(5), 2015.

[9] Thierry Bertin-Mahieux, Daniel P.W. Ellis, Brian Whitman, and Paul Lamere. The million song dataset. In Proceedings of the 12th International Conference on Music Information Retrieval (ISMIR 2011), 2011.

[10] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay. Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12:2825-2830, 2011.

[11] Lars Buitinck, Gilles Louppe, Mathieu Blondel, Fabian Pedregosa, Andreas Mueller, Olivier Grisel, Vlad Niculae, Peter Prettenhofer, Alexandre Gramfort, Jaques Grobler, Robert Layton, Jake VanderPlas, Arnaud Joly, Brian Holt, and Gaël Varoquaux. API design for machine learning software: experiences from the scikit-learn project. In ECML PKDD Workshop: Languages for Data Mining and Machine Learning, pages 108-122, 2013.

[12] Edwin Chen. Infinite mixture models with nonparametric bayes and the dirichlet process. http://blog.echen.me/2012/03/20/infinite-mixture-models-with-nonparametric-bayes-and-the-dirichlet-process/, Mar. 2012.

[13] Graham Massey. Roland TR-808: The drum machine that changed music. http://www.bbc.com/news/entertainment-arts-26682781, Mar. 2014.

[14] Jean-Michel Jarre and Agencja Artystyczna MTJ. Jean Michel Jarre. Disques Dreyfus, 1999.

[15] Francois Pachet, Jean-Julien Aucouturier, and Mark Sandler. The way it sounds: Timbre models for analysis and retrieval of music signals. IEEE Transactions on Multimedia, 7(6):1028-35, Dec. 2005.


To my parents


Contents

Abstract iii

Acknowledgements iv

List of Tables viii

List of Figures ix

1 Introduction 1

1.1 Background Information 1

1.2 Literature Review 3

1.3 The Dataset 10

2 Mathematical Modeling 12

2.1 Determining Novelty of Songs 12

2.2 Feature Selection 14

2.3 Collecting Data and Preprocessing Selected Features 20

2.3.1 Collecting the Data 20

2.3.2 Pitch Preprocessing 21

2.3.3 Timbre Preprocessing 25

3 Results 27

3.1 Methodology 27

3.2 Findings 29

3.2.1 α = 0.05 29

3.2.2 α = 0.1 33

3.2.3 α = 0.2 38

3.3 Analysis 46

4 Conclusion 53

4.1 Design Flaws in Experiment 53

4.2 Future Work 55

4.3 Closing Remarks 56

A Code 57

A.1 Pulling Data from the Million Song Dataset 57

A.2 Calculating Most Likely Chords and Timbre Categories 58

A.3 Code to Compute Timbre Categories 60

A.4 Helper Methods for Calculations 61

Bibliography 68


List of Tables

3.1 Song cluster descriptions for α = 0.05 33

3.2 Song cluster descriptions for α = 0.1 38

3.3 Song cluster descriptions for α = 0.2 45

List of Figures

1.1 A user's taste profile generated by Spotify 4

1.2 Data processing pipeline for Mauch's study, illustrated with a segment of Queen's Bohemian Rhapsody, 1975 8

2.1 scikit-learn example of GMM vs DPGMM and tuning of α 15

2.2 Number of Electronic Music Songs in Million Song Dataset from Each Year 26

3.1 Song year distributions for α = 0.05 31

3.2 Timbre and pitch distributions for α = 0.05 32

3.3 Song year distributions for α = 0.1 35

3.4 Timbre and pitch distributions for α = 0.1 37

3.5 Song year distributions for α = 0.2 41

3.6 Timbre and pitch distributions for α = 0.2 44

Chapter 1

Introduction

1.1 Background Information

Electronic Music (EM) is an increasingly popular genre of music with an immense presence and influence on modern culture. Because the genre is new as a whole and is arguably more loosely structured than other genres - technology has enabled the creation of a wide range of sounds and easy blending of existing and new sounds alike - formal analysis, especially mathematical analysis, of the genre is fairly limited and has only begun growing in the past few years. As a fan of EM, I am interested in exploring how the genre has evolved over time. More specifically, my goal with this project was to design some structure or model that could help me identify which EM artists have contributed the most stylistically to the genre. Oftentimes famous EM artists do not create novel-sounding music but rather popularize an existing style, and the motivation of this study is to understand who has stylistically contributed the most to the EM scene versus those who have merely popularized aspects of it.

As the study progressed, the manner in which I constructed my model gave rise to a second goal of the thesis: imagining new ways in which EM genres can be conceived.

While there exists an extensive amount of research analyzing music trends from a non-mathematical (cultural, societal, artistic) perspective, the analysis of EM from a mathematical perspective, and especially with respect to any computationally measurable trends in the genre, is close to nonexistent. EM has been analyzed to a lesser extent than other common genres of music in the academic world, most likely due to existing for a shorter amount of time and being less rooted in prominent social and cultural events. In fact, the first published reference work on EM did not exist until 2012, when Professor Mark J. Butler from Northwestern University edited and published Electronica, Dance and Club Music, a collection of essays exploring EM genres and culture [1]. Furthermore, there are very few comprehensive visual guides that allow a user to relate every genre to each other and easily observe how different genres converge and diverge. While conducting research, the best guide I found was not a scholarly source but an online guide created by an EM enthusiast: Ishkur's Guide to Electronic Music [2]. This guide, which includes over 100 specific genres grouped by more general genres and represents chronological evolutions by connecting each genre in a flowchart, is the most exhaustive analysis of the EM scene I could find. However, the guide's analysis is very qualitative. While each subgenre contains an explanation of typical rhythm and sounds and includes well-known songs indicative of the style, the guide was created by someone who used historical and personal knowledge of EM. My model, which creates music genres by chronologically ordering songs and then assigning them to clusters, is a different approach towards imagining the entire landscape of EM. The results may confirm Ishkur's Guide's findings, in which case his guide is given additional merit with mathematical evidence, or they may be different, suggesting that there may be better ways to group EM genres. One advantage that guides such as Ishkur's and historically-based scholarly works have over my approach is that those models are history-sensitive and therefore may group songs in a way that historically makes sense. On the other hand, my model is history-agnostic and may not realize the historical context of songs when clustering. However, I believe that there is still significant merit to my research. Instead of classifying genres of music by the early genres that led to them, my approach gives the most credit to the artists and songs that were the most innovative for their time, and perhaps reveals different musical styles that are more similar to each other than history would otherwise imply. This way of thinking of music genres, while unconventional, is another way of imagining EM.

The practice of quantitatively analyzing music has exploded in the last decade, thanks to technological and algorithmic advances that allow data scientists to constructively sift through troves of music and listener information. In the literature review, I will focus on two particular organizations that have contributed greatly to the large-scale mathematical analysis of music: Pandora, a website that plays songs similar to a song/artist/album inputted by a user, and Echo Nest, a music analytics firm that was acquired by Spotify in 2014 and drives Spotify's Discover Weekly feature [3]. After evaluating the relevance of these sources to my thesis work, I will then look over the relevant academic research and evaluate what this research can contribute.

1.2 Literature Review

The quantitative analysis of music generally falls into two categories: research conducted by academics and academic organizations for scholarly purposes, and research conducted by companies and primarily targeted for consumers. First looking at the consumer-based research, Spotify and Pandora are two of the most prominent groups and the two I decided to focus on. Spotify is a music streaming service where users can listen to albums and songs from a wide variety of artists, or listen to weekly playlists generated based on the music the user and the user's friends have listened to. The weekly playlist, called the Discover Weekly Playlist, is a relatively new feature in Spotify and is driven by music analysis algorithms created from Echo Nest. Using the Echo Nest code interface, Spotify creates a "taste profile" for each user, which assesses attributes such as how often a user branches out to new styles of music, how closely the user's music streamed follows popular Billboard music charts, and so on. Spotify also looks at the artists and songs the user streamed and creates clusters of different genres that the user likes (see figure 1.1). The taste profile and music clusters can then be used to generate playlists geared to a specific user. The genres in the cluster come from a list of nearly 800 names, which are derived by scraping the Internet for trending terms in music as well as training various algorithms on a regular basis by "listening" to new songs [4][5].

Figure 1.1: A user's taste profile generated by Spotify


Although Spotify and Echo Nest's algorithms are very useful for mapping the landscape of established and emerging genres of music, the methodology is limited to pre-defined genres of music. This may serve as a good starting point to compare my final results to, but my study aims to be as context-free as possible by attaching no preconceived notions of music styles or genres, instead looking at features that could be measured in every song.

While Spotify's approach to mapping music is very high-tech and based on existing genres, Pandora takes a very low-tech and context-free approach to music clustering. Pandora created the Music Genome Project, a multi-year undertaking where skilled music theorists listened to a large number of songs and analyzed up to 450 characteristics in each song [6]. Pandora's approach is appealing to the aim of my study, since it does not take any preconceived notions of what a genre of music is, instead comparing songs on common characteristics such as pitch, rhythm, and instrument patterns. Unfortunately, I do not have a cadre of skilled music theorists at my disposal, nor do I have 10 years to perform such calculations like the dedicated workers at Pandora (tips the indestructible fedora). Additionally, Pandora's Music Genome Project is intellectual property, so at best I can only rely on the abstract concepts of the Music Genome Project to drive my study.

In the academic realm, there are no existing studies analyzing quantifiable changes in EM specifically, but there exist a few studies that perform such analysis on popular Western music in general. One such study is Measuring the Evolution of Contemporary Western Popular Music, which analyzes music from 1955-2010 spanning all common genres. Using the Million Song Dataset, a free public database of songs each containing metadata (see section 1.3), the study focuses on the attributes pitch, timbre, and loudness. Pitch is defined as the standard musical notes, or frequency of the sound waves. Timbre is formally defined as the Mel frequency cepstral coefficients (MFCC) of a transformed sound signal. More informally, it refers to the sound color, texture, or tone quality, and is associated with instrument types, recording resources, and production techniques. In other words, two sounds that have the same pitch but different tones (for example, a bell and a voice) are differentiated by their timbres. There are 12 MFCCs that define the timbre of a given sound. Finally, loudness refers to intrinsically how loud the music sounds, not loudness that a listener can manipulate while listening to the music. Loudness is the first MFCC of the timbre of a sound [7]. The study concluded that over time, music has been becoming louder and less diverse:

    The restriction of pitch sequences (with metrics showing less variety in pitch progressions), the homogenization of the timbral palette (with frequent timbres becoming more frequent), and growing average loudness levels (threatening a dynamic richness that has been conserved until today). This suggests that our perception of the new would be essentially rooted on identifying simpler pitch sequences, fashionable timbral mixtures, and louder volumes. Hence an old tune with slightly simpler chord progressions, new instrument sonorities that were in agreement with current tendencies, and recorded with modern techniques that allowed for increased loudness levels could be easily perceived as novel, fashionable, and groundbreaking.

This study serves as a good starting point for mathematically analyzing music in a few ways. First, it utilizes the Million Song Dataset, which addresses the issue of legally obtaining music metadata. As mentioned in section 1.3, the only legal way to obtain playable music for this study would have been to purchase all songs I would include, which is infeasible. While the Million Song Dataset does not contain the audio files in playable format, it does contain audio features and metadata that allow for in-depth analysis. In addition, working with the dataset takes out the work of extracting features from raw audio files, saving an extensive amount of time and energy. Second, the study establishes specifics for what constitutes a trend in music. Pitch, timbre, and loudness are core features of music, and examining the distributions of each among songs over time reveals a lot of information about how the music industry and consumers' tastes have evolved. While these are not all of the features contained in a song, they serve as a good starting point. Third, the study defines mathematical ways to capture music attributes and measure their change over time. For example, pitches are transposed into the same tonal context, with binary discretized pitch descriptions based on a threshold, so that each song can be represented with vectors of pitches that are normalized and compared to other songs.

While this study lays some solid groundwork for capturing and analyzing numeric qualities of music, it falls short of addressing my goals in a couple of ways. First, it does not perform any analysis with respect to music genre. While the analysis performed in this paper could easily be applied to a list of songs in a specific genre, certain genres might have unique sounds and rhythms relative to other genres that would be worth studying in greater detail. Second, the study only measures general trends in music over time. The models used to describe changes are simple regressions that don't look at more nuanced changes. For example: what styles of music developed over certain periods of time? How rapid were those changes? Which styles of music developed from which other styles?

A more promising study, led by music researcher Matthias Mauch [8], analyzes contemporary popular Western music from the 1960s to 2010s by comparing numerical data on the pitches and timbre of a corpus of 17,000 songs that appeared on the Billboard Hot 100. Like the previously mentioned paper Measuring the Evolution of Contemporary Western Popular Music, Mauch's study also creates abstractions of pitch and timbre in order to provide a consistent and meaningful semantic interpretation of musical data (see figure 1.2). However, Mauch's study takes this idea a step further by using genre tags from Last.fm, a music website, and constructing a hierarchy of music genres using hierarchical clustering. Additionally, the study takes a crack at determining whether a particular band, the Beatles, was musically groundbreaking for its time or merely playing off sounds that other bands had already used.

Figure 1.2: Data processing pipeline for Mauch's study, illustrated with a segment of Queen's Bohemian Rhapsody, 1975

While both Measuring the Evolution of Contemporary Western Popular Music and Mauch's study created abstractions of pitch and timbre, Mauch's study is more appealing with respect to my goal because its end results align more closely with mine. Additionally, the data processing pipeline offers several layers of abstraction, and depending on my progress, I would be able to achieve at least one of the levels of abstraction. As shown in figure 1.2, each segment of a raw audio file is first broken down into its 12 timbre MFCCs and pitch components. Next, the study constructs "lexicons", or a dictionary of pitch and timbre terms that all songs can be compared to. For pitch, the original data is in an N-by-12 matrix, where N is the number of time segments in the song and 12 the number of each of the notes found in an octave of pitches. Each time segment contains the relative strengths of each of the 12 pitches. However, music sounds are not merely a collection of pitches but, more precisely, chords. Furthermore, the similarity of two songs is not determined by the absolute pitches of their chords but rather by the progression of chords in the song, all relative to each other. For example, if all the notes in a song are transposed by one step, the song will sound different in terms of absolute pitch, but the song will still be recognized as the original because all of the relative movements from each chord to the next are the same. This phenomenon is captured in the pitch data by finding the most likely chord played at each time segment, then counting the change to the next chord at each time step and generating a table of chord change frequencies for each song. Constructing the timbre lexicon is more complicated, since there is no easy analogue like chords for pitches to compare songs. Mauch's study utilizes a Gaussian Mixture Model (GMM), iterating over k=1 to k=N clusters, where N is a large number, running the GMM on each prior assumption of k clusters and computing the Bayes Information Criterion (BIC) for each model. The lowest of the N BIC values is found, and that value of k is selected. That model contains k different timbre clusters, and each cluster contains the mean timbre value for each of the 12 timbre components.

For my research, I decided that the pitch and timbre lexicons would be the most realistic level of abstraction I could obtain. Mauch's study adds an additional layer to pitch and timbre by identifying the most common patterns of chord changes and most common timbre rhythms and creating more general tags from these combined terms, such as "stepwise changes indicating modal harmony" for a pitch topic and "oh, rounded, mellow" for a timbral topic. There were two problems with using this final layer of abstraction for my study. First, attaching semantic interpretations to the pitch and timbral lexicons is a difficult task. For timbre, I would need to listen to sound samples containing all of the different timbral categories I identified and attach user interpretations to them. For the chords, not only would I have to perform the same analysis as on timbre, but take careful attention to identify which chords correspond to common sound progressions in popular music, a task that I am not qualified for and did not have the resources for this thesis to seek out. Second, this final layer of abstraction was not necessary for the end goal of my paper. In fact, consolidating my pitch and timbre lexicons into simpler phrases would run the risk of pigeonholing my analysis and preventing me from discovering more nuanced patterns in my final results. Therefore, I decided to focus on pitch and timbral lexicon construction as the furthest levels of abstraction when processing songs for my thesis. Mathematical details on how I constructed the pitch and timbral lexicons can be found in the Mathematical Modeling section of this paper.

1.3 The Dataset

In order to successfully execute my thesis, I need access to an extensive database of music. Until recently, acquiring a substantial corpus of music data was a difficult and costly task. It is illegal to download music audio files from video and music-sharing sites such as YouTube, Spotify, and Pandora. Some platforms such as iTunes offer 90-second previews of songs, but using only segments of songs, and usually segments that showcase the chorus of the song, is not a reliable way to capture the entire essence of a song. Even if I were to legally download entire audio files for free, I would run into additional issues. Obtaining a high-quality corpus of song data would be challenging: writing scripts that crawl music sharing platforms may not capture all of the music I am looking for. And once I have the audio files, I would have to perform audio processing techniques to extract the relevant information from the songs.

Fortunately, there is an easy solution to the music data acquisition problem. The Million Song Dataset (MSD) is a collection of metadata for one million music tracks dating up to 2011. Various organizations such as The Echo Nest, Musicbrainz, 7digital, and Last.fm have contributed different pieces of metadata. Each song is represented as a Hierarchical Data Format file (HDF5), which can be loaded as a JSON object. The fields encompass topical features such as the song title, artist, and release date, as well as lower-level features such as the loudness, starting beat time, pitches, and timbre of several segments of the song [9]. While the MSD is the largest free and open source music metadata dataset I could find, there is no guarantee that it adequately covers the entire spectrum of EM artists and songs. This quality limitation is important to consider throughout the study. A quick look through the songs, including the subset of data I worked with for this report, showed that there were several well-known artists and songs in the EM scene. Therefore, while the MSD may not contain all desired songs for this project, it contains an adequate number of relevant songs to produce some meaningful results. Additionally, laying the groundwork for modeling the similarities between songs and identifying groundbreaking ones is the same regardless of the songs included, and the following methodologies can be implemented on any similarly-formatted dataset, including one with songs that might currently be missing in the MSD.


Chapter 2

Mathematical Modeling

2.1 Determining Novelty of Songs

Finding a logical and implementable mathematical model was, and continues to be, an important aspect of my research. My problem, how to mathematically determine which songs were unique for their time, requires an algorithm in which each song is introduced in chronological order, either joining an existing category or starting a new category based on its musical similarity to songs already introduced. Clustering algorithms like k-means or Gaussian Mixture Models (GMM), which have a predetermined number of clusters and optimize the partitioning of a dataset into those clusters, assume a fixed number of clusters. While this process would work if we knew exactly how many genres of EM existed, if we guess wrong, our end results may end up with clusters that are wrongly grouped together or separated. It is much better to apply a clustering algorithm that does not make any assumptions about this number.

One particularly promising process that addresses the issue of the number of clusters is a family of algorithms known as Dirichlet Processes (DPs). DPs are useful for this particular application because (1) they assign clusters to a dataset with only an upper bound on the number of clusters, and (2) by sorting the songs in chronological order before running the algorithm and keeping track of which songs are categorized under each cluster, we can observe the earliest songs in each cluster and consequently infer which songs were responsible for creating new clusters. The DP is controlled by a parameter α, the concentration parameter. The expected number of clusters formed is directly proportional to the value of α, so the higher the value of α, the more likely new clusters will be formed [10]. Regardless of the value of α, as the number of data points introduced increases, the probability of a new group being formed decreases. That is, a "rich get richer" policy is in place, and existing clusters tend to grow in size. Tweaking the value of the tunable parameter α is an important part of the study, since it determines the flexibility given to forming a new cluster. If the value of α is too small, then the criteria for forming clusters will be too strict, and data that should be in different clusters will be assigned to the same cluster. On the other hand, if α is too large, the algorithm will be too sensitive and assign similar songs to different clusters.
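This behavior can be stated precisely through the Chinese Restaurant Process view of the DP (a standard construction, included here for intuition rather than taken from the thesis). If n - 1 songs have already been assigned and cluster k currently holds n_k of them, then the nth song is assigned according to

\[ P(\text{join existing cluster } k) = \frac{n_k}{n - 1 + \alpha}, \qquad P(\text{start a new cluster}) = \frac{\alpha}{n - 1 + \alpha}. \]

Larger clusters attract proportionally more songs, which is exactly the "rich get richer" effect, while α alone sets the chance of opening a new cluster.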

The implementation of the DP was achieved using scikit-learn's library and API for the Dirichlet Process Gaussian Mixture Model (DPGMM). The DPGMM is the formal name of the Dirichlet Process model used to cluster the data. More specifically, scikit-learn's implementation of the DPGMM uses the Stick Breaking method, one of several equally valid methods to assign songs to clusters [11]. While the mathematical details for this algorithm can be found at the following citation [12], the most important aspects of the DPGMM are the arguments that the user can specify and tune. The first of these tunable parameters is the value α, which is the same parameter as the α discussed in the previous paragraph. As seen in Figure 2.1 on the right side, properly tuning α is key to obtaining meaningful clusters. The center image has α set to 0.01, which is too small and results in all of the data being formed under one cluster. On the other hand, the bottom-right image has the same data set and α set to 100, which does a better job of clustering. On a related note, the figure also demonstrates the effectiveness of the DPGMM over the GMM. On the left side, clearly the dataset contains 2 clusters, but the GMM in the top-left image assumes 5 clusters as a prior and consequently clusters the data incorrectly, while the DPGMM manages to limit the data to 2 clusters.

The second argument that the user inputs for the DPGMM is the data that will be clustered. The scikit-learn implementation takes the data in the format of a nested list (N lists, each of length m), where N is the number of data points and m the number of features. While the format of the data structure is relatively straightforward, choosing which numbers should be in the data was a challenge I faced. Selecting the relevant features of each song to be used in the algorithm will be expounded upon in the next section, "Feature Selection".

The last argument that a user inputs for the scikit-learn DPGMM implementation is an argument indicating the upper bound for the number of clusters. The Dirichlet Process then determines the best number of clusters for the data between 1 and the upper bound. Since the DPGMM is flexible enough to find the best value, I set an arbitrary upper bound of 50 clusters and focused more on the tuning of α to modify the number of clusters formed.
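To make the setup concrete, below is a minimal sketch of such a run. It is illustrative rather than the thesis code: current scikit-learn releases have replaced the DPGMM class with BayesianGaussianMixture, in which weight_concentration_prior plays the role of α and n_components is the upper bound on the number of clusters.

import numpy as np
from sklearn.mixture import BayesianGaussianMixture

# hypothetical input: one row of 238 features per song, sorted chronologically
X = np.loadtxt('song_features.txt')

dpgmm = BayesianGaussianMixture(
    n_components=50,                  # upper bound on the number of clusters
    weight_concentration_prior_type='dirichlet_process',
    weight_concentration_prior=0.05,  # the concentration parameter alpha
    max_iter=500,
)
labels = dpgmm.fit(X).predict(X)      # cluster assignment for each song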

Figure 2.1: scikit-learn example of GMM vs DPGMM and tuning of α

2.2 Feature Selection

One of the most difficult aspects of the Dirichlet method is choosing the features to be used for clustering. In other words, when we organize the songs into clusters, we need to ensure that each cluster is distinct in a way that is statistically and intuitively logical. In the Million Song Dataset [9], each song is represented as a JSON object containing several fields. These fields are candidate features to be used in the Dirichlet algorithm. Below is an example song, "Never Gonna Give You Up" by Rick Astley, and the corresponding features:

artist_mbid: db92a151-1ac2-438b-bc43-b82e149ddd50 (the musicbrainz.org ID for this artist)
artist_mbtags: shape = (4,) (this artist received 4 tags on musicbrainz.org)
artist_mbtags_count: shape = (4,) (raw tag count of the 4 tags this artist received on musicbrainz.org)
artist_name: Rick Astley (artist name)
artist_playmeid: 1338 (the ID of that artist on the service playme.com)
artist_terms: shape = (12,) (this artist has 12 terms (tags) from The Echo Nest)
artist_terms_freq: shape = (12,) (frequency of the 12 terms from The Echo Nest (number between 0 and 1))
artist_terms_weight: shape = (12,) (weight of the 12 terms from The Echo Nest (number between 0 and 1))
audio_md5: bf53f8113508a466cd2d3fda18b06368 (hash code of the audio used for the analysis by The Echo Nest)
bars_confidence: shape = (99,) (confidence value (between 0 and 1) associated with each bar by The Echo Nest)
bars_start: shape = (99,) (start time of each bar according to The Echo Nest; this song has 99 bars)
beats_confidence: shape = (397,) (confidence value (between 0 and 1) associated with each beat by The Echo Nest)
beats_start: shape = (397,) (start time of each beat according to The Echo Nest; this song has 397 beats)
danceability: 0.0 (danceability measure of this song according to The Echo Nest (between 0 and 1; 0 => not analyzed))
duration: 211.69587 (duration of the track in seconds)
end_of_fade_in: 0.139 (time of the end of the fade-in at the beginning of the song, according to The Echo Nest)
energy: 0.0 (energy measure (not in the signal processing sense) according to The Echo Nest (between 0 and 1; 0 = not analyzed))
key: 1 (estimation of the key the song is in by The Echo Nest)
key_confidence: 0.324 (confidence of the key estimation)
loudness: -7.75 (general loudness of the track)
mode: 1 (estimation of the mode the song is in by The Echo Nest)
mode_confidence: 0.434 (confidence of the mode estimation)
release: Big Tunes - Back 2 The 80s (album name from which the track was taken; some tracks can come from many albums; we give only one)
release_7digitalid: 786795 (the ID of the release (album) on the service 7digital.com)
sections_confidence: shape = (10,) (confidence value (between 0 and 1) associated with each section by The Echo Nest)
sections_start: shape = (10,) (start time of each section according to The Echo Nest; this song has 10 sections)
segments_confidence: shape = (935,) (confidence value (between 0 and 1) associated with each segment by The Echo Nest)
segments_loudness_max: shape = (935,) (max loudness during each segment)
segments_loudness_max_time: shape = (935,) (time of the max loudness during each segment)
segments_loudness_start: shape = (935,) (loudness at the beginning of each segment)
segments_pitches: shape = (935, 12) (chroma features for each segment (normalized so max is 1))
segments_start: shape = (935,) (start time of each segment (~ musical event, or onset) according to The Echo Nest; this song has 935 segments)
segments_timbre: shape = (935, 12) (MFCC-like features for each segment)
similar_artists: shape = (100,) (a list of 100 artists (their Echo Nest IDs) similar to Rick Astley, according to The Echo Nest)
song_hotttnesss: 0.864248830588 (according to The Echo Nest, when downloaded (in December 2010), this song had a 'hotttnesss' of 0.8 (on a scale of 0 to 1))
song_id: SOCWJDB12A58A776AF (The Echo Nest song ID; note that a song can be associated with many tracks (with very slight audio differences))
start_of_fade_out: 198.536 (start time of the fade-out, in seconds, at the end of the song, according to The Echo Nest)
tatums_confidence: shape = (794,) (confidence value (between 0 and 1) associated with each tatum by The Echo Nest)
tatums_start: shape = (794,) (start time of each tatum according to The Echo Nest; this song has 794 tatums)
tempo: 113.359 (tempo in BPM according to The Echo Nest)
time_signature: 4 (time signature of the song according to The Echo Nest, i.e. usual number of beats per bar)
time_signature_confidence: 0.634 (confidence of the time signature estimation)
title: Never Gonna Give You Up (song title)
track_7digitalid: 8707738 (the ID of this song on the service 7digital.com)
track_id: TRAXLZU12903D05F94 (The Echo Nest ID of this particular track, on which the analysis was done)
year: 1987 (year when this song was released, according to musicbrainz.org)

When choosing features, my main goal was to use features that would most likely yield meaningful results, yet also be simple and make sense to the average person. The definition of "meaningful" results is arbitrary, as every music listener will have his or her own opinions as to what constitutes different types of music, but some common features most people tend to differentiate songs by are pitch, rhythm, and the types of instruments used. The following specific fields provided in each song object fall under these three terms:

Pitch

• segments_pitches: a matrix of values indicating the strength of each pitch (or note) at each discernible time interval

Rhythm

• beats_start: a vector of values indicating the start time of each beat
• time_signature: the time signature of the song
• tempo: the speed of the song in Beats Per Minute (BPM)

Instruments

• segments_timbre: a matrix of values indicating the distribution of MFCC-like features (different types of tones) for each segment

The segments_pitches feature is a clear candidate for a differentiating factor for songs, since it reveals patterns of notes that occur. Additionally, other research papers that quantitatively examine songs, like Mauch's, look at pitch and employ a procedure that allows all songs to be compared with the same metric. Likewise, timbre is intuitively a reliable differentiating feature, since it reveals the amounts of different tones, or sounds that sound different despite having the same pitch. Therefore, segments_timbre is another feature that is considered in each song. Finally, we look at the candidate features for rhythm. At first glance, all of these features appear to be useful, as they indicate the rhythm of a song in one way or another. However, none of these features are as useful as the pitch and timbre features. While tempo is one factor in differentiating genres of EDM, and music in general, tempo alone is not a driving force of musical innovation. Certain genres of EDM, like drum 'n' bass and happycore, stand out for having very fast tempos, but the tempo is supplemented with a sound unique to the genre. Conceiving new arrangements of pitches, combining instruments in new ways, and inventing new types of sounds are novel, but speeding up or slowing down existing sounds is not. Including tempo as a feature could actually add noise to the model, since many genres overlap in their tempos. And finally, tempo is measured indirectly when the pitch and timbre features are normalized: for each song, everything is measured in units of "per second", so faster songs will have higher quantities of pitch and timbre features each second. Time signature can be dismissed from the candidate features for the same reason as tempo: many genres contain the same time signature, and including it in the feature set would only add more noise. beats_start looks like a more promising feature since, like segments_pitches and segments_timbre, it consists of a vector of values. However, difficulties arise when we begin to think how exactly we can utilize this information. Since each song varies in length, we need a way to compare songs of different durations on the same level. One approach could be to perform basic statistics on the distance between each beat, for example calculating the mean and standard deviation of this distance. However, the normalized pitch and timbre information already captures this data. Another possibility is detecting certain patterns of beats, which could differentiate the syncopated dubstep or glitch music beats from the steady pulse of electro-house. But once again, every beat is accompanied by a sound with a specific timbre and pitch, so this feature would not add any significantly new information.

2.3 Collecting Data and Preprocessing Selected Features

2.3.1 Collecting the Data

Upon deciding the features I wanted to use in my research, I first needed to collect all of the electronic songs in the Million Song Dataset. The easiest reliable way to achieve this was to iterate through each song in the database and save the information for the songs where any of the artist genre tags in artist_mbtags matched with an electronic music genre. While this measure was not fully accurate, because it looks at the genre of the artist, not the song, specific genre information for each song was not as easily accessible, so this indicator was nearly as good a substitute. To generate a list of the genres that electronic songs would fall under, I manually searched through a subset of the MSD to find all genres that seemed to be related to electronic music. In the case of genres that were sometimes, but not always, electronic in nature, such as disco or pop, I erred on the side of caution and did not include them in the list of electronic genres. In these cases, false positives, such as primarily rock songs that happen to have the disco label attached to the artist, could inadvertently be included in the dataset. The final list of genres is as follows:

target_genres = ['house', 'techno', 'drum and bass', 'drum n bass',
                 "drum'n'bass", 'drumnbass', "drum 'n' bass", 'jungle', 'breakbeat',
                 'trance', 'dubstep', 'trap', 'downtempo', 'industrial', 'synthpop',
                 'idm', 'idm - intelligent dance music', '8-bit', 'ambient',
                 'dance and electronica', 'electronic']
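A minimal sketch of this filtering pass is shown below, assuming the MSD's hdf5_getters module is available and with msd_root standing in for the root of the dataset's directory tree (illustrative, not the exact thesis code):

import glob
import os

import hdf5_getters

def collect_em_songs(msd_root):
    '''Return paths of MSD tracks whose artist has at least one tag in target_genres.'''
    matches = []
    for path in glob.iglob(os.path.join(msd_root, '**', '*.h5'), recursive=True):
        h5 = hdf5_getters.open_h5_file_read(path)
        try:
            tags = [t.decode('utf-8').lower() for t in hdf5_getters.get_artist_mbtags(h5)]
            if any(tag in target_genres for tag in tags):
                matches.append(path)
        finally:
            h5.close()
    return matches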

2.3.2 Pitch Preprocessing

A study conducted by music researcher Matthias Mauch [8] analyzes pitch in a musically informed manner. The study first takes the raw sound data and converts it into a distribution of each pitch, where 0 is no detection of the pitch and 1 the strongest amount. Then it computes the most likely chord by comparing the 4 most common types of chords in popular music (major, minor, dominant 7, and minor 7) to the observed chord. The most common chords are represented as "template chords" and contain 0's and 1's, where the 1's represent the notes played in the chord. For example, using the note C as the first index, the C major chord is represented as

\[ CT_{CM} = (1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0). \]

For a given chroma frame c observed in the song, the Spearman's rho coefficient is computed over every template chord:

\[ \rho_{CT,c} = \sum_{i=1}^{12} \frac{(CT_i - \overline{CT})(c_i - \bar{c})}{\sigma_{CT}\,\sigma_c} \]

where \(\overline{CT}\) is the mean of the values in the template chord, \(\sigma_{CT}\) is the standard deviation of the values in the chord, and the operations on c are analogous. Note that the summation is over each individual pitch in the 12 pitch classes. The chord template with the highest value of ρ is selected as the chord for the time frame.
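As a minimal sketch of this computation (assuming 12-element numpy arrays; the small 0.01 constants, which guard against zero-variance frames, mirror the appendix code):

import numpy as np

CT_C_MAJOR = np.array([1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0])  # template for C major (C, E, G)

def chord_rho(template, chroma):
    # correlation between a chord template and an observed chroma frame
    num = np.sum((template - template.mean()) * (chroma - chroma.mean()))
    return num / ((template.std() + 0.01) * (chroma.std() + 0.01))

The template (over all four chord families and twelve roots) with the largest value of |ρ| is then taken as the chord for the frame.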

After this is performed for each time frame, the values are smoothed, and then the change between adjacent chords is observed. The reasoning behind this step is that by measuring the relative distance between chords, rather than the chords themselves, all songs can be compared in the same manner even though they may have different key signatures. Finally, the study takes the types of chord changes and classifies them under 8 possible categories called "H-topics". These topics are more abstracted versions of the chord changes that make more sense to a human, such as "changes involving dominant 7th chords".

In my preliminary implementation of this method on an electronic dance music corpus, I made a few modifications to Mauch's study. First, I smoothed out time frames before computing the most probable chords, rather than smoothing the most probable chords. I did this to save time and to reduce volatility in the chord measurements. Using Rick Astley's "Never Gonna Give You Up" as a reference, which contains 935 time frames and lasts 212 seconds, 5 time frames is slightly under 1 second and, for preliminary testing, appeared to be a good interval for each time block. Second, as mentioned in the literature section, I did not abstract the chord changes into H-topics. This decision also stemmed from time constraints, since deriving semantic chord meaning from EDM songs would require careful research into the types of harmonies and sounds common in that genre of music. Below I have included a high-level visualization of the pitch metadata found in a sample song, "Firestarter" by The Prodigy, and how I converted the metadata into a chord change vector that I could then feed into the Dirichlet Process algorithm.


[Figure: pitch-processing pipeline illustrated on "Firestarter" by The Prodigy. Panels: (1) the raw pitch data, an N x 12 matrix of pitch-class strengths, where N is the number of time frames in the song; (2) pitch distributions averaged over every 5 time frames; (3) the most likely chord per block, calculated with Spearman's rho (here F# major = (0,1,0,0,0,0,1,0,0,0,1,0)); (4) the shift between adjacent chords (here F# major to G# major, a major-to-major change of step size 2, chord shift code 6, so chord_changes[6] += 1), incrementing the matching entry in a table of the 192 possible chord change frequencies; (5) the final 192-element vector, in which chord_changes[i] counts how many times the chord change with code i occurred in the song.]
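Representing each detected chord as a (family, root) pair, the 4 chord families and 12 relative root shifts give 4 x 4 x 12 = 192 possible chord changes. The sketch below shows one plausible way to index them; it is hypothetical, and the actual code ordering used in the figure above (where the F# major to G# major change receives code 6) evidently differs:

def chord_change_code(prev_chord, next_chord):
    # prev_chord and next_chord are (family, root) pairs, with family in {1..4}
    # (major, minor, dominant 7, minor 7) and root in {0..11}
    prev_family, prev_root = prev_chord
    next_family, next_root = next_chord
    shift = (next_root - prev_root) % 12  # key-independent step between roots
    return ((prev_family - 1) * 4 + (next_family - 1)) * 12 + shift

A song's 192-element vector is then filled by incrementing chord_changes[chord_change_code(prev, cur)] for every pair of adjacent chord blocks.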


A final step I took to normalize the chord change data was to divide the numbers by the length of the song, so that each song's number of chord changes was measured per second.

2.3.3 Timbre Preprocessing

For timbre, I also used Mauch's model to find a meaningful way to compare timbre uniformly across all songs [8]. After collecting all song metadata, I took a random sample of 20 songs from each year, starting at 1970. The reason I forced the sampling to 20 randomly sampled songs from each year, and did not take a random sample of songs from all years at once, was to prevent bias towards any type of sounds. As seen in figure 2.2, there are significantly more songs from 2000-2011 than before 2000. The mean year is x̄ = 2001.052, the median year is 2003, and the standard deviation of the years is σ = 7.060. A "random sample" over all songs would almost certainly include a disproportionate number of more recent songs. In order to not miss out on sounds that may be more prevalent in older songs, I required a set number of songs from each year. Next, from each randomly selected song, I selected 20 random timbre frames, in order to prevent any biases in data collection within each song. In total, there were 42 x 20 x 20 = 16,800 timbre frames collected. Next, I clustered the timbre frames using a Gaussian Mixture Model (GMM), varying the number of clusters from 10 to 100 and selecting the number of clusters with the lowest Bayes Information Criterion (BIC), a statistical measure commonly used to select the best fitting model. The BIC was minimized at 46 timbre clusters. I then re-ran the GMM with 46 clusters and saved the mean values of each of the 12 timbre segments for each cluster formed.
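A minimal sketch of this model-selection loop (illustrative, not the thesis code; frames stands for the 16,800 x 12 array of sampled timbre frames):

from sklearn.mixture import GaussianMixture

def fit_timbre_clusters(frames, k_min=10, k_max=100):
    # fit a GMM for each candidate cluster count and keep the lowest-BIC model
    models = [GaussianMixture(n_components=k).fit(frames) for k in range(k_min, k_max + 1)]
    best = min(models, key=lambda m: m.bic(frames))
    return best.means_  # one 12-element mean timbre vector per cluster (46 x 12 here)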

In the same way that every song had the same 192 chord changes whose frequencies could be compared between songs, each song now had the same 46 timbre clusters, but different frequencies in each song. When reading in the metadata from each song, I calculated the most likely timbre cluster each timbre frame belonged to and kept a frequency count of all of the possible timbre clusters observed in a song. Finally, as with the pitch data, I divided all observed counts by the duration of the song in order to normalize each song's timbre counts.

Figure 2.2: Number of Electronic Music Songs in Million Song Dataset from Each Year


Chapter 3

Results

3.1 Methodology

After the pitch and timbre data was processed, I ran the Dirichlet Process on the data. For each song, I concatenated the 192-element chord change frequency list and the 46-element timbre category frequency list, giving each song a total of 238 features. However, there is a problem with this setup: the pitch data will inherently dominate the clustering process, since it contains almost 3 times as many features as timbre. While there is no built-in function in scikit-learn's DPGMM process to give different weights to each feature, I considered another possibility to remedy this discrepancy: duplicating the timbre vector a certain number of times and concatenating that to the feature set of each song. While this strategy runs the risk of corrupting the feature set and turning it into something that does not accurately represent each song, it is important to keep in mind that even without duplicating the timbre vector, the feature set consists of two separate feature sets concatenated to each other. Therefore, timbre duplication appears to be a reasonable strategy to weigh pitch and timbre more evenly.
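A sketch of this feature construction is given below. The choice of four timbre copies is my own illustrative assumption (4 x 46 = 184, close to the 192 pitch features), and the scaling constant anticipates the multiplication by a constant described in the next paragraphs:

import numpy as np

def build_feature_vector(chord_changes, timbre_counts, duration, timbre_copies=4, scale=10.0):
    # normalize both lists to events per second, then duplicate the 46 timbre
    # features so they carry weight comparable to the 192 chord-change features
    pitch = np.asarray(chord_changes, dtype=float) / duration
    timbre = np.asarray(timbre_counts, dtype=float) / duration
    return scale * np.concatenate([pitch] + [timbre] * timbre_copies)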


After this modification, I tweaked a few more parameters before obtaining my final results. Dividing the pitch and timbre frequencies by the duration of the song normalized every song to frequency per second, but it also had the undesired effect of making the data too small. Timbre and pitch frequencies per second were almost always less than 1.0 and many times hovered as low as 0.002 for nonzero values. Because all of the values were very close to each other, using common values of α in the range of 0.1 to 1000-2000 was insufficient to push the songs into different clusters. As a result, every song fell into the same cluster. Increasing the value of α by several orders of magnitude, to well over 10 million, fixed the problem, but this solution presented two problems. First, tuning α to experiment with different ways to cluster the music would be problematic, since I would have to work with an enormous range of possible values for α. Second, pushing α to such high values is not appropriate for the Dirichlet Process. Extremely high values of α indicate a Dirichlet Process that will try to disperse the data into different clusters, but a value of α that high is in principle always assigning each new song to a new cluster. On the other hand, varying α between 0.1 and 1000, for example, presents a much wider range of flexibility when assigning clusters. While this may be possible by varying the values of α an extreme amount with the data as it currently is, we would be using the Dirichlet Process in a way it should mathematically not be used. Therefore, multiplying all of the data by a constant value, so that we can work in the appropriate range of α, is the ideal approach. After some experimentation, I found that k=10 was an appropriate scaling factor.

After initial runs of the Dirichlet Process, I found out that there was a slight issue with some of the earlier songs. Since I had only artist genre tags, not specific song tags, for each song, I chose songs based on whether any of the tags associated with the artist fell under any electronic music genre, including the generic term 'electronic'. There were some bands, mostly older ones from the 1960s and 1970s like Electric Light Orchestra, which had some electronic music but mostly featured rock, funk, disco, or another genre. Given that these artists featured mostly non-electronic songs, I decided to exclude them from my study and generated a blacklist indicating these music artists. While it was infeasible to look through every single song and determine whether it was electronic or not, I was able to look over the earliest songs in each cluster. These songs were the most important to verify as electronic, because early non-electronic songs could end up forming new clusters and inadvertently create clusters with non-electronic sounds that I was not looking for.

The goal of this thesis is to identify different groups in which EM songs are clustered and identify the most unique artists and genres. While the second task is very simple, because it requires looking at the earliest songs in each cluster, the first is difficult to gauge the effectiveness of. While I can look at the average chord change and timbre category frequencies in each category, as well as other metadata, putting more semantic interpretations to what the music actually sounds like and determining whether the music is clustered properly is a very subjective process. For this reason, I ran the Dirichlet Process on the feature set with values of α = (0.05, 0.1, 0.2) and compared the clustering for each value, examining similarities and differences in the clusters formed in each scenario in the Discussion section. For each value of α, I set the upper limit of components, or clusters, allowed to 50. The values of α I used resulted in 9, 14, and 19 clusters formed.

32 Findings

321 α=005

When I set α to 005 the Dirichlet Process split the songs into 9 clusters Below are

the distribution of years of the songs in each cluster (note that the Dirichlet Process

29

does not number the clusters exactly sequentially so cluster numbers 5 7 and 10 are

skipped)

30

Figure 31 Song year distributions for α = 005

For each value of α I also calculated the average frequency of each chord change

category and timbre category for each cluster and plotted the results The green

lines correspond to timbre and the blue lines to pitch

31

Figure 32 Timbre and pitch distributions for α = 005

A table of each cluster formed the number of songs in that cluster and descriptions

of pitch timbre and rhythmic qualities characteristic of songs in that cluster are

shown below

32

Cluster Song Count Characteristic Sounds

0 6481 Minimalist industrial space sounds dissonant chords

1 5482 Soft New Age ethereal

2 2405 Defined sounds electronic and non-electronic instru-

ments played in standard rock rhythms

3 360 Very dense and complex synths slightly darker tone

4 4550 Heavily distorted rock and synthesizer

6 2854 Faster paced 80s synth rock acid house

8 798 Aggressive beats dense house music

9 1464 Ambient house trancelike strong beats mysterious

tone

11 1597 Melancholy tones New wave rock in 80s then starting

in 90s downtempo trip-hop nu-metal

Table 31 Song cluster descriptions for α = 005

322 α=01

A total of 14 clusters were formed (16 were formed but 2 clusters contained only

one song each I listened to both of these songs and they did not sound unique so

I discarded them from the clusters) Again the song distributions timbre and pitch

distributions and cluster descriptions are shown below

33

34

Figure 33 Song year distributions for α = 01

35

36

Figure 34 Timbre and pitch distributions for α = 01

37

Cluster Song Count Characteristic Sounds

0 1339 Instrumental and disco with 80s synth

1 2109 Simultaneous quarter-note and sixteenth note rhythms

2 4048 Upbeat chill simultaneous quarter-note and eighth

note rhythms

3 1353 Strong repetitive beats ambient

4 2446 Strong simultaneous beat and synths synths defined but

echo

5 2672 Calm New Age

6 542 Hi-hat cymbals dissonant chord progressions

7 2725 Aggressive punk and alternative rock

9 1647 Latin rhythmic emphasis on first and third beats

11 835 Standard medium-fast rock instrumentschords

16 1152 Orchestral especially violins

18 40 ldquoMartian alienrdquo sounds no vocals

20 1590 Alternating strong kick and strong high-pitched clap

28 528 Roland TR-like beats kick and clap stand out but fuzzy

Table 32 Song cluster descriptions for α = 01

323 α=02

With α set to 02 there were a total of 22 clusters formed 3 of the clusters consisted

of 1 song each none of which were particularly unique-sounding so I discarded them

for a total of 19 significant clusters Again the song distributions timbre and pitch

distributions and cluster descriptions are shown below

38

39

40

Figure 35 Song year distributions for α = 02

41

42

43

Figure 36 Timbre and pitch distributions for α = 02

44

Cluster Song Count Characteristic Sounds

0 4075 Nostalgic and sad-sounding synths and string instru-

ments

1 2068 Intense sad cavernous (mix of industrial metal and am-

bient)

2 1546 Jazzfunk tones

3 1691 Orchestral with heavy 80s synths atmospheric

4 343 Arpeggios

5 304 Electro ambient

6 2405 Alien synths eery

7 1264 Punchy kicks and claps 80s90s tilt

8 1561 Medium tempo 44 time signature synths with intense

guitar

9 1796 Disco rhythms and instruments

10 2158 Standard rock with few (if any) synths added on

12 791 Cavernous minimalist ambient (non-electronic instru-

ments)

14 765 Downtempo classic guitar riffs fewer synths

16 865 Classic acid house sounds and beats

17 682 Heavy Roland TR sounds

22 14 Fast ambient classic orchestral

23 578 Acid house with funk tones

30 31 Very repetitive rhythms one or two tones

34 88 Very dense sound (strong vocals and synths)

Table 33 Song cluster descriptions for α = 02

45

33 Analysis

Each of the different values of α revealed different insights about the Dirichlet Process

applied to the Million Song Dataset mainly which artists and songs were unique

and how traditional groupings of EM genres could be thought of differently Not

surprisingly the distributions of the years of songs in most of the clusters were skewed

to the left because the distribution of all of the EM songs in the Million Song Dataset

is left-skewed (see Figure 22) However some of the distributions vary significantly

for individual clusters and these differences provide important insights into which

types of music styles were popular at certain points in time and how unique the

earliest artists and songs in the clusters were For example for α = 01 Cluster 28rsquos

musical style (with sounds characteristic of the Roland TR-808 and TR-909 a two

programmable drum machines and synthesizers that became extremely popular in

1980s and 90s dance tracks) [13] coincides with the when the instruments were first

manufactured in 1980 Not surprisingly this cluster contained mostly songs from

the 80s and 90s and declined slightly in the 2000s However there were a few songs

in that cluster that came out before 1980 While these songs did not clearly use the

Roland TR machines they may have contained similar sounds that predated the

machines and were truly novel

First looking at α = 005 we see that all of the clusters contain a significant

number of songs although cluster 3 and 8 are notably smaller Cluster 3 contains

a heavier left tail indicating a larger number of songs from the 70s 80s and 90s

Inside the cluster the genres of music varied significantly from a traditional music

lens That is the cluster contained some songs with nearly all traditional rock

instruments others with purely synths and others somewhere in between all which

would normally be classified as different EM genres However under the Dirichlet

Process these songs were lumped together with the common themes of dense melodic

46

melodies (as opposed to minimalistic repetitive or dissonant sounds) The most

prominent artists from the earlier songs are Ashra and John Hassell who composed

several melodic songs combining traditional instruments with synthesizers for a

modern feel The other small cluster number 8 contains a more normal year

distribution relative to the entire MSD distribution and also consists of denser beats

Another artist Cabaret Voltaire leads this cluster Cluster 9 sticks out significantly

because it contains virtually no songs before 1990 but increases rapidly in popularity

This cluster contains songs with hypnotically repetitive rhythm strong and ethereal

synths and an equally strong drum-like beat Given the emergence of trance in

the 1990s and the fact that house music in the 1980s contained more minimalistic

synths than house music in the 1990s this distribution of years makes sense Looking

at the earliest artists in this cluster one that accurately predates the later music

in the cluster is Jean-Michel Jarre A French composer pioneering in ambient and

electronic music [14] one of his songs Les Chants Magneacutetiques IV contains very

sharp and modulated synths along with a repetitive hi-hat cymbal rhythm and more

drawn-out and ethereal synths While the song sounds ambient at its normal speed

playing the song 15 times the normal speed resulted in a thumping fast-paced 16th

note rhythm that combined with the ethereal synths that contain certain chord

progressions sounded very similar to trance music In fact I found that stylistically

trance music was comparable to house and ambient music increased in speed Trance

music was a term not used extensively until the early 1990s but ambient and house

music were already mainstream by the 1980s so it would make sense that trance

evolved in this manner However this insight could serve as an argument that trance

is not an innovative genre in and of itself but is rather a clever combination of two

older genres Lastly we look at the timbre category and chord change distributions

for each cluster In theory these clusters should have significantly different peaks

of chord changes and timbre categories reflecting different pitch arrangements and

47

instruments in each cluster The type 0 chord change corresponds to major rarr

major with no note change type 60 minor rarr minor with no note change type

120 dominant 7th major rarr dominant 7th major with no note change and type 180

dominant 7th minor rarr dominant 7th minor with no note change It makes sense that

type 0 60 120 and 180 chord changes are frequently observed because it implies

that chords in the song occurring next to each other are remaining in the same key

for the majority of the song The timbre categories on the other hand are more

difficult to intuitively interpret Mauchrsquos study addresses this issue by sampling

songs and sounds that are the closest to each timbre category and then playing the

sounds and attaching user-based interpretations based on several listeners [8] While

this strategy worked in Mauchrsquos study given the time and resources at my disposal

this strategy was not practical in my study I ended up comparing my subjective

summaries of each cluster and comparing the charts to see whether certain peaks

in the timbre categories corresponded to specific tones Strangely for α = 005 the

timbre and chord change data is very similar for each cluster This problem does not

occur for when α = 01 or 02 where the graphs vary significantly and correspond

to some of the observed differences in the music In summary below are the most

influential artists I found in the clusters formed and the types of music they created

that were novel for their time

bull Jean-Michel Jarre ambient and house music complicated synthesizer arrange-

ments

bull Cabaret Voltaire orchestral electronic music

bull Paul Horn new age

bull Brian Eno ambient music

bull Manuel Goumlttsching (Ashra) synth-heavy ambient music

48

bull Killing Joke industrial metal

bull John Foxx minimalist and dark electronic music

bull Fad Gadget house and industrial music

While these conclusions were formed mainly from the data used in the MSD and the

resulting clusters I checked outside sources and biographies of these artists to see

whether they were groundbreaking contributors to electronic music Some research

revealed that these artists were indeed groundbreaking for their time so my findings

are consistent with existing literature The difference however between existing

accounts and mine is that from a quantitatively computed perspective I found some

new connections (like observing that one of Jarrersquos works when sped up sounded

very similar to trance music)

For larger values of α it is not only worth looking at interesting phenomena in

the clusters formed for that specific value but also comparing the clusters formed

to other values of α Since we are increasing the value of α more clusters will

be formed and the distinctions between each cluster will be more nuanced With

α = 01 the Dirichlet Process formed 16 clusters 2 of these clusters consisted of

only one song each and upon listening neither of these songs sounded particularly

unique so I threw those two clusters out and analyzed the remaining 14 Comparing

these clusters to the ones formed with α = 005 I found that some of the clusters

mapped over nicely while others were more difficult to interpret For example cluster

301 (cluster 3 when α = 01) contained a similar number of songs and a similar

distribution of the years the songs were released to cluster 9005 Both contain vitually

no songs before the 1990s and then steadily rise in popularity through the 2000s

Both clusters also contain similar types of music house beats and ethreal synths

reminiscent of ambient or trance music However when I looked at the earliest

49

artists in cluster 301 they were different from the earliest artists in cluster 9005

One particular artist Bill Nelson stood out for having a particularly novel song

ldquoBirds of Tinrdquo for the year it was released (1980) This song features a sharp and

twangy synth beat that when sped up sounded like minimalist acid house music

While the α = 005 group differentiated mostly on general moods and classes of

instruments (like rock vs non-electronic vs electronic) the α = 01 group picked

up more nuanced instrumentation and mood differences For example cluster 1601

contained songs that featured orchestral string instruments especially violin The

songs themselves varied significantly according to traditional genres from Brian Eno

arrangements with classical orchestra to a remix of a song by Linkin Park a nu-metal

band which contained violin interludes This clustering raises an interesting point

that music that sounds very different based on traditional genres could be grouped

together on certain instruments or sounds Another cluster 2801 features 90s sounds

characteristic of the Roland TR-909 drum machine (which would explain why the

clusterrsquos songs increase dratistically starting in the 1990s and steadily decline through

the 2000s) Yet another cluster 601 contains a particularly heavy left tail indicating

a style more popular in the 1980s and the characteristic sound high-hat cymbals

is also a specialized instrument This specialization does not match up particularly

strongly with the clusters when α = 005 That is a single cluster with α = 005

does not easily map to one or more clusters in the α = 01 run although many of

the clusters appear to share characteristics based on the qualitative descriptions in

the tables The timbrechord change charts for each cluster appeared to at least

somewhat corroborate the general characteristics I attached to each cluster For

example the last timbre category is significantly pronounced for clusters 5 and 18

and especially so for 18 Cluster 18 was vocal-free ethereal space-synth sounds

so it would make sense that cluster 5 which was mainly calm New World also

contained vocal-free ethereal and space-y sounds It was also interesting to note

50

that certain clusters like 28 contained one timbre category that completely dominated

all the others Many songs in this cluster were marked by strong and repetitive

beats reminiscent of the Roland TR synth drum machine which matches the graph

Likewise clusters 3 7 9 and 20 which appear to contain the same peak timbre

category were noted for containing strong and repetitive beats For this cluster I

added the following artists and their contributions to the general list of novel artists

bull Bill Nelson Minimalist house music

bull Vangelis Orchestral compositions with electronic notes

bull Rick Wakeman Rock compositions with spacy-sounding synths

bull Kraftwerk synth-based pop music

Finally we look at α = 02 With this parameter value the Dirichlet Process resulted

in 22 clusters formed 3 of these clusters contained only one song each and upon

listening to each of these songs I determined they were not particularly unique and

discarded them for a total of 19 remaining clusters Unlike the previous two values of

α where the clusters were relatively easy to subjectively differentiate this one was

quite difficult Slightly more than half of the clusters 10 out of 19 contained under

1000 songs and 3 contained under 100 Some of the clusters were easily mapped

to clusters in the other two α values like cluster 1702 which contains Roland TR

drum machine sounds and is comparable to cluster 2801 However many of the other

classifications seemed more dubious Not only did the songs within each cluster seem

to often vary significantly but the differences between many clusters appeared nearly

indistinguishable The chord change and timbre charts also support the difficulty

in distinguishing different clusters The y-axes for all of the years are quite small

implying that many of the timbre values averaged out because the songs were quite

different in each cluster Essentially the observations are quite noisy and do not have

51

features that stand out as saliently as cluster 2801 for example The only exceptions

to these numbers were clusters 30 and 34 but there were so few songs in each of

these clusters that they represent only a small amount of the dataset Therefore I

concluded that the Dirichlet Process with α = 02 performed an insufficient job of

adequately clustering the songs Overall the clusters formed when α = 01 were

the most meaningful in terms of picking up nuanced moods and instruments without

splitting hairs and resulting in clusters a minimally trained ear could not differentiate

From this analysis the most appropriate genre classifications of the electronic music

from the MSD are the clusters described in the table where α = 01 and the most

novel artists along with their contributions are summarized in the finidings where

α = 005 and α = 01

52

Chapter 4

Conclusion

In this chapter I first address weakness in my experiment and strategies to address

those weaknesses then I offer potential paths for researchers to build upon my ex-

periment and offer closing words regarding this thesis

41 Design Flaws in Experiment

While I made every effort possible to ensure the integrity of this experiment there

were various factors some beyond my control and others within my control but

unrealistic given the time and resources I had The largest issue was the dataset I

was working with While the MSD contained roughly 23000 electronic music songs

according to my classifications these songs did not come close to all of the electronic

music that was available From looking through the tracks I did see many important

artists meaning that there was some credibility to the dataset However there were

several other artists I was surprised to see missing and the artists included contained

only a limited number of popular songs Some traditionally defined genres like

dubstep were missing entirely from the dataset and the most recent songs came

from the year 2010 which meant that the past 5 years of rapid expansions in EM

were not accounted for Building a sufficient corpus of EM data is very difficult

53

arguably more than for other genres because songs may be remixed by multiple

artists further blurring the line between original content and modifications For this

reason I considered my thesis to be a proof of concept Although the data I used

may not be ideal I was able to show that the Dirichlet Process could be used with

some amount of success to cluster songs based on their metadata

With respect to how I implemented the Dirichlet Process and constructed the

features my methodology could have been more extensive with additional time and

resources Interpreting the sounds in each song and establishing common threads is a

difficult task and unlike Pandora which used trained music theory experts to analyze

each song I relied on my own ears and anecdotal knowledge of EM Given the lack of

formal literature quantitatively analyzing EM and the resources I had this was my

best realistic option but was also not ideal The second notable weakness which was

more controllable was determining what exactly constitutes an EM song My criteria

involved iterating through every song and selecting those whose artist contained a

tag that fell inside a list of predetermined EM genres However this strategy is not

always effective since some artists contain only a small selection of EM songs and

have produced much more music involving rock or other non-EM genres To prevent

these songs from appearing in the dataset I would need to load another dataset

from a group called Lastfm which contains user-generated tags at the song level

Another more addressable weakness in my experiment was graphically analyzing the

timbre categories While the average chord changes were easy to interpret on the

graphs for each cluster and had easy semantic interpretations the timbre categories

were never formally defined That is while I knew the Bayes Information Criterion

was lowest when there were 46 categories I did not associate each timbre category

with a sound Mauchrsquos study addressed this issue by randomly selecting songs with

sounds that fell in each timbre category and asked users to listen to the sounds and

54

classify what they heard Implementing this system would be an additional way of

ensuring that the clusters formed for each song were nontrivial I could not only

eyeball the measurements on each graph for timbre like I did in this thesis but also

use them to confirm the sounds I observed for each cluster Finally while my feature

selection contained careful preprocessing based on other studies that normalized

measurements between all songs there are additional ways I could have improved the

feature set For example one study looks at more advanced ways to isolate specific

timbre segments in a song identify repeating patterns and comparing songs to each

other in terms of the similarity of their timbres [15] More advanced methods like

these would allow me to more quantitatively analyze how successful the Dirichlet

Process is on effectively clustering songs into distinct categories

42 Future Work

Future work in this area quantitatively analyzing EM metadata to determine what

constitutes different genres and novel artists would involve tighter definitions proce-

dures evaluations of whether clustering was effective and music scrutiny All of the

weaknesses mentioned in the previous section barring perhaps the songs available in

the Million Song Dataset can be addressed with extensions and modifications to the

code base I created Addressing the greater issue of building an effective corpus of

music data for the MSD and constantly updating it might be addressed by soliciting

such data from an organization like Spotify but such an endeavor is very ambitious

and beyond the scope of any individual or small group research project without ex-

tensive funding and influence Once these problems are resolved and the dataset

songs accessed from the dataset and methods for comparing songs to each other are

accomplished the next steps would be to further analyze the results How do the

most unique artists for their time compare to the most popular artists Is there con-

55

siderable overlap How long does it take for a style to grow in popularity if it even

does And lastly how can these findings be used to compose new genres of music and

envision who and what will become popular in the future All of these questions may

require supplementary information sources with respect to the popularity of songs

and artists for example and many of these additional pieces of information can be

found on the website of the MSD

43 Closing Remarks

While this thesis is an ambitious endeavor and can be improved in many respects it

does show that the methods implemented yield nontrivial results and could serve as

a foundation for future quantitative analysis of electronic music As data analytics

grows even more and groups such as Spotify amass greater amounts of information

and deeper insights on that information this relatively new field of study will hope-

fully grow EM is a dynamic energizing and incredibly expressive type of music

and understanding it from a quantitative perspective pays respect to what has up

until now been mostly analyzed from a curious outsiderrsquos perspective qualitatively

described but not examined as thoroughly from a mathematical angle

56

Appendix A

Code

A1 Pulling Data from the Million Song Dataset

1 from __future__ import division2 import os3 import sys4 import time5 import glob6 import hdf5_getters not on adroit7

8 prevents output from showing ellipses when printed9 npset_printoptions(threshold=npnan)

10

11 rsquorsquorsquoThis code computes the frequency of chord changes in each electronic songand runs the dirichlet process on it rsquorsquorsquo

12

13 basedir = rsquoscratchnetworkmssilvermssilvermsd_data_fulldatarsquo +str(sysargv[1])

14 ext = rsquoh5rsquo15

16 target_genres = [rsquohousersquorsquotechnorsquorsquodrum and bassrsquorsquodrum n bassrsquorsquodrumrsquonrsquobassrsquo

17 rsquodrumnbassrsquorsquodrum rsquonrsquo bassrsquorsquojunglersquorsquobreakbeatrsquorsquotrancersquorsquodubsteprsquorsquotraprsquorsquodowntemporsquo

18 rsquoindustrialrsquorsquosynthpoprsquorsquoidmrsquorsquoidm - intelligent dance musicrsquorsquo8-bitrsquorsquoambientrsquo

19 rsquodance and electronicarsquorsquoelectronicrsquo]20

21 relevant metadata for all EM songs found in the MSD22 all_song_data = 23 pitch_segs_data = []24 count = 025 start_time = timetime()26

27 for root dirs files in oswalk(basedir)28 files = globglob(ospathjoin(rootrsquorsquo+ext))29 for f in files

57

30 h5 = hdf5_gettersopen_h5_file_read(f)31 if year unknown throw out sample32 if hdf5_gettersget_year(h5) == 033 h5close()34 continue35 if any(tag in str(hdf5_gettersget_artist_mbtags(h5)) for tag in

target_genres)36 print rsquofound electronic music song at 0 secondsrsquoformat(time

time()-start_time)37 count += 138 print (rsquosong count 0rsquoformat(count+1))39 h5_subdict = dict()40 h5_subdict[rsquotitlersquo] = hdf5_gettersget_title(h5)item()41 h5_subdict[rsquoartist_namersquo] = hdf5_gettersget_artist_name(h5)

item()42 h5_subdict[rsquoyearrsquo] = hdf5_gettersget_year(h5)item()43 h5_subdict[rsquodurationrsquo] = hdf5_gettersget_duration(h5)item()44 h5_subdict[rsquotimbrersquo] = hdf5_gettersget_segments_timbre(h5)

tolist()45 h5_subdict[rsquopitchesrsquo] = hdf5_gettersget_segments_pitches(h5)

tolist()46 track_id = hdf5_gettersget_track_id(h5)item()47 all_song_data[track_id] = h5_subdict48 print(rsquoSong 0 finished processing Total time elapsed 1

secondsrsquoformat(countstr(timetime() - start_time)))49 h5close()50

51 all_song_data_sorted = dict(sorted(all_song_dataitems() key=lambda k k[1][rsquoyearrsquo]))

52 sortedpitchdata = rsquoscratchnetworkmssilvermssilvermsd_dataraw_rsquo +resub(rsquorsquorsquorsquosysargv[1]) + rsquotxtrsquo

53 with open(sortedpitchdata rsquowrsquo) as text_file54 text_filewrite(str(all_song_data_sorted))

A2 Calculating Most Likely Chords and Timbre

Categories

1 from __future__ import division2 import os3 import sys4 import re5 import time6 import json7 import glob8 import hdf5_getters not on adroit9 import sklearnmixture

10 import msd_utils not on adroit11 import math12 import numpy as np13 import collections14 import ast15

16 prevents output from showing ellipses when printed17 npset_printoptions(threshold=npnan)

58

18

19 column-wise mean of list of lists20 def mean(a)21 return sum(a) len(a)22

23 rsquorsquorsquoThis code computes the frequency of chord changes in each electronic songand runs the dirichlet process on it rsquorsquorsquo

24

25 basedir = rsquoscratchnetworkmssilvermssilverrsquo26 input_file = basedir + rsquomsd_dataraw_rsquo + str(sysargv[1]) + rsquotxtrsquo27 output_file = basedir + rsquomsd_datapreprocessed_rsquo + str(sysargv[1]) + rsquotxtrsquo28

29 json_contents = open(input_filersquorrsquo)read()30

31 all_song_data = []32 time_start = timetime()33 count = 034 for json_object_str in refinditer(rsquorsquotitlersquojson_contents)35 json_object_str = str(json_object_strgroup(0))36 json_object = astliteral_eval(json_object_str)37 json_object_new = 38

39 json_object_new[rsquotitlersquo] = json_object[rsquotitlersquo]40 json_object_new[rsquoartist_namersquo] = json_object[rsquoartist_namersquo]41 json_object_new[rsquoyearrsquo] = json_object[rsquoyearrsquo]42 json_object_new[rsquodurationrsquo] = json_object[rsquodurationrsquo]43

44 segments_pitches_old = json_object[rsquopitchesrsquo]45 segments_timbre_old = json_object[rsquotimbrersquo]46 segments_pitches_old_smoothed = []47 segments_timbre_old_smoothed = []48 chord_changes = [0 for i in range(0192)]49 smoothing_factor = 550 for i in range(0int(mathfloor(len(segments_pitches_old))

smoothing_factor))51 segments = segments_pitches_old[(smoothing_factori)(

smoothing_factori+smoothing_factor)]52 calculate mean frequency of each note over a block of 5 time

segments53 segments_mean = map(mean zip(segments))54 segments_pitches_old_smoothedappend(segments_mean)55 most_likely_chords = [msd_utilsfind_most_likely_chord(seg) for seg in

segments_pitches_old_smoothed]56 print rsquofound most likely chords at 0 secondsrsquoformat(timetime()-

time_start)57 calculate chord changes58 for i in range(0len(most_likely_chords)-1)59 c1 = most_likely_chords[i]60 c2 = most_likely_chords[i+1]61 if (c1[1] == c2[1])62 note_shift = 063 elif (c1[1] lt c2[1])64 note_shift = c2[1] - c1[1]65 else66 note_shift = 12 - c1[1] + c2[1]67 key_shift = 4(c1[0]-1) + c2[0]68 convert note_shift (0 through 11) and key_shift (1 to 16)69 to one of 196 categories for a chord shift70 chord_shift = 12(key_shift - 1) + note_shift71 chord_changes[chord_shift] += 1

59

72 json_object_new[rsquochord_changesrsquo] = [cjson_object[rsquodurationrsquo] for c inchord_changes]

73 print rsquocalculated chord changes at 0 secondsrsquoformat(timetime()-time_start)

74

75 for i in range(0int(mathfloor(len(segments_timbre_old))smoothing_factor))

76 segments = segments_timbre_old[(smoothing_factori)(smoothing_factori+smoothing_factor)]

77 calculate mean frequency of each note over a block of 5 timesegments

78 segments_mean = map(mean zip(segments))79 segments_timbre_old_smoothedappend(segments_mean)80 print rsquofound most likely timbre categories at 0 secondsrsquoformat(time

time()-time_start)81 timbre_cats = [msd_utilsfind_most_likely_timbre_category(seg) for seg

in segments_timbre_old_smoothed]82 timbre_cat_counts = [timbre_catscount(i) for i in xrange(030)]83 json_object_new[rsquotimbre_cat_countsrsquo] = [tjson_object[rsquodurationrsquo] for t

in timbre_cat_counts]84 all_song_dataappend(json_object_new)85 count += 186

87 print rsquopreprocessing finished writing results to file at time 0rsquoformat(timetime()-time_start)

88 with open(output_filersquowrsquo) as f89 fwrite(str(all_song_data))90

91 print rsquofile merging complete at time 0rsquoformat(timetime()-time_start)

A3 Code to Compute Timbre Categories

1 from __future__ import division2 import os3 import sys4 import re5 import time6 import json7 import glob8 import hdf5_getters not on adroit9 import sklearnmixture

10 import msd_utils not on adroit11 import math12 import numpy as np13 import collections14 from string import ascii_uppercase15 import ast16 import matplotlibpyplot as plt17 import operator18 from collections import defaultdict19 import random20

21 timbre_all = []22 N = 20 number of samples to get from each year23 year_counts = dict(1956 2 1965 4 1968 3 1969 5 1970 23 1971 25

1972 26 1973 37 1974 35 1975 29 1976 28 1977 64 1978 771979 111 1980 131 1981 171 1982 199 1983 272 1984 190 1985

60

189 1986 200 1987 224 1988 205 1989 272 1990 358 1991 3481992 538 1993 610 1994 658 1995 764 1996 809 1997 930 1998872 1999 983 2000 1031 2001 1230 2002 1323 2003 1563 20041508 2005 1995 2006 1892 2007 2175 2008 1950 2009 1782 2010742)

24

25 time_start = timetime()26 year_count = defaultdict(int)27 orig_dir = rsquoscratchnetworkmssilvermssilverrsquo28 orig_dir = rsquorsquo29 json_pattern = recompile(rsquorsquotitlersquoreDOTALL)30 N = 20 number of songs to sample from each year31 k = 20 number of frames to select from each song32 for l1 in ascii_uppercase33 for l2 in ascii_uppercase34 edm_textfile = orig_dir + rsquomsd_dataraw_rsquo + l1 + l2 + rsquotxtrsquo35 json_contents = open(edm_textfilersquorrsquo)read()36 for json_object_str in refindall(json_patternjson_contents)37 json_object = astliteral_eval(json_object_str)38 year = int(json_object[rsquoyearrsquo])39 prob = 10 if 10Nyear_counts[year] gt 10 else 10N

year_counts[year]40 if randomrandom() lt prob41 print rsquogetting timbre frames for song in directory 0 1

seconds after start of programrsquoformat(edm_textfiletimetime()-time_start)

42 duration = float(json_object[rsquodurationrsquo])43 timbre = [[tduration for t in l] for l in json_object[rsquo

timbrersquo]]44 try45 indices = randomsample(xrange(0len(timbre))k)46 except47 indices = xrange(0len(timbre))48 timbre_frames = [timbre[i] for i in indices]49 appended_timbre = [timbre_allappend(l) for l in

timbre_frames]50 print rsquofinished file 0 1 seconds after start of programrsquoformat(

edm_textfiletimetime()-time_start)51

52 with(open(rsquotimbre_frames_alltxtrsquorsquowrsquo)) as f53 fwrite(str(timbre_all))

A4 Helper Methods for Calculations

1 import os2 import re3 import json4 import glob5 import hdf5_getters6 import time7 import numpy as np8

9 rsquorsquorsquo some static data used in conjunction with the helper methods rsquorsquorsquo10

11 each 12-element vector corresponds to the 12 pitches starting with Cnatural and going up to B natural

12

61

13 CHORD_TEMPLATE_MAJOR = [[100010010000][010001001000]14 [001000100100][000100010010]15 [000010001001][100001000100]16 [010000100010][001000010001]17 [100100001000][010010000100]18 [001001000010][000100100001]]19 CHORD_TEMPLATE_MINOR =[[100100010000][010010001000]20 [001001000100][000100100010]21 [000010010001][100001001000]22 [010000100100][001000010010]23 [000100001001][100010000100]24 [010001000010][001000100001]]25 CHORD_TEMPLATE_DOM7 = [[100010010010][010001001001]26 [101000100100][010100010010]27 [001010001001][100101000100]28 [010010100010][001001010001]29 [100100101000][010010010100]30 [001001001010][000100100101]]31 CHORD_TEMPLATE_MIN7 = [[100100010010][010010001001]32 [101001000100][010100100010]33 [001010010001][100101001000]34 [010010100100][001001010010]35 [000100101001][100010010100]36 [010001001010][001000100101]]37

38 CHORD_TEMPLATE_MAJOR_means = [npmean(chord) for chord inCHORD_TEMPLATE_MAJOR]

39 CHORD_TEMPLATE_MINOR_means = [npmean(chord) for chord inCHORD_TEMPLATE_MINOR]

40 CHORD_TEMPLATE_DOM7_means = [npmean(chord) for chord in CHORD_TEMPLATE_DOM7]

41 CHORD_TEMPLATE_MIN7_means = [npmean(chord) for chord in CHORD_TEMPLATE_MIN7]

42

43 CHORD_TEMPLATE_MAJOR_stdevs = [npstd(chord) for chord inCHORD_TEMPLATE_MAJOR]

44 CHORD_TEMPLATE_MINOR_stdevs = [npstd(chord) for chord inCHORD_TEMPLATE_MINOR]

45 CHORD_TEMPLATE_DOM7_stdevs = [npstd(chord) for chord in CHORD_TEMPLATE_DOM7]

46 CHORD_TEMPLATE_MIN7_stdevs = [npstd(chord) for chord in CHORD_TEMPLATE_MIN7]

47

48 TIMBRE_CLUSTERS = [[ 138679881e-01 395702571e-02 265410235e-0249 738301998e-03 -175014636e-02 -551147732e-0250 871851698e-03 -117595855e-02 107227900e-0251 875951680e-03 540391877e-03 617638908e-03]52 [ 314344510e+00 117405599e-01 408053561e+0053 -177934450e+00 293367968e+00 -135597928e+0054 -155129489e+00 775743158e-01 642796685e-0155 140794256e-01 337716831e-01 -327103815e-01]56 [ 356548165e-01 273288705e+00 194355982e+0057 106892477e+00 989739475e-01 -897330631e-0258 873234495e-01 -200747009e-03 344488367e-0159 993117800e-02 -243471766e-01 -190521726e-01]60 [ 422442037e-01 414115783e-01 143926557e-0161 -116143322e-01 -595186216e-02 -236927188e-0162 -683151409e-02 986816882e-02 243219098e-0263 693558977e-02 680121418e-03 397485360e-02]64 [ 194727799e-01 -139027782e+00 -239875671e-01

62

65 -284583677e-01 192334219e-01 -283421048e-0166 215787541e-01 114840341e-01 -215631833e-0167 -409496877e-02 -690838017e-03 -724394810e-03]68 [ 196565167e-01 498702717e-02 -343697282e-0169 254170701e-01 112441266e-02 154740401e-0170 -470447408e-02 810868802e-02 303736697e-0371 143974944e-03 -275044913e-02 148634678e-02]72 [ 221364497e-01 -296205105e-01 157754028e-0173 -557641279e-02 -925625566e-02 -615316168e-0274 -138139882e-01 -554936599e-02 166886836e-0175 646238260e-02 124093863e-02 -209274345e-02]76 [ 212823455e-01 -932652720e-02 -439611467e-0177 -202814479e-01 498638770e-02 -126572488e-0178 -111181799e-01 325075635e-02 201416694e-0279 -569216463e-02 261922912e-02 830817468e-02]80 [ 162304042e-01 -734813956e-03 -202552550e-0181 180106705e-01 -572110826e-02 -917148244e-0282 -620429191e-03 -608892354e-02 102883628e-0283 384878478e-02 -872920419e-03 237291230e-02]84 [ 169023095e-01 681311168e-02 -371039856e-0285 -213139780e-02 -418752028e-03 136407740e-0186 258515825e-02 -410328777e-04 293149920e-0287 -197874734e-02 201177066e-02 429260690e-03]88 [ 416829358e-01 -128384095e+00 886081556e-0189 913717416e-02 -319420208e-01 -182003637e-0190 -319865507e-02 -171517045e-02 347472066e-0291 -353047665e-02 558354602e-02 -506222122e-02]92 [ 383948137e-01 106020034e-01 401191058e-0193 149470482e-01 -958422411e-02 -494473336e-0294 227589858e-02 -567352733e-02 384666644e-0295 -215828055e-02 -167817151e-02 115426241e-01]96 [ 907946444e-01 326120397e+00 298472002e+0097 -142615404e-01 129886103e+00 -453380431e-0198 154008478e-01 -355297093e-02 -295809181e-0199 157037690e-01 -729692046e-02 115180285e-01]

100 [ 160870896e+00 -232038235e+00 -796211044e-01101 155058968e+00 -219377663e+00 501030526e-01102 -171767279e+00 -136642470e+00 -242837527e-01103 -414275615e-01 -733148530e-01 -456676578e-01]104 [ 642870687e-01 134486839e+00 216026845e-01105 -213180345e-01 310866747e-01 -397754955e-01106 -354439151e-01 -595938041e-04 495054274e-03107 467013422e-02 -180823854e-02 125808320e-01]108 [ 116780496e+00 228141229e+00 -329418720e+00109 -154239912e+00 212372153e-01 251116768e+00110 184273560e+00 -406183916e-01 119175125e+00111 -924407446e-01 685444429e-01 -638729005e-01]112 [ 239097414e-01 -113382447e-02 306327342e-01113 468182987e-03 -103107607e-01 -317661969e-02114 346533705e-02 146440386e-02 688291154e-02115 172580481e-02 -623970238e-03 -652822380e-03]116 [ 174850329e-01 -186077411e-01 269285838e-01117 522452803e-02 -371708289e-02 -642874319e-02118 -501920042e-03 -114565540e-02 -261300268e-03119 -694872458e-03 120157063e-02 201341977e-02]120 [ 193220674e-01 162738332e-01 172794061e-02121 789933755e-02 158494767e-01 904541006e-04122 -333177052e-02 -142411500e-01 -190471155e-02123 -241622739e-02 -257382438e-02 284895062e-02]124 [ 331179197e+00 -156765268e-01 442446188e+00

63

125 205496297e+00 507031622e+00 -352663849e-02126 -568337901e+00 -117825301e+00 541756637e-01127 -315541339e-02 -158404846e+00 737887234e-01]128 [ 236033237e-01 -501380019e-01 -701568834e-02129 -214474169e-01 558739133e-01 -345340886e-01130 236469930e-01 -251770230e-02 -441670143e-01131 -173364633e-01 992353986e-03 101775476e-01]132 [ 313672832e+00 155128891e+00 460139512e+00133 982477544e-01 -387108002e-01 -134239667e+00134 -300065797e+00 -441556909e-01 -777546208e-01135 -659017029e-01 -142596356e-01 -978935498e-01]136 [ 850714148e-01 228658856e-01 -365260753e+00137 270626948e+00 -190441544e-01 566625676e+00138 177531510e+00 239978921e+00 110965660e+00139 158484130e+00 -151579214e-02 864324026e-01]140 [ 114302559e+00 118602811e+00 -388130412e+00141 869833825e-01 -823003310e-01 -423867795e-01142 856022598e-01 -108015106e+00 174840192e-01143 -135493558e-02 -117012561e+00 168572940e-01]144 [ 354117814e+00 612714769e-01 767585243e+00145 250391333e+00 181374399e+00 -146363231e+00146 -174027236e+00 -572924078e-01 -120787368e+00147 -413954661e-01 -462561948e-01 678297871e-01]148 [ 831843044e-01 441635485e-01 700724425e-02149 -472159900e-02 308326493e-01 -447009822e-01150 327806057e-01 652370380e-01 328490360e-01151 128628172e-01 -778065861e-02 691343399e-02]152 [ 490082031e-01 -953180204e-01 176970476e-01153 157256960e-01 -526196238e-02 -319264458e-01154 391808304e-01 219368239e-01 -206483291e-01155 -625044005e-02 -105547224e-01 318934196e-01]156 [ 149899454e+00 -430708817e-01 243770498e+00157 703149621e-01 -228827845e+00 270195855e+00158 -471484280e+00 -118700075e+00 -177431396e+00159 -223190236e+00 820855264e-01 -235859902e-01]160 [ 120322544e-01 -366300816e-01 -125699953e-01161 -121914056e-01 693277338e-02 -131034684e-01162 -154955924e-03 248094288e-02 -309576314e-02163 -166369415e-03 148904987e-04 -142151992e-02]164 [ 652394765e-01 -681024464e-01 636868117e-01165 304950208e-01 262178992e-01 -320457080e-01166 -198576098e-01 -302173163e-01 204399765e-01167 444513847e-02 -950111498e-02 -114198739e-02]168 [ 206762180e-01 -208101829e-01 261977630e-01169 -171672300e-01 561794250e-02 213660185e-01170 390259585e-02 478176392e-02 172812607e-02171 344052067e-02 626899067e-03 248544728e-02]172 [ 739717363e-01 437786285e+00 254995502e+00173 113151212e+00 -358509503e-01 220806129e-01174 -220500355e-01 -722409824e-02 -270534083e-01175 107942098e-03 270174668e-01 187279353e-01]176 [ 125593809e+00 671054880e-02 870352571e-01177 -432607959e+00 230652217e+00 547476105e+00178 -611052479e-01 107955720e+00 -216225471e+00179 -795770149e-01 -731804973e-01 968935954e-01]180 [ 117233757e-01 -123897829e-01 -488625265e-01181 142036530e-01 -723286756e-02 -699808763e-02182 -117525019e-02 570221674e-02 -767796123e-03183 417505873e-02 -233375716e-02 194121001e-02]184 [ 167511025e+00 -275436700e+00 145345593e+00

64

185 132408871e+00 -166172505e+00 100560074e+00186 -882308160e-01 -595708043e-01 -727283590e-01187 -103975499e+00 -186653334e-02 139449745e+00]188 [ 320587677e+00 -284451104e+00 854849957e+00189 -444001235e-01 104202144e+00 735333682e-01190 -248763292e+00 738931361e-01 -174185596e+00191 -107581842e+00 205759299e-01 -820483513e-01]192 [ 331279737e+00 -508655734e-01 661530870e+00193 116518280e+00 474499155e+00 -231536191e+00194 -134016130e+00 -715381712e-01 278890594e+00195 204189275e+00 -380003033e-01 116034914e+00]196 [ 179522019e+00 -813534697e-02 437167420e-01197 226517020e+00 885377295e-01 107481514e+00198 -725322296e-01 -219309506e+00 -759468916e-01199 -137191387e+00 260097913e-01 934596450e-01]200 [ 350400906e-01 817891485e-01 -863487084e-01201 -731760701e-01 970320805e-02 -360023996e-01202 -291753495e-01 -803073817e-02 665930095e-02203 160093340e-01 -129158086e-01 -518806100e-02]204 [ 225922929e-01 278461593e-01 539661393e-02205 -237662670e-02 -270343295e-02 -123485570e-01206 231027499e-03 587465112e-05 186127188e-02207 283074747e-02 -187198676e-04 124761782e-02]208 [ 453615634e-01 318976020e+00 -835029351e-01209 784124578e+00 -443906795e-01 -178945492e+00210 -114521031e+00 100044304e+00 -404084981e-01211 -486030348e-01 105412721e-01 563666445e-02]212 [ 393714086e-01 -307226477e-01 -487366619e-01213 -457481697e-01 -291133171e-04 -239881719e-01214 -215591352e-01 -121332941e-01 142245002e-01215 502984582e-02 -805878851e-03 195534173e-01]216 [ 186913010e-01 -161000977e-01 595612425e-01217 187804293e-01 222064227e-01 -109008289e-01218 783845058e-02 515228647e-02 -818113578e-02219 -237860551e-02 341013800e-03 364680417e-02]220 [ 332919314e+00 -214341251e+00 720913997e+00221 176143734e+00 164091808e+00 -266887649e+00222 -926748006e-01 -278599285e-01 -739434005e-01223 -387363085e-01 800557250e-01 115628886e+00]224 [ 476496444e-01 -119334793e-01 309037235e-01225 -345545294e-01 130114716e-01 506895559e-01226 212176840e-01 -414296750e-03 452439064e-02227 -162163990e-02 693683152e-02 -577607592e-03]228 [ 300019324e-01 543432074e-02 -772732930e-01229 147263806e+00 -279012581e-02 -247864869e-01230 -210011388e-01 278202425e-01 616957205e-02231 -166924986e-01 -180102286e-01 -378872162e-03]]232

233 TIMBRE_MEANS = [npmean(t) for t in TIMBRE_CLUSTERS]234 TIMBRE_STDEVS = [npstd(t) for t in TIMBRE_CLUSTERS]235

236 rsquorsquorsquohelper methods to process raw msd datarsquorsquorsquo237

238 def normalize_pitches(h5)239 key = int(hdf5_gettersget_key(h5))240 segments_pitches = hdf5_gettersget_segments_pitches(h5)241 segments_pitches_new = [transpose_by_key(pitch_segkey) for pitch_seg in

segments_pitches]242 return segments_pitches_new243

65

244 def transpose_by_key(pitch_segkey)245 pitch_seg_new = []246 for i in range(012)247 idx = (i + key) 12248 pitch_seg_newappend(pitch_seg[idx])249 return pitch_seg_new250

251 rsquorsquorsquo given a time segment with distributions of the 12 pitches find the mostlikely chord playedrsquorsquorsquo

252 def find_most_likely_chord(pitch_vector)253 rho_max = 00254 index each chord255 most_likely_chord = (11)256 for idx (chordmeanstdev) in enumerate(zip(CHORD_TEMPLATE_MAJOR

CHORD_TEMPLATE_MAJOR_meansCHORD_TEMPLATE_MAJOR_stdevs))257 rho = 00258 for i in range(012)259 rho += (chord[i] - mean)(pitch_vector[i] - npmean(pitch_vector))((

stdev+001)(npstd(pitch_vector)+001))260 if (abs(rho) gt abs(rho_max))261 rho_max = rho262 most_likely_chord = (1idx)263 for idx (chordmeanstdev) in enumerate(zip(CHORD_TEMPLATE_MINOR

CHORD_TEMPLATE_MINOR_meansCHORD_TEMPLATE_MINOR_stdevs))264 rho = 00265 for i in range(012)266 rho += (chord[i] - mean)(pitch_vector[i] - npmean(pitch_vector))((

stdev+001)(npstd(pitch_vector)+001))267 if (abs(rho) gt abs(rho_max))268 rho_max = rho269 most_likely_chord = (2idx)270 for idx (chordmeanstdev) in enumerate(zip(CHORD_TEMPLATE_DOM7

CHORD_TEMPLATE_DOM7_meansCHORD_TEMPLATE_DOM7_stdevs))271 rho = 00272 for i in range(012)273 rho += (chord[i] - mean)(pitch_vector[i] - npmean(pitch_vector))((

stdev+001)(npstd(pitch_vector)+001))274 if (abs(rho) gt abs(rho_max))275 rho_max = rho276 most_likely_chord = (3idx)277 for idx (chordmeanstdev) in enumerate(zip(CHORD_TEMPLATE_MIN7

CHORD_TEMPLATE_MIN7_meansCHORD_TEMPLATE_MIN7_stdevs))278 rho = 00279 for i in range(012)280 rho += (chord[i] - mean)(pitch_vector[i] - npmean(pitch_vector))((

stdev+001)(npstd(pitch_vector)+001))281 if (abs(rho) gt abs(rho_max))282 rho_max = rho283 most_likely_chord = (4idx)284 return most_likely_chord285

286 def find_most_likely_timbre_category(timbre_vector)287 most_likely_timbre_cat = 0288 rho_max = 00289 for idx (segmeanstdev) in enumerate(zip(TIMBRE_CLUSTERSTIMBRE_MEANS

TIMBRE_STDEVS))290 rho = 00291 for i in range(012)292 rho += (seg[i] - mean)(timbre_vector[i] - npmean(seg))((stdev+001)

(npstd(timbre_vector)+001))

66

293 if (abs(rho) gt abs(rho_max))294 rho_max = rho295 most_likely_timbre_cat = idx296 return most_likely_timbre_cat

67

Bibliography

[1] Marck Bailey Mark j butler publishes scholarly work on dance mu-sic httpwwwmusicnorthwesterneduaboutnews2012mark-j-butler-publishes-scholarly-work-on-dance-musichtml mar 2012

[2] Kenneth Taylor Ishkurrsquos guide to edm httptechnoorgelectronic-music-guide

[3] Deal further strengthens spotifyrsquos music discovery expertise httptheechonestcompressreleasesspotify-acquires-echo-nest mar 2014

[4] Josh Constine Inside the spotify - echo nest skunkworks httptechcrunchcom20141019the-sonic-mad-scientists oct 2014

[5] The future of music genres is here httpblogechonestcompost73516217273the-future-of-music-genres-is-here jan 2014

[6] About the music genome project httpwwwpandoracomaboutmgp

[7] Joan Serragrave Aacutelvaro Corral Mariaacuten Boguntildeaacute Martiacuten Haro and Josep Ll ArcosMeasuring the evolution of contemporary western popular music Sci Rep 2jul 2012

[8] Matthias Mauch Robert M MacCallum Mark Levy and Armand M Leroi Theevolution of popular music Usa 1960ndash2010 Royal Society Open Science 2(5)2015

[9] Thierry Bertin-Mahieux Daniel PW Ellis Brian Whitman and Paul LamereThe million song dataset In Proceedings of the 12th International Conferenceon Music Information Retrieval (ISMIR 2011) 2011

[10] F Pedregosa G Varoquaux A Gramfort V Michel B Thirion O GriselM Blondel P Prettenhofer R Weiss V Dubourg J Vanderplas A PassosD Cournapeau M Brucher M Perrot and E Duchesnay Scikit-learn Machinelearning in Python Journal of Machine Learning Research 122825ndash2830 2011

[11] Lars Buitinck Gilles Louppe Mathieu Blondel Fabian Pedregosa AndreasMueller Olivier Grisel Vlad Niculae Peter Prettenhofer Alexandre GramfortJaques Grobler Robert Layton Jake VanderPlas Arnaud Joly Brian Holt andGaeumll Varoquaux API design for machine learning software experiences from

68

the scikit-learn project In ECML PKDD Workshop Languages for Data Miningand Machine Learning pages 108ndash122 2013

[12] Edwin Chen Infinite mixture models with nonparametric bayesand the dirichlet process httpblogechenme20120320infinite-mixture-models-with-nonparametric-bayes-and-the-dirichlet-processmar 2012

[13] Graham Massey Roland tr-808 The drum machine that changed music httpwwwbbccomnewsentertainment-arts-26682781 mar 2014

[14] Jean-Michel Jarre and Agencja Artystyczna MTJ Jean Michel Jarre DisquesDreyfus 1999

[15] Francois Pachet Jean-Julien Aucouturier and Mark Sandler The way it soundsTimbre models for analysis and retrieval of music signals IEEE TRANSAC-TIONS ON MULTIMEDIA 7(6)1028ndash35 dec 2005

69

  • Abstract
  • Acknowledgements
  • Contents
  • List of Tables
  • List of Figures
  • 1 Introduction
    • 11 Background Information
    • 12 Literature Review
    • 13 The Dataset
      • 2 Mathematical Modeling
        • 21 Determining Novelty of Songs
        • 22 Feature Selection
        • 23 Collecting Data and Preprocessing Selected Features
          • 231 Collecting the Data
          • 232 Pitch Preprocessing
          • 233 Timbre Preprocessing
              • 3 Results
                • 31 Methodology
                • 32 Findings
                  • 321 =005
                  • 322 =01
                  • 323 =02
                    • 33 Analysis
                      • 4 Conclusion
                        • 41 Design Flaws in Experiment
                        • 42 Future Work
                        • 43 Closing Remarks
                          • A Code
                            • A1 Pulling Data from the Million Song Dataset
                            • A2 Calculating Most Likely Chords and Timbre Categories
                            • A3 Code to Compute Timbre Categories
                            • A4 Helper Methods for Calculations
                              • Bibliography
Page 6: Silver,Matthew final thesis

Contents

Abstract iii

Acknowledgements iv

List of Tables viii

List of Figures ix

1 Introduction 1

11 Background Information 1

12 Literature Review 3

13 The Dataset 10

2 Mathematical Modeling 12

21 Determining Novelty of Songs 12

22 Feature Selection 14

23 Collecting Data and Preprocessing Selected Features 20

231 Collecting the Data 20

232 Pitch Preprocessing 21

233 Timbre Preprocessing 25

3 Results 27

31 Methodology 27

32 Findings 29

321 α=005 29

vi

322 α=01 33

323 α=02 38

33 Analysis 46

4 Conclusion 53

41 Design Flaws in Experiment 53

42 Future Work 55

43 Closing Remarks 56

A Code 57

A1 Pulling Data from the Million Song Dataset 57

A2 Calculating Most Likely Chords and Timbre Categories 58

A3 Code to Compute Timbre Categories 60

A4 Helper Methods for Calculations 61

Bibliography 68

vii

List of Tables

31 Song cluster descriptions for α = 005 33

32 Song cluster descriptions for α = 01 38

33 Song cluster descriptions for α = 02 45

viii

List of Figures

11 A userrsquos taste profile generated by Spotify 4

12 Data processing pipeline for Mauchrsquos study illustrated with a segment

of Queenrsquos Bohemian Rhapsody 1975 8

21 scikit-learn example of GMM vs DPGMM and tuning of α 15

22 Number of Electronic Music Songs in Million Song Dataset from Each

Year 26

31 Song year distributions for α = 005 31

32 Timbre and pitch distributions for α = 005 32

33 Song year distributions for α = 01 35

34 Timbre and pitch distributions for α = 01 37

35 Song year distributions for α = 02 41

36 Timbre and pitch distributions for α = 02 44

ix

Chapter 1

Introduction

11 Background Information

Electronic Music (EM) is an increasingly popular genre of music with an immense

presence and influence on modern culture Because the genre is new as a whole and

is arguably more loosely structured than other genres - technology has enabled the

creation of a wide range of sounds and easy blending of existing and new sounds alike

- formal analysis especially mathematical analysis on the genre is fairly limited and

has only begun growing in the past few years As a fan of EM I am interested in

exploring how the genre has evolved over time More specifically my goal with this

project was to design some structure or model that could help me identify which EM

artists have contributed the most stylistically to the genre Oftentimes famous EM

artists do not create novel-sounding music but rather popularize an existing style

and the motivation of this study is to understand who has stylistically contributed

the most to the EM scene versus those who have merely popularized aspects of it

As the study progressed the manner in which I constructed my model lent to

a second goal of the thesis imagining new ways in which we can imagine EM genres

1

While there exists an extensive amount of research analyzing music trends from a non-mathematical (cultural, societal, artistic) perspective, the analysis of EM from a mathematical perspective, especially with respect to any computationally measurable trends in the genre, is close to nonexistent. EM has been analyzed to a lesser extent than other common genres of music in the academic world, most likely because it has existed for a shorter amount of time and is less rooted in prominent social and cultural events. In fact, the first published reference work on EM did not exist until 2012, when Professor Mark J. Butler of Northwestern University edited and published Electronica, Dance and Club Music, a collection of essays exploring EM genres and culture [1]. Furthermore, there are very few comprehensive visual guides that allow a user to relate every genre to each other and easily observe how different genres converge and diverge. While conducting research, the best guide I found was not a scholarly source but an online guide created by an EM enthusiast: Ishkur's Guide to Electronic Music [2]. This guide, which includes over 100 specific genres grouped under more general genres and represents chronological evolutions by connecting each genre in a flowchart, is the most exhaustive analysis of the EM scene I could find. However, the guide's analysis is very qualitative. While each subgenre contains an explanation of typical rhythms and sounds and includes well-known songs indicative of the style, the guide was created by someone relying on historical and personal knowledge of EM. My model, which creates music genres by chronologically ordering songs and then assigning them to clusters, is a different approach to imagining the entire landscape of EM. The results may confirm Ishkur's Guide's findings, in which case his guide is given additional merit with mathematical evidence, or they may differ, suggesting that there may be better ways to group EM genres. One advantage that guides such as Ishkur's and historically-based scholarly works have over my approach is that those models are history-sensitive and therefore may group songs in a way that historically makes sense. On the other hand, my model is history-agnostic and may not recognize the historical context of songs when clustering. However, I believe that there is still significant merit to my research. Instead of classifying genres of music by the early genres that led to them, my approach gives the most credit to the artists and songs that were the most innovative for their time, and perhaps reveals different musical styles that are more similar to each other than history would otherwise imply. This way of thinking about music genres, while unconventional, is another way of imagining EM.

The practice of quantitatively analyzing music has exploded in the last decade, thanks to technological and algorithmic advances that allow data scientists to constructively sift through troves of music and listener information. In the literature review, I will focus on two particular organizations that have contributed greatly to the large-scale mathematical analysis of music: Pandora, a website that plays songs similar to a song/artist/album inputted by a user, and The Echo Nest, a music analytics firm that was acquired by Spotify in 2014 and drives Spotify's Discover Weekly feature [3]. After evaluating the relevance of these sources to my thesis work, I will then look over the relevant academic research and evaluate what this research can contribute.

1.2 Literature Review

The analysis of quantitative music generally falls into two categories: research conducted by academics and academic organizations for scholarly purposes, and research conducted by companies and primarily targeted at consumers. First looking at the consumer-based research, Spotify and Pandora are two of the most prominent such groups and the two I decided to focus on. Spotify is a music streaming service where users can listen to albums and songs from a wide variety of artists, or listen to weekly playlists generated based on the music the user and the user's friends have listened to. The weekly playlist, called the Discover Weekly playlist, is a relatively new feature in Spotify and is driven by music analysis algorithms created by The Echo Nest. Using the Echo Nest code interface, Spotify creates a "taste profile" for each user, which assesses attributes such as how often a user branches out to new styles of music, how closely the user's streamed music follows popular Billboard music charts, and so on. Spotify also looks at the artists and songs the user streamed and creates clusters of different genres that the user likes (see Figure 1.1). The taste profile and music clusters can then be used to generate playlists geared to a specific user. The genres in the clusters come from a list of nearly 800 names, which are derived by scraping the Internet for trending terms in music as well as training various algorithms on a regular basis by "listening" to new songs [4][5].

Figure 1.1: A user's taste profile generated by Spotify

Although Spotify and Echo Nest's algorithms are very useful for mapping the landscape of established and emerging genres of music, the methodology is limited to pre-defined genres of music. This may serve as a good starting point to compare my final results to, but my study aims to be as context-free as possible, attaching no preconceived notions of music styles or genres and instead looking at features that can be measured in every song.

While Spotify's approach to mapping music is very high-tech and based on existing genres, Pandora takes a very low-tech and context-free approach to music clustering. Pandora created the Music Genome Project, a multi-year undertaking in which skilled music theorists listened to a large number of songs and analyzed up to 450 characteristics in each song [6]. Pandora's approach is appealing to the aim of my study, since it does not take any preconceived notions of what a genre of music is, instead comparing songs on common characteristics such as pitch, rhythm, and instrument patterns. Unfortunately, I do not have a cadre of skilled music theorists at my disposal, nor do I have 10 years to perform such calculations like the dedicated workers at Pandora (tips the indestructible fedora). Additionally, Pandora's Music Genome Project is intellectual property, so at best I can only rely on the abstract concepts of the Music Genome Project to drive my study.

In the academic realm, there are no existing studies analyzing quantifiable changes in EM specifically, but there exist a few studies that perform such analysis on popular Western music in general. One such study is Measuring the Evolution of Contemporary Western Popular Music, which analyzes music from 1955-2010 spanning all common genres. Using the Million Song Dataset, a free public database of songs each containing metadata (see section 1.3), the study focuses on the attributes pitch, timbre, and loudness. Pitch is defined by the standard musical notes, or the frequency of the sound waves. Timbre is formally defined as the Mel frequency cepstral coefficients (MFCCs) of a transformed sound signal. More informally, it refers to the sound color, texture, or tone quality, and is associated with instrument types, recording resources, and production techniques. In other words, two sounds that have the same pitch but different tones (for example, a bell and a voice) are differentiated by their timbres. There are 12 MFCCs that define the timbre of a given sound. Finally, loudness refers to how intrinsically loud the music sounds, not the loudness that a listener can manipulate while listening to the music. Loudness is the first MFCC of the timbre of a sound [7]. The study concluded that over time, music has been becoming louder and less diverse:

The restriction of pitch sequences (with metrics showing less variety in pitch progressions), the homogenization of the timbral palette (with frequent timbres becoming more frequent), and growing average loudness levels (threatening a dynamic richness that has been conserved until today). This suggests that our perception of the new would be essentially rooted on identifying simpler pitch sequences, fashionable timbral mixtures, and louder volumes. Hence, an old tune with slightly simpler chord progressions, new instrument sonorities that were in agreement with current tendencies, and recorded with modern techniques that allowed for increased loudness levels could be easily perceived as novel, fashionable, and groundbreaking.

This study serves as a good starting point for mathematically analyzing music in a few ways. First, it utilizes the Million Song Dataset, which addresses the issue of legally obtaining music metadata. As mentioned in section 1.3, the only legal way to obtain playable music for this study would have been to purchase every song I would include, which is infeasible. While the Million Song Dataset does not contain the audio files in playable format, it does contain audio features and metadata that allow for in-depth analysis. In addition, working with the dataset takes out the work of extracting features from raw audio files, saving an extensive amount of time and energy. Second, the study establishes specifics for what constitutes a trend in music. Pitch, timbre, and loudness are core features of music, and examining the distributions of each among songs over time reveals a lot of information about how the music industry and consumers' tastes have evolved. While these are not all of the features contained in a song, they serve as a good starting point. Third, the study defines mathematical ways to capture music attributes and measure their change over time. For example, pitches are transposed into the same tonal context, with binary discretized pitch descriptions based on a threshold, so that each song can be represented with vectors of pitches that are normalized and compared to other songs.

While this study lays some solid groundwork for capturing and analyzing numeric qualities of music, it falls short of addressing my goals in a couple of ways. First, it does not perform any analysis with respect to music genre. While the analysis performed in this paper could easily be applied to a list of songs in a specific genre, certain genres might have unique sounds and rhythms relative to other genres that would be worth studying in greater detail. Second, the study only measures general trends in music over time. The models used to describe changes are simple regressions that don't look at more nuanced changes. For example, what styles of music developed over certain periods of time? How rapid were those changes? Which styles of music developed from which other styles?

A more promising study, led by music researcher Matthias Mauch [8], analyzes contemporary popular Western music from the 1960s to 2010s by comparing numerical data on the pitches and timbre of a corpus of 17,000 songs that appeared on the Billboard Hot 100. Like the previously mentioned paper, Measuring the Evolution of Contemporary Western Popular Music, Mauch's study also creates abstractions of pitch and timbre in order to provide a consistent and meaningful semantic interpretation of musical data (see Figure 1.2). However, Mauch's study takes this idea a step further by using genre tags from Last.fm, a music website, and constructing a hierarchy of music genres using hierarchical clustering. Additionally, the study takes a crack at determining whether a particular band, the Beatles, was musically groundbreaking for its time or merely playing off sounds that other bands had already used.

Figure 1.2: Data processing pipeline for Mauch's study, illustrated with a segment of Queen's Bohemian Rhapsody (1975)

While both Measuring the Evolution of Contemporary Western Popular Music and Mauch's study created abstractions of pitch and timbre, Mauch's study is more appealing with respect to my goal because its end results align more closely with mine. Additionally, the data processing pipeline offers several layers of abstraction, and depending on my progress, I would be able to achieve at least one of the levels of abstraction. As shown in Figure 1.2, each segment of a raw audio file is first broken down into its 12 timbre MFCCs and pitch components. Next, the study constructs "lexicons," or dictionaries of pitch and timbre terms, against which all songs can be compared. For pitch, the original data is in an N-by-12 matrix, where N is the number of time segments in the song and 12 the number of notes found in an octave of pitches. Each time segment contains the relative strengths of each of the 12 pitches. However, musical sounds are not merely collections of pitches but, more precisely, chords. Furthermore, the similarity of two songs is not determined by the absolute pitches of their chords but rather by the progression of chords in the song, all relative to each other. For example, if all the notes in a song are transposed by one step, the song will sound different in terms of absolute pitch, but it will still be recognized as the original because all of the relative movements from each chord to the next are the same. This phenomenon is captured in the pitch data by finding the most likely chord played at each time segment, then counting the change to the next chord at each time step and generating a table of chord change frequencies for each song. Constructing the timbre lexicon is more complicated, since there is no easy analogue to chords with which to compare songs. Mauch's study utilizes a Gaussian Mixture Model (GMM), iterating over k=1 to k=N clusters, where N is a large number, running the GMM on each prior assumption of k clusters and computing the Bayes Information Criterion (BIC) for each model. The lowest of the N BIC values is found, and that value of k is selected. That model contains k different timbre clusters, and each cluster contains the mean timbre value for each of the 12 timbre components.

For my research, I decided that the pitch and timbre lexicons would be the most realistic level of abstraction I could obtain. Mauch's study adds an additional layer to pitch and timbre by identifying the most common patterns of chord changes and most common timbre rhythms and creating more general tags from these combined terms, such as "stepwise changes indicating modal harmony" for a pitch topic and "oh, rounded, mellow" for a timbral topic. There were two problems with using this final layer of abstraction for my study. First, attaching semantic interpretations to the pitch and timbral lexicons is a difficult task. For timbre, I would need to listen to sound samples containing all of the different timbral categories I identified and attach user interpretations to them. For the chords, not only would I have to perform the same analysis as on timbre, but also pay careful attention to identifying which chords correspond to common sound progressions in popular music, a task that I am not qualified for and did not have the resources, for this thesis, to seek out. Second, this final layer of abstraction was not necessary for the end goal of my paper. In fact, consolidating my pitch and timbre lexicons into simpler phrases would run the risk of pigeonholing my analysis and preventing me from discovering more nuanced patterns in my final results. Therefore, I decided to focus on pitch and timbral lexicon construction as the furthest levels of abstraction when processing songs for my thesis. Mathematical details on how I constructed the pitch and timbral lexicons can be found in the Mathematical Modeling chapter of this paper.

1.3 The Dataset

In order to successfully execute my thesis, I needed access to an extensive database of music. Until recently, acquiring a substantial corpus of music data was a difficult and costly task. It is illegal to download music audio files from video- and music-sharing sites such as YouTube, Spotify, and Pandora. Some platforms such as iTunes offer 90-second previews of songs, but using only segments of songs, usually segments that showcase the chorus, is not a reliable way to capture the entire essence of a song. Even if I were to legally download entire audio files for free, I would run into additional issues. Obtaining a high-quality corpus of song data would be challenging: writing scripts that crawl music-sharing platforms may not capture all of the music I am looking for. And once I had the audio files, I would have to perform audio processing techniques to extract the relevant information from the songs.

Fortunately, there is an easy solution to the music data acquisition problem. The Million Song Dataset (MSD) is a collection of metadata for one million music tracks dating up to 2011. Various organizations, such as The Echo Nest, MusicBrainz, 7digital, and Last.fm, have contributed different pieces of metadata. Each song is represented as a Hierarchical Data Format (HDF5) file, which can be loaded as a JSON-like object of fields. The fields encompass topical features such as the song title, artist, and release date, as well as lower-level features such as the loudness, starting beat times, and pitches and timbre of several segments of the song [9]. While the MSD is the largest free and open-source music metadata dataset I could find, there is no guarantee that it adequately covers the entire spectrum of EM artists and songs. This quality limitation is important to consider throughout the study. A quick look through the songs, including the subset of data I worked with for this report, showed that there were several well-known artists and songs from the EM scene. Therefore, while the MSD may not contain all desired songs for this project, it contains an adequate number of relevant songs to produce some meaningful results. Additionally, the groundwork for modeling the similarities between songs and identifying groundbreaking ones is the same regardless of the songs included, and the following methodologies can be implemented on any similarly-formatted dataset, including one with songs that might currently be missing from the MSD.


Chapter 2

Mathematical Modeling

2.1 Determining Novelty of Songs

Finding a logical and implementable mathematical model was, and continues to be, an important aspect of my research. My problem, how to mathematically determine which songs were unique for their time, requires an algorithm in which each song is introduced in chronological order, either joining an existing category or starting a new category based on its musical similarity to songs already introduced. Clustering algorithms like k-means or Gaussian Mixture Models (GMMs), which optimize the partitioning of a dataset into a predetermined number of clusters, assume a fixed number of clusters. While this process would work if we knew exactly how many genres of EM existed, if we guess wrong, our end results may contain clusters that are wrongly grouped together or separated. It is much better to apply a clustering algorithm that does not make any assumptions about this number.

One particularly promising process that addresses the issue of the number of clusters is a family of models known as Dirichlet Processes (DPs). DPs are useful for this particular application because (1) they assign clusters to a dataset with only an upper bound on the number of clusters, and (2) by sorting the songs in chronological order before running the algorithm and keeping track of which songs are categorized under each cluster, we can observe the earliest songs in each cluster and, consequently, infer which songs were responsible for creating new clusters. The DP is controlled by a concentration parameter α. The expected number of clusters formed is directly proportional to the value of α, so the higher the value of α, the more likely new clusters will be formed [10]. Regardless of the value of α, as the number of data points introduced increases, the probability of a new group being formed decreases. That is, a "rich get richer" policy is in place, and existing clusters tend to grow in size. Tweaking the value of the tunable parameter α is an important part of the study, since it determines the flexibility given to forming a new cluster. If the value of α is too small, then the criterion for forming clusters will be too strict, and data that should be in different clusters will be assigned to the same cluster. On the other hand, if α is too large, the algorithm will be too sensitive and assign similar songs to different clusters. A small simulation of this dynamic is sketched below.
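To make the "rich get richer" behavior concrete, here is a minimal sketch of the Chinese Restaurant Process, a sequential view of the Dirichlet Process in which the i-th point joins an existing cluster c with probability n_c/(i + α) and opens a new cluster with probability α/(i + α). This illustrates the DP's clustering behavior in general, not the exact model fit to the song data; the function name crp and the parameter values are illustrative only.

import numpy as np

def crp(n_points, alpha, seed=0):
    """Simulate cluster sizes under a Chinese Restaurant Process."""
    rng = np.random.default_rng(seed)
    counts = []  # counts[c] = number of points assigned to cluster c so far
    for i in range(n_points):
        # Existing cluster c is chosen with probability counts[c]/(i + alpha);
        # a new cluster is opened with probability alpha/(i + alpha).
        probs = np.array(counts + [alpha], dtype=float) / (i + alpha)
        choice = rng.choice(len(probs), p=probs)
        if choice == len(counts):
            counts.append(1)      # open a new cluster
        else:
            counts[choice] += 1   # join an existing cluster
    return counts

print(len(crp(10000, 0.5)))   # small alpha: few, large clusters
print(len(crp(10000, 50.0)))  # large alpha: many more clusters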

The implementation of the DP was achieved using scikit-learn's library and API for the Dirichlet Process Gaussian Mixture Model (DPGMM). The DPGMM is the formal name of the Dirichlet Process model used to cluster the data. More specifically, scikit-learn's implementation of the DPGMM uses the stick-breaking method, one of several equally valid methods to assign songs to clusters [11]. While the mathematical details for this algorithm can be found at the following citation [12], the most important aspects of the DPGMM are the arguments that the user can specify and tune. The first of these tunable parameters is the value α, which is the same parameter as the α discussed in the previous paragraph. As seen on the right side of Figure 2.1, properly tuning α is key to obtaining meaningful clusters. The center image has α set to 0.01, which is too small and results in all of the data being placed in one cluster. On the other hand, the bottom-right image has the same dataset and α set to 100, which does a better job of clustering. On a related note, the figure also demonstrates the effectiveness of the DPGMM over the GMM. On the left side, the dataset clearly contains 2 clusters, but the GMM in the top-left image assumes 5 clusters as a prior and consequently clusters the data incorrectly, while the DPGMM manages to limit the data to 2 clusters.

The second argument that the user inputs to the DPGMM is the data that will be clustered. The scikit-learn implementation takes the data in the format of a nested list (N lists, each of length m), where N is the number of data points and m the number of features. While the format of the data structure is relatively straightforward, choosing which numbers should be in the data was a challenge I faced. Selecting the relevant features of each song to be used in the algorithm will be expounded upon in the next section, "Feature Selection."

The last argument that a user inputs to the scikit-learn DPGMM implementation is the upper bound for the number of clusters. The Dirichlet Process then determines the best number of clusters for the data between 1 and the upper bound. Since the DPGMM is flexible enough to find the best value, I set an arbitrary upper bound of 50 clusters and focused more on the tuning of α to modify the number of clusters formed. A minimal usage sketch follows.
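Below is a minimal sketch of how the three arguments come together, using the scikit-learn interface as it existed when this thesis was written (sklearn.mixture.DPGMM, since deprecated; in later versions, BayesianGaussianMixture with a Dirichlet process prior plays the same role). The array song_features is a placeholder for the per-song feature vectors whose construction is described in the following sections.

import numpy as np
from sklearn.mixture import DPGMM  # deprecated; see the newer call below

# song_features: N songs x m features, sorted chronologically (placeholder name)
X = np.asarray(song_features)

dpgmm = DPGMM(n_components=50,        # upper bound on the number of clusters
              alpha=0.1,              # concentration parameter
              covariance_type='diag',
              n_iter=100)
dpgmm.fit(X)
labels = dpgmm.predict(X)             # cluster assignment for each song

# Equivalent call in scikit-learn >= 0.18:
# from sklearn.mixture import BayesianGaussianMixture
# BayesianGaussianMixture(n_components=50,
#                         weight_concentration_prior_type='dirichlet_process',
#                         weight_concentration_prior=0.1).fit(X)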

Figure 2.1: scikit-learn example of GMM vs. DPGMM and tuning of α

2.2 Feature Selection

One of the most difficult aspects of the Dirichlet method is choosing the features to be used for clustering. In other words, when we organize the songs into clusters, we need to ensure that each cluster is distinct in a way that is statistically and intuitively logical. In the Million Song Dataset [9], each song is represented as a JSON-like object containing several fields. These fields are candidate features to be used in the Dirichlet algorithm. Below is an example song, "Never Gonna Give You Up" by Rick Astley, with its corresponding fields:

artist_mbid: db92a151-1ac2-438b-bc43-b82e149ddd50 (the musicbrainz.org ID for this artist)
artist_mbtags: shape = (4) (this artist received 4 tags on musicbrainz.org)
artist_mbtags_count: shape = (4) (raw count of the 4 tags this artist received on musicbrainz.org)
artist_name: Rick Astley (artist name)
artist_playmeid: 1338 (the ID of that artist on the service playme.com)
artist_terms: shape = (12) (this artist has 12 terms (tags) from The Echo Nest)
artist_terms_freq: shape = (12) (frequency of the 12 terms from The Echo Nest (number between 0 and 1))
artist_terms_weight: shape = (12) (weight of the 12 terms from The Echo Nest (number between 0 and 1))
audio_md5: bf53f8113508a466cd2d3fda18b06368 (hash code of the audio used for the analysis by The Echo Nest)
bars_confidence: shape = (99) (confidence value (between 0 and 1) associated with each bar by The Echo Nest)
bars_start: shape = (99) (start time of each bar according to The Echo Nest; this song has 99 bars)
beats_confidence: shape = (397) (confidence value (between 0 and 1) associated with each beat by The Echo Nest)
beats_start: shape = (397) (start time of each beat according to The Echo Nest; this song has 397 beats)
danceability: 0.0 (danceability measure of this song according to The Echo Nest (between 0 and 1; 0 => not analyzed))
duration: 211.69587 (duration of the track in seconds)
end_of_fade_in: 0.139 (time of the end of the fade-in at the beginning of the song, according to The Echo Nest)
energy: 0.0 (energy measure (not in the signal processing sense) according to The Echo Nest (between 0 and 1; 0 => not analyzed))
key: 1 (estimation of the key the song is in by The Echo Nest)
key_confidence: 0.324 (confidence of the key estimation)
loudness: -7.75 (general loudness of the track)
mode: 1 (estimation of the mode the song is in by The Echo Nest)
mode_confidence: 0.434 (confidence of the mode estimation)
release: Big Tunes - Back 2 The 80s (album name from which the track was taken; some tracks can come from many albums, we give only one)
release_7digitalid: 786795 (the ID of the release (album) on the service 7digital.com)
sections_confidence: shape = (10) (confidence value (between 0 and 1) associated with each section by The Echo Nest)
sections_start: shape = (10) (start time of each section according to The Echo Nest; this song has 10 sections)
segments_confidence: shape = (935) (confidence value (between 0 and 1) associated with each segment by The Echo Nest)
segments_loudness_max: shape = (935) (max loudness during each segment)
segments_loudness_max_time: shape = (935) (time of the max loudness during each segment)
segments_loudness_start: shape = (935) (loudness at the beginning of each segment)
segments_pitches: shape = (935, 12) (chroma features for each segment (normalized so max is 1))
segments_start: shape = (935) (start time of each segment (musical event or onset) according to The Echo Nest; this song has 935 segments)
segments_timbre: shape = (935, 12) (MFCC-like features for each segment)
similar_artists: shape = (100) (a list of 100 artists (their Echo Nest IDs) similar to Rick Astley, according to The Echo Nest)
song_hotttnesss: 0.864248830588 (according to The Echo Nest, when downloaded (in December 2010) this song had a 'hotttnesss' of 0.8 (on a scale of 0 to 1))
song_id: SOCWJDB12A58A776AF (The Echo Nest song ID; note that a song can be associated with many tracks (with very slight audio differences))
start_of_fade_out: 198.536 (start time of the fade-out, in seconds, at the end of the song, according to The Echo Nest)
tatums_confidence: shape = (794) (confidence value (between 0 and 1) associated with each tatum by The Echo Nest)
tatums_start: shape = (794) (start time of each tatum according to The Echo Nest; this song has 794 tatums)
tempo: 113.359 (tempo in BPM according to The Echo Nest)
time_signature: 4 (time signature of the song according to The Echo Nest, i.e., usual number of beats per bar)
time_signature_confidence: 0.634 (confidence of the time signature estimation)
title: Never Gonna Give You Up (song title)
track_7digitalid: 8707738 (the ID of this song on the service 7digital.com)
track_id: TRAXLZU12903D05F94 (The Echo Nest ID of this particular track, on which the analysis was done)
year: 1987 (year when this song was released, according to musicbrainz.org)

When choosing features, my main goal was to use features that would most likely yield meaningful results yet also be simple and make sense to the average person. The definition of "meaningful" results is subjective, as every music listener has his or her own opinions on what constitutes different types of music, but some common features most people tend to differentiate songs by are pitch, rhythm, and the types of instruments used. The following fields provided in each song object fall under these three terms:

Pitch
• segments_pitches: a matrix of values indicating the strength of each pitch (or note) at each discernible time interval

Rhythm
• beats_start: a vector of values indicating the start time of each beat
• time_signature: the time signature of the song
• tempo: the speed of the song in beats per minute (BPM)

Instruments
• segments_timbre: a matrix of values indicating the distribution of MFCC-like features (different types of tones) for each segment

The segments_pitches feature is a clear candidate as a differentiating factor for songs, since it reveals the patterns of notes that occur. Additionally, other research papers that quantitatively examine songs, like Mauch's, look at pitch and employ a procedure that allows all songs to be compared with the same metric. Likewise, timbre is intuitively a reliable differentiating feature, since it reveals the amount of different tones, that is, sounds that sound different despite having the same pitch. Therefore, segments_timbre is another feature that is considered in each song.

Finally, we look at the candidate features for rhythm. At first glance, all of these features appear to be useful, as they indicate the rhythm of a song in one way or another. However, none of these features are as useful as the pitch and timbre features. While tempo is one factor in differentiating genres of EDM, and music in general, tempo alone is not a driving force of musical innovation. Certain genres of EDM, like drum 'n' bass and happycore, stand out for having very fast tempos, but the tempo is supplemented with a sound unique to the genre. Conceiving new arrangements of pitches, combining instruments in new ways, and inventing new types of sounds are novel; speeding up or slowing down existing sounds is not. Including tempo as a feature could actually add noise to the model, since many genres overlap in their tempos. And finally, tempo is measured indirectly when the pitch and timbre features are normalized for each song: everything is measured in units of "per second," so faster songs will have higher quantities of pitch and timbre features each second. Time signature can be dismissed from the candidate features for the same reason as tempo: many genres share the same time signature, and including it in the feature set would only add more noise. beats_start looks like a more promising feature since, like segments_pitches and segments_timbre, it consists of a vector of values. However, difficulties arise when we begin to think about how exactly we can utilize this information. Since each song varies in length, we need a way to compare songs of different durations on the same level. One approach could be to perform basic statistics on the distance between each beat, for example calculating the mean and standard deviation of this distance. However, the normalized pitch and timbre information already capture this data. Another possibility is detecting certain patterns of beats, which could differentiate the syncopated dubstep or glitch music beats from the steady pulse of electro-house. But once again, every beat is accompanied by a sound with a specific timbre and pitch, so this feature would not add any significantly new information.

2.3 Collecting Data and Preprocessing Selected Features

2.3.1 Collecting the Data

Upon deciding on the features I wanted to use in my research, I first needed to collect all of the electronic songs in the Million Song Dataset. The easiest reliable way to achieve this was to iterate through each song in the database and save the information for the songs where any of the artist genre tags in artist_mbtags matched an electronic music genre. While this measure is not fully accurate, because it looks at the genre of the artist and not the song, specific genre information for each song was not as easily accessible, so this indicator was nearly as good a substitute. To generate a list of the genres that electronic songs would fall under, I manually searched through a subset of the MSD to find all genres that seemed to be related to electronic music. In the case of genres that are sometimes, but not always, electronic in nature, such as disco or pop, I erred on the side of caution and did not include them in the list of electronic genres; otherwise, false positives, such as primarily rock songs whose artist happens to have the disco label attached, could inadvertently be included in the dataset. The final list of genres is as follows, and a sketch of the filtering pass appears after it:

target_genres = ['house', 'techno', 'drum and bass', 'drum n bass',
                 "drum'n'bass", 'drumnbass', "drum 'n' bass", 'jungle',
                 'breakbeat', 'trance', 'dubstep', 'trap', 'downtempo',
                 'industrial', 'synthpop', 'idm',
                 'idm - intelligent dance music', '8-bit', 'ambient',
                 'dance and electronica', 'electronic']
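As a rough sketch, the filtering pass over the dataset can be written with the hdf5_getters wrapper distributed with the MSD; msd_root is a placeholder for the local path to the dataset files, and error handling is omitted.

import os
import hdf5_getters  # helper module distributed with the Million Song Dataset

def artist_is_electronic(h5):
    # artist_mbtags holds the artist's musicbrainz.org genre tags
    # (stored as byte strings in the HDF5 files)
    tags = hdf5_getters.get_artist_mbtags(h5)
    return any(t.decode('utf-8').lower() in target_genres for t in tags)

electronic_paths = []
for dirpath, _, filenames in os.walk(msd_root):  # msd_root: path to the MSD
    for fname in filenames:
        if fname.endswith('.h5'):
            path = os.path.join(dirpath, fname)
            h5 = hdf5_getters.open_h5_file_read(path)
            try:
                if artist_is_electronic(h5):
                    electronic_paths.append(path)
            finally:
                h5.close()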

2.3.2 Pitch Preprocessing

A study conducted by music researcher Matthias Mauch [8] analyzes pitch in a musically informed manner. The study first takes the raw sound data and converts it into a distribution over the 12 pitches, where 0 is no detection of the pitch and 1 the strongest amount. It then computes the most likely chord by comparing the 4 most common types of chords in popular music (major, minor, dominant 7, and minor 7) to the observed chord. The most common chords are represented as "template chords" containing 0's and 1's, where the 1's represent the notes played in the chord. For example, using the note C as the first index, the C major chord is represented as

CT_{CM} = (1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0)

For a given chroma frame c observed in the song, Spearman's rho coefficient is computed against every template chord:

\rho_{CT,c} = \sum_{i=1}^{12} \frac{(CT_i - \overline{CT})(c_i - \bar{c})}{\sigma_{CT}\,\sigma_c}

where \overline{CT} is the mean of the values in the template chord, \sigma_{CT} is the standard deviation of the values in the chord, and the operations on c are analogous. Note that the summation is over each of the 12 pitch classes. The chord template with the highest value of ρ is selected as the chord for the time frame. After this is performed for each time frame, the values are smoothed, and then the change between adjacent chords is observed. The reasoning behind this step is that, by measuring the relative distance between chords rather than the chords themselves, all songs can be compared in the same manner even though they may have different key signatures. Finally, the study takes the types of chord changes and classifies them under 8 possible categories called "H-topics." These topics are more abstracted versions of the chord changes that make more sense to a human, such as "changes involving dominant 7th chords." A sketch of the template-matching step follows.
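Below is a minimal sketch of the template-matching step, assuming the chroma data is already in 12-element frames. The four templates rooted at C are rotated to all 12 roots, and scipy's spearmanr supplies the rank correlation; the name best_chord is my own.

import numpy as np
from scipy.stats import spearmanr

# Template chords rooted at C (index 0); a 1 marks a note in the chord.
TEMPLATES = {
    'major':     np.array([1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0]),
    'minor':     np.array([1, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0]),
    'dominant7': np.array([1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0]),
    'minor7':    np.array([1, 0, 0, 1, 0, 0, 0, 1, 0, 0, 1, 0]),
}

def best_chord(chroma_frame):
    """Return the (root, chord type) whose template maximizes Spearman's rho."""
    best_root, best_type, best_rho = 0, 'major', -2.0
    for chord_type, template in TEMPLATES.items():
        for root in range(12):
            # Rotate the C-rooted template so its root sits at pitch class 'root'
            rho, _ = spearmanr(np.roll(template, root), chroma_frame)
            if rho > best_rho:
                best_root, best_type, best_rho = root, chord_type, rho
    return best_root, best_type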

In my preliminary implementation of this method on an electronic dance music corpus, I made a few modifications to Mauch's study. First, I smoothed out time frames before computing the most probable chords, rather than smoothing the most probable chords. I did this to save time and to reduce volatility in the chord measurements. Using Rick Astley's "Never Gonna Give You Up" as a reference, which contains 935 time frames and lasts 212 seconds, 5 time frames is slightly under 1 second, and for preliminary testing this appeared to be a good interval for each time block. Second, as mentioned in the literature review, I did not abstract the chord changes into H-topics. This decision also stemmed from time constraints, since deriving semantic chord meaning from EDM songs would require careful research into the types of harmonies and sounds common in that genre of music. Below is a high-level visualization of the pitch metadata found in a sample song, "Firestarter" by The Prodigy, and of how I converted the metadata into a chord change vector that I could then feed into the Dirichlet Process algorithm.


[Figure: pitch-processing pipeline for "Firestarter" by The Prodigy. Starting from the raw pitch data, an N×12 matrix where N is the number of time frames and 12 the number of pitch classes, the pitch distributions are averaged over blocks of 5 time frames; the most likely chord for each block (here, F major) is calculated using Spearman's rho; for two adjacent chords (e.g., F major to G major, a major-to-major change with step size 2, chord shift code 6), the change between them increments a count in a table of the 192 possible chord changes; the result is a 192-element vector where chord_changes[i] is the number of times the chord change with code i occurred in the song.]

A final step I took to normalize the chord change data was to divide the counts by the length of the song, so that each song's number of chord changes was measured per second. A sketch of this final encoding step is shown below.
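The following sketch turns a song's sequence of smoothed chords into a normalized 192-element vector. The exact numbering of the 192 chord-change codes (4 chord types × 4 chord types × 12 root shifts) used in my pipeline is not reproduced here, so the code-assignment line is one plausible scheme rather than the actual one.

import numpy as np

TYPE_INDEX = {'major': 0, 'minor': 1, 'dominant7': 2, 'minor7': 3}

def chord_change_vector(chords, duration_seconds):
    """chords: list of (root, chord type) per smoothed block, in time order.
    Returns a 192-element vector of chord-change frequencies per second."""
    counts = np.zeros(192)
    for (r1, t1), (r2, t2) in zip(chords, chords[1:]):
        # One plausible numbering: 16 ordered type pairs x 12 semitone shifts.
        code = (TYPE_INDEX[t1] * 4 + TYPE_INDEX[t2]) * 12 + ((r2 - r1) % 12)
        counts[code] += 1
    return counts / duration_seconds  # normalize to changes per second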

2.3.3 Timbre Preprocessing

For timbre, I also used Mauch's model to find a meaningful way to compare timbre uniformly across all songs [8]. After collecting all song metadata, I took a random sample of 20 songs from each year starting at 1970. The reason I forced the sampling to 20 randomly sampled songs from each year, and did not take a random sample of songs from all years at once, was to prevent bias towards any type of sounds. As seen in Figure 2.2, there are significantly more songs from 2000-2011 than from before 2000. The mean year is x̄ = 2001.052, the median year is 2003, and the standard deviation of the years is σ = 7.060. A "random sample" over all songs would almost certainly include a disproportionate number of more recent songs. In order not to miss out on sounds that may be more prevalent in older songs, I required a set number of songs from each year. Next, from each randomly selected song, I selected 20 random timbre frames, in order to prevent any biases in data collection within each song. In total, there were 42 × 20 × 20 = 16,800 timbre frames collected. Next, I clustered the timbre frames using a Gaussian Mixture Model (GMM), varying the number of clusters from 10 to 100 and selecting the number of clusters with the lowest Bayes Information Criterion (BIC), a statistical measure commonly used to select the best-fitting model. The BIC was minimized at 46 timbre clusters. I then re-ran the GMM with 46 clusters and saved the mean values of each of the 12 timbre components for each cluster formed.

Figure 2.2: Number of Electronic Music Songs in the Million Song Dataset from Each Year

In the same way that every song had the same 192 chord changes whose frequencies could be compared between songs, each song now had the same 46 timbre clusters but different frequencies in each song. When reading in the metadata from each song, I calculated the most likely timbre cluster each timbre frame belonged to and kept a frequency count of all of the possible timbre clusters observed in the song. Finally, as with the pitch data, I divided all observed counts by the duration of the song in order to normalize each song's timbre counts. A sketch of this procedure is shown below.
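Here is a sketch of the timbre-lexicon construction, using scikit-learn's current GaussianMixture class (the class in versions of that era was called GMM); frames stands for the 16,800 × 12 matrix of sampled timbre frames, a placeholder name.

import numpy as np
from sklearn.mixture import GaussianMixture  # called GMM in older scikit-learn

# frames: 16,800 x 12 matrix of sampled timbre frames (placeholder name)
best_model, best_bic = None, np.inf
for k in range(10, 101):
    model = GaussianMixture(n_components=k, random_state=0).fit(frames)
    bic = model.bic(frames)
    if bic < best_bic:
        best_model, best_bic = model, bic
# In this study the BIC was minimized at k = 46.
timbre_lexicon = best_model.means_   # mean 12-dim timbre vector per cluster

def timbre_count_vector(song_frames, model, duration_seconds):
    """Assign each of a song's timbre frames to its most likely cluster and
    return per-second frequencies over the model's clusters."""
    labels = model.predict(song_frames)
    counts = np.bincount(labels, minlength=model.n_components)
    return counts / duration_seconds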


Chapter 3

Results

3.1 Methodology

After the pitch and timbre data was processed, I ran the Dirichlet Process on the data. For each song, I concatenated the 192-element chord change frequency list and the 46-element timbre category frequency list, giving each song a total of 238 features. However, there is a problem with this setup: the pitch data will inherently dominate the clustering process, since it contains more than 4 times as many features as timbre. While there is no built-in way in scikit-learn's DPGMM to give different weights to each feature, I considered another possibility to remedy this discrepancy: duplicating the timbre vector a certain number of times and concatenating the copies to the feature set of each song. While this strategy runs the risk of corrupting the feature set and turning it into something that does not accurately represent each song, it is important to keep in mind that, even without duplicating the timbre vector, the feature set already consists of two separate feature sets concatenated to each other. Therefore, timbre duplication appears to be a reasonable strategy to weigh pitch and timbre more evenly. A sketch of the feature assembly is shown below.
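As a sketch, the per-song feature vector can be assembled as below. The number of timbre copies is an assumption (four copies bring the timbre block to 184 values, close to the 192 pitch values); the scale factor of 10 anticipates the rescaling discussed in the next paragraphs.

import numpy as np

def assemble_features(chord_freqs, timbre_freqs, timbre_copies=4, scale=10.0):
    """chord_freqs: 192-dim chord-change frequencies (per second).
    timbre_freqs: 46-dim timbre-cluster frequencies (per second).
    Duplicating the timbre block weighs the two modalities more evenly;
    scaling keeps the values in a range where alpha behaves sensibly.
    timbre_copies=4 is an assumed value; scale=10 follows the text."""
    return scale * np.concatenate([chord_freqs] + [timbre_freqs] * timbre_copies)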


After this modification, I tweaked a few more parameters before obtaining my final results. Dividing the pitch and timbre frequencies by the duration of the song normalized every song to frequencies per second, but it also had the undesired effect of making the data too small. Timbre and pitch frequencies per second were almost always less than 1.0 and many times hovered as low as 0.002 for nonzero values. Because all of the values were very close to each other, using common values of α in the range of 0.1 to 1000-2000 was insufficient to push the songs into different clusters. As a result, every song fell into the same cluster. Increasing the value of α by several orders of magnitude, to well over 10 million, fixed the problem, but this solution presented two problems. First, tuning α to experiment with different ways to cluster the music would be problematic, since I would have to work with an enormous range of possible values for α. Second, pushing α to such high values is not appropriate for the Dirichlet Process. Extremely high values of α indicate a Dirichlet Process that will try to disperse the data into different clusters, but a value of α that high is in principle always assigning each new song to a new cluster. On the other hand, varying α between 0.1 and 1000, for example, presents a much wider range of flexibility when assigning clusters. While clustering may be possible by varying the values of α an extreme amount with the data as it currently is, we would be using the Dirichlet Process in a way it should mathematically not be used. Therefore, multiplying all of the data by a constant value, so that we can work in the appropriate range of α, is the ideal approach. After some experimentation, I found that k = 10 was an appropriate scaling factor.

After initial runs of the Dirichlet Process, I found that there was a slight issue with some of the earlier songs. Since I had only artist genre tags, not specific song tags, I chose songs based on whether any of the tags associated with the artist fell under any electronic music genre, including the generic term 'electronic'. There were some bands, mostly older ones from the 1960s and 1970s like Electric Light Orchestra, which had some electronic music but mostly featured rock, funk, disco, or another genre. Given that these artists featured mostly non-electronic songs, I decided to exclude them from my study and generated a blacklist of these music artists. While it was infeasible to look through every single song and determine whether it was electronic or not, I was able to look over the earliest songs in each cluster. These songs were the most important to verify as electronic, because early non-electronic songs could end up forming new clusters and inadvertently create clusters with non-electronic sounds that I was not looking for.

The goal of this thesis is to identify the different groups into which EM songs are clustered and to identify the most unique artists and genres. While the second task is very simple, because it requires looking at the earliest songs in each cluster, the effectiveness of the first is difficult to gauge. While I can look at the average chord change and timbre category frequencies in each cluster, as well as other metadata, attaching more semantic interpretations to what the music actually sounds like and determining whether the music is clustered properly is a very subjective process. For this reason, I ran the Dirichlet Process on the feature set with values of α = (0.05, 0.1, 0.2) and compared the clusterings, examining similarities and differences in the clusters formed in each scenario in the Discussion section. For each value of α, I set the upper limit of components, or clusters allowed, to 50. The values of α I used resulted in 9, 14, and 19 significant clusters formed, respectively.

3.2 Findings

3.2.1 α = 0.05

When I set α to 0.05, the Dirichlet Process split the songs into 9 clusters. Below are the distributions of the years of the songs in each cluster (note that the Dirichlet Process does not number the clusters exactly sequentially, so cluster numbers 5, 7, and 10 are skipped).


Figure 3.1: Song year distributions for α = 0.05

For each value of α, I also calculated the average frequency of each chord change category and timbre category for each cluster and plotted the results. The green lines correspond to timbre and the blue lines to pitch.


Figure 3.2: Timbre and pitch distributions for α = 0.05

A table of each cluster formed, the number of songs in that cluster, and descriptions of the pitch, timbre, and rhythmic qualities characteristic of songs in that cluster is shown below.


Cluster | Song Count | Characteristic Sounds
0  | 6481 | Minimalist, industrial, space sounds, dissonant chords
1  | 5482 | Soft, New Age, ethereal
2  | 2405 | Defined sounds; electronic and non-electronic instruments played in standard rock rhythms
3  | 360  | Very dense and complex synths, slightly darker tone
4  | 4550 | Heavily distorted rock and synthesizer
6  | 2854 | Faster-paced 80s synth rock, acid house
8  | 798  | Aggressive beats, dense house music
9  | 1464 | Ambient house, trancelike, strong beats, mysterious tone
11 | 1597 | Melancholy tones; new wave rock in the 80s, then starting in the 90s downtempo, trip-hop, nu-metal

Table 3.1: Song cluster descriptions for α = 0.05

3.2.2 α = 0.1

A total of 14 clusters were formed (16 were formed, but 2 clusters contained only one song each; I listened to both of these songs, and since they did not sound unique, I discarded them from the clusters). Again, the song year distributions, timbre and pitch distributions, and cluster descriptions are shown below.


Figure 3.3: Song year distributions for α = 0.1


Figure 3.4: Timbre and pitch distributions for α = 0.1


Cluster | Song Count | Characteristic Sounds
0  | 1339 | Instrumental and disco with 80s synth
1  | 2109 | Simultaneous quarter-note and sixteenth-note rhythms
2  | 4048 | Upbeat, chill; simultaneous quarter-note and eighth-note rhythms
3  | 1353 | Strong repetitive beats, ambient
4  | 2446 | Strong simultaneous beat and synths; synths defined but echoing
5  | 2672 | Calm, New Age
6  | 542  | Hi-hat cymbals, dissonant chord progressions
7  | 2725 | Aggressive punk and alternative rock
9  | 1647 | Latin; rhythmic emphasis on first and third beats
11 | 835  | Standard medium-fast rock instruments/chords
16 | 1152 | Orchestral, especially violins
18 | 40   | "Martian alien" sounds, no vocals
20 | 1590 | Alternating strong kick and strong high-pitched clap
28 | 528  | Roland TR-like beats; kick and clap stand out but fuzzy

Table 3.2: Song cluster descriptions for α = 0.1

3.2.3 α = 0.2

With α set to 0.2, there were a total of 22 clusters formed. 3 of the clusters consisted of 1 song each, none of which were particularly unique-sounding, so I discarded them, for a total of 19 significant clusters. Again, the song year distributions, timbre and pitch distributions, and cluster descriptions are shown below.


Figure 3.5: Song year distributions for α = 0.2


Figure 3.6: Timbre and pitch distributions for α = 0.2


Cluster | Song Count | Characteristic Sounds
0  | 4075 | Nostalgic and sad-sounding synths and string instruments
1  | 2068 | Intense, sad, cavernous (mix of industrial metal and ambient)
2  | 1546 | Jazz/funk tones
3  | 1691 | Orchestral with heavy 80s synths, atmospheric
4  | 343  | Arpeggios
5  | 304  | Electro, ambient
6  | 2405 | Alien synths, eerie
7  | 1264 | Punchy kicks and claps, 80s/90s tilt
8  | 1561 | Medium tempo, 4/4 time signature, synths with intense guitar
9  | 1796 | Disco rhythms and instruments
10 | 2158 | Standard rock with few (if any) synths added on
12 | 791  | Cavernous, minimalist ambient (non-electronic instruments)
14 | 765  | Downtempo, classic guitar riffs, fewer synths
16 | 865  | Classic acid house sounds and beats
17 | 682  | Heavy Roland TR sounds
22 | 14   | Fast ambient, classic orchestral
23 | 578  | Acid house with funk tones
30 | 31   | Very repetitive rhythms, one or two tones
34 | 88   | Very dense sound (strong vocals and synths)

Table 3.3: Song cluster descriptions for α = 0.2


3.3 Analysis

Each of the different values of α revealed different insights about the Dirichlet Process applied to the Million Song Dataset, mainly which artists and songs were unique and how traditional groupings of EM genres could be thought of differently. Not surprisingly, the distributions of the years of songs in most of the clusters were skewed to the left, because the distribution of all of the EM songs in the Million Song Dataset is left-skewed (see Figure 2.2). However, some of the distributions vary significantly for individual clusters, and these differences provide important insights into which types of music styles were popular at certain points in time and how unique the earliest artists and songs in the clusters were. For example, for α = 0.1, cluster 28's musical style (with sounds characteristic of the Roland TR-808 and TR-909, two programmable drum machines and synthesizers that became extremely popular in 1980s and 90s dance tracks [13]) coincides with when the instruments were first manufactured, in 1980. Not surprisingly, this cluster contained mostly songs from the 80s and 90s and declined slightly in the 2000s. However, there were a few songs in that cluster that came out before 1980. While these songs did not clearly use the Roland TR machines, they may have contained similar sounds that predated the machines and were truly novel.

First looking at α = 0.05, we see that all of the clusters contain a significant number of songs, although clusters 3 and 8 are notably smaller. Cluster 3 has a heavier left tail, indicating a larger number of songs from the 70s, 80s, and 90s. Inside the cluster, the genres of music varied significantly from a traditional music lens. That is, the cluster contained some songs with nearly all traditional rock instruments, others with purely synths, and others somewhere in between, all of which would normally be classified as different EM genres. However, under the Dirichlet Process, these songs were lumped together by the common theme of dense, melodic sounds (as opposed to minimalistic, repetitive, or dissonant sounds). The most prominent artists among the earlier songs are Ashra and Jon Hassell, who composed several melodic songs combining traditional instruments with synthesizers for a modern feel. The other small cluster, number 8, has a more normal year distribution relative to the entire MSD distribution and also consists of denser beats. Another artist, Cabaret Voltaire, leads this cluster. Cluster 9 stands out significantly because it contains virtually no songs before 1990 but then increases rapidly in popularity. This cluster contains songs with hypnotically repetitive rhythms, strong and ethereal synths, and an equally strong drum-like beat. Given the emergence of trance in the 1990s, and the fact that house music in the 1980s contained more minimalistic synths than house music in the 1990s, this distribution of years makes sense. Looking at the earliest artists in this cluster, one that accurately predates the later music in the cluster is Jean-Michel Jarre, a French composer who pioneered ambient and electronic music [14]. One of his songs, Les Chants Magnétiques IV, contains very sharp and modulated synths along with a repetitive hi-hat cymbal rhythm and more drawn-out, ethereal synths. While the song sounds ambient at its normal speed, playing the song at 1.5 times its normal speed resulted in a thumping, fast-paced 16th-note rhythm that, combined with the ethereal synths and their chord progressions, sounded very similar to trance music. In fact, I found that, stylistically, trance music was comparable to house and ambient music increased in speed. Trance music was a term not used extensively until the early 1990s, but ambient and house music were already mainstream by the 1980s, so it would make sense that trance evolved in this manner. However, this insight could serve as an argument that trance is not an innovative genre in and of itself but is rather a clever combination of two older genres. Lastly, we look at the timbre category and chord change distributions for each cluster. In theory, these clusters should have significantly different peaks of chord changes and timbre categories, reflecting different pitch arrangements and


instruments in each cluster. The type 0 chord change corresponds to major → major with no note change, type 60 to minor → minor with no note change, type 120 to dominant 7th major → dominant 7th major with no note change, and type 180 to dominant 7th minor → dominant 7th minor with no note change. It makes sense that type 0, 60, 120, and 180 chord changes are frequently observed, because this implies that adjacent chords in a song remain in the same key for the majority of the song. The timbre categories, on the other hand, are more difficult to intuitively interpret. Mauch's study addresses this issue by sampling the songs and sounds that are closest to each timbre category, then playing the sounds and attaching user-based interpretations based on several listeners [8]. While this strategy worked in Mauch's study, given the time and resources at my disposal, it was not practical in mine. I ended up writing subjective summaries of each cluster and comparing the charts to see whether certain peaks in the timbre categories corresponded to specific tones. Strangely, for α = 0.05, the timbre and chord change data are very similar for each cluster. This problem does not occur for α = 0.1 or 0.2, where the graphs vary significantly and correspond to some of the observed differences in the music. In summary, below are the most influential artists I found in the clusters formed and the types of music they created that were novel for their time:

• Jean-Michel Jarre: ambient and house music, complicated synthesizer arrangements
• Cabaret Voltaire: orchestral electronic music
• Paul Horn: new age
• Brian Eno: ambient music
• Manuel Göttsching (Ashra): synth-heavy ambient music
• Killing Joke: industrial metal
• John Foxx: minimalist and dark electronic music
• Fad Gadget: house and industrial music

While these conclusions were formed mainly from the data in the MSD and the resulting clusters, I checked outside sources and biographies of these artists to see whether they were groundbreaking contributors to electronic music. Some research revealed that these artists were indeed groundbreaking for their time, so my findings are consistent with existing literature. The difference, however, between existing accounts and mine is that, from a quantitatively computed perspective, I found some new connections (like observing that one of Jarre's works, when sped up, sounded very similar to trance music).

For larger values of α, it is worth not only looking at interesting phenomena in the clusters formed for that specific value but also comparing the clusters formed for the other values of α. Since we are increasing the value of α, more clusters will be formed, and the distinctions between each cluster will be more nuanced. With α = 0.1, the Dirichlet Process formed 16 clusters; 2 of these clusters consisted of only one song each, and upon listening, neither of these songs sounded particularly unique, so I threw those two clusters out and analyzed the remaining 14. Comparing these clusters to the ones formed with α = 0.05, I found that some of the clusters mapped over nicely, while others were more difficult to interpret. For example, cluster 3_0.1 (cluster 3 when α = 0.1) contained a similar number of songs and a similar distribution of release years to cluster 9_0.05. Both contain virtually no songs before the 1990s and then steadily rise in popularity through the 2000s. Both clusters also contain similar types of music: house beats and ethereal synths reminiscent of ambient or trance music. However, when I looked at the earliest


artists in cluster 3 (α = 0.1), they were different from the earliest artists in cluster 9 (α = 0.05). One particular artist, Bill Nelson, stood out for having a particularly novel song, "Birds of Tin," for the year it was released (1980). This song features a sharp and twangy synth beat that, when sped up, sounded like minimalist acid house music. While the α = 0.05 clusters differentiated mostly on general moods and classes of instruments (like rock vs. non-electronic vs. electronic), the α = 0.1 clusters picked up more nuanced instrumentation and mood differences. For example, cluster 16 (α = 0.1) contained songs that featured orchestral string instruments, especially violin. The songs themselves varied significantly according to traditional genres, from Brian Eno arrangements with classical orchestra to a remix of a song by Linkin Park, a nu-metal band, which contained violin interludes. This clustering raises an interesting point: music that sounds very different in terms of traditional genres could be grouped together on certain instruments or sounds. Another cluster, 28 (α = 0.1), features 90s sounds characteristic of the Roland TR-909 drum machine (which would explain why the cluster's songs increase drastically starting in the 1990s and steadily decline through the 2000s). Yet another cluster, 6 (α = 0.1), has a particularly heavy left tail, indicating a style more popular in the 1980s, and its characteristic sound, hi-hat cymbals, is also a specialized one. This specialization does not match up particularly strongly with the clusters formed when α = 0.05. That is, a single cluster from the α = 0.05 run does not easily map to one or more clusters in the α = 0.1 run, although many of the clusters appear to share characteristics based on the qualitative descriptions in

the tables. The timbre/chord change charts for each cluster appeared to at least somewhat corroborate the general characteristics I attached to each cluster. For example, the last timbre category is significantly pronounced for clusters 5 and 18, and especially so for 18. Cluster 18 consisted of vocal-free, ethereal space-synth sounds, so it would make sense that cluster 5, which was mainly calm New Age, also contained vocal-free, ethereal, and spacey sounds. It was also interesting to note


that certain clusters, like 28, contained one timbre category that completely dominated all the others. Many songs in this cluster were marked by strong and repetitive beats reminiscent of the Roland TR synth drum machine, which matches the graph. Likewise, clusters 3, 7, 9, and 20, which appear to contain the same peak timbre category, were noted for containing strong and repetitive beats. For this run, I added the following artists and their contributions to the general list of novel artists:

• Bill Nelson: minimalist house music
• Vangelis: orchestral compositions with electronic notes
• Rick Wakeman: rock compositions with spacey-sounding synths
• Kraftwerk: synth-based pop music

Finally, we look at α = 0.2. With this parameter value, the Dirichlet Process resulted in 22 clusters. Three of these clusters contained only one song each, and upon listening to each of these songs, I determined they were not particularly unique and discarded them, for a total of 19 remaining clusters. Unlike the previous two values of α, where the clusters were relatively easy to differentiate subjectively, this one was quite difficult. Slightly more than half of the clusters, 10 out of 19, contained under 1,000 songs, and 3 contained under 100. Some of the clusters were easily mapped to clusters from the other two α values, like cluster 17 (α = 0.2), which contains Roland TR drum machine sounds and is comparable to cluster 28 (α = 0.1). However, many of the other classifications seemed more dubious. Not only did the songs within each cluster often vary significantly, but the differences between many clusters appeared nearly indistinguishable. The chord change and timbre charts also reflect the difficulty of distinguishing the clusters: the y-axes for all of the years are quite small, implying that many of the timbre values averaged out because the songs in each cluster were quite different. Essentially, the observations are quite noisy and do not have features that stand out as saliently as those of cluster 28 (α = 0.1), for example. The only exceptions to these numbers were clusters 30 and 34, but there were so few songs in each of these clusters that they represent only a small amount of the dataset. Therefore, I concluded that the Dirichlet Process with α = 0.2 did an inadequate job of clustering the songs. Overall, the clusters formed when α = 0.1 were the most meaningful in terms of picking up nuanced moods and instruments without splitting hairs and producing clusters that a minimally trained ear could not differentiate. From this analysis, the most appropriate genre classifications of the electronic music in the MSD are the clusters described in the table for α = 0.1, and the most novel artists, along with their contributions, are summarized in the findings for α = 0.05 and α = 0.1.
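To make cross-α comparisons like the ones above easy to reproduce, the sketch below shows one minimal way the cluster counts could be recomputed with the scikit-learn DPGMM implementation used in this thesis (the 0.17-era API). This is an illustrative sketch, not the exact thesis script: the features matrix is a random stand-in for the real song-by-feature matrix, and the parameter values other than α are assumptions.

import numpy as np
from sklearn import mixture

# stand-in for the real matrix with one row of preprocessed features per song
# (192 chord-change bins + 30 timbre category counts, per Appendix A.2)
features = np.random.rand(500, 222)

for alpha in [0.05, 0.1, 0.2]:
    # upper bound of 40 components; the DP decides how many are actually used
    dpgmm = mixture.DPGMM(n_components=40, alpha=alpha,
                          covariance_type='diag', n_iter=100, random_state=0)
    dpgmm.fit(features)
    labels = dpgmm.predict(features)
    # only components that actually receive songs count as formed clusters
    print('alpha = {0}: {1} occupied clusters'.format(alpha, len(np.unique(labels))))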


Chapter 4

Conclusion

In this chapter, I first address weaknesses in my experiment and strategies for dealing with those weaknesses; I then offer potential paths for researchers to build upon my experiment and give closing words regarding this thesis.

4.1 Design Flaws in Experiment

While I made every effort possible to ensure the integrity of this experiment, there were various complicating factors, some beyond my control and others within my control but unrealistic to fix given the time and resources I had. The largest issue was the dataset I was working with. While the MSD contained roughly 23,000 electronic music songs according to my classifications, these songs did not come close to covering all of the electronic music that was available. From looking through the tracks, I did see many important artists, meaning that there was some credibility to the dataset. However, there were several other artists I was surprised to see missing, and the artists that were included contained only a limited number of popular songs. Some traditionally defined genres, like dubstep, were missing entirely from the dataset, and the most recent songs came from the year 2010, which meant that the past 5 years of rapid expansion in EM were not accounted for. Building a sufficient corpus of EM data is very difficult,


arguably more so than for other genres, because songs may be remixed by multiple artists, further blurring the line between original content and modifications. For this reason, I considered my thesis to be a proof of concept: although the data I used may not be ideal, I was able to show that the Dirichlet Process could be used with some amount of success to cluster songs based on their metadata.

With respect to how I implemented the Dirichlet Process and constructed the features, my methodology could have been more extensive with additional time and resources. Interpreting the sounds in each song and establishing common threads is a difficult task, and unlike Pandora, which used trained music theory experts to analyze each song, I relied on my own ears and anecdotal knowledge of EM. Given the lack of formal literature quantitatively analyzing EM and the resources I had, this was my best realistic option, but it was also not ideal. The second notable weakness, which was

more controllable, was determining what exactly constitutes an EM song. My criteria involved iterating through every song and selecting those whose artist carried a tag from a predetermined list of EM genres. However, this strategy is not always effective, since some artists have only a small selection of EM songs and have produced much more music in rock or other non-EM genres. To prevent these songs from appearing in the dataset, I would need to load another dataset, from a group called Last.fm, which contains user-generated tags at the song level. Another, more addressable, weakness in my experiment was in graphically analyzing the timbre categories. While the average chord changes were easy to interpret on the graphs for each cluster and had easy semantic interpretations, the timbre categories were never formally defined. That is, while I knew the Bayes Information Criterion was lowest when there were 46 categories, I did not associate each timbre category with a sound. Mauch's study addressed this issue by randomly selecting songs with sounds that fell in each timbre category and asking users to listen to the sounds and


classify what they heard. Implementing this system would be an additional way of ensuring that the clusters formed for the songs were nontrivial: I could not only eyeball the timbre measurements on each graph, as I did in this thesis, but also use them to confirm the sounds I observed for each cluster. Finally, while my feature selection involved careful preprocessing, based on other studies, that normalized measurements between all songs, there are additional ways I could have improved the feature set. For example, one study looks at more advanced ways to isolate specific timbre segments in a song, identify repeating patterns, and compare songs to each other in terms of the similarity of their timbres [15]. More advanced methods like these would allow me to more quantitatively analyze how successful the Dirichlet Process is at clustering songs into distinct categories.
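For reference, the BIC selection mentioned above can be reproduced with a short scan over candidate cluster counts. The sketch below is a schematic re-creation under assumed inputs, not the exact script used for this thesis: timbre_frames stands in for the sampled 12-dimensional timbre vectors produced by the script in Appendix A.3, and the scan range is an assumption.

import numpy as np
from sklearn import mixture

# stand-in for the per-segment timbre vectors sampled in Appendix A.3
timbre_frames = np.random.rand(2000, 12)

best_k, best_bic = None, np.inf
for n in range(1, 61):
    gmm = mixture.GMM(n_components=n, covariance_type='full', n_iter=100, random_state=0)
    gmm.fit(timbre_frames)
    bic = gmm.bic(timbre_frames)  # Bayes Information Criterion for this n
    if bic < best_bic:
        best_k, best_bic = n, bic
print('lowest BIC at {0} timbre categories'.format(best_k))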

4.2 Future Work

Future work in this area, quantitatively analyzing EM metadata to determine what constitutes distinct genres and novel artists, would involve tighter definitions and procedures, evaluations of whether the clustering was effective, and closer musical scrutiny. All of the weaknesses mentioned in the previous section, barring perhaps the set of songs available in the Million Song Dataset, can be addressed with extensions and modifications to the code base I created. The greater issue, building an effective and constantly updated corpus of music data beyond the MSD, might be addressed by soliciting such data from an organization like Spotify, but such an endeavor is very ambitious and beyond the scope of any individual or small research group without extensive funding and influence. Once these problems are resolved, and the dataset, the songs accessed from it, and the methods for comparing songs to each other are all in place, the next steps would be to analyze the results further. How do the most unique artists for their time compare to the most popular artists? Is there


considerable overlap? How long does it take for a style to grow in popularity, if it grows at all? And lastly, how can these findings be used to compose new genres of music and to envision who and what will become popular in the future? All of these questions may require supplementary information sources, regarding the popularity of songs and artists for example, and many of these additional pieces of information can be found on the website of the MSD.

4.3 Closing Remarks

While this thesis is an ambitious endeavor and can be improved in many respects, it does show that the methods implemented yield nontrivial results and could serve as a foundation for future quantitative analysis of electronic music. As data analytics continues to grow, and groups such as Spotify amass greater amounts of information and deeper insights into that information, this relatively new field of study will hopefully grow with it. EM is a dynamic, energizing, and incredibly expressive type of music, and understanding it from a quantitative perspective pays respect to what has, up until now, mostly been analyzed from a curious outsider's perspective: qualitatively described, but not examined as thoroughly from a mathematical angle.


Appendix A

Code

A.1 Pulling Data from the Million Song Dataset

from __future__ import division
import os
import sys
import time
import glob
import hdf5_getters  # not on adroit
import re            # added: needed for re.sub below
import numpy as np   # added: needed for np.set_printoptions below

# prevents output from showing ellipses when printed
np.set_printoptions(threshold=np.nan)

'''This code pulls the metadata of every electronic music song out of the
raw MSD HDF5 files and writes it to disk, sorted by release year.'''

# NB: path separators and other punctuation in this listing were lost in the
# PDF extraction and have been reconstructed
basedir = '/scratch/network/mssilver/mssilver/msd_data_full/data/' + str(sys.argv[1])
ext = '.h5'

target_genres = ['house', 'techno', 'drum and bass', 'drum n bass', "drum'n'bass",
                 'drumnbass', "drum 'n' bass", 'jungle', 'breakbeat', 'trance',
                 'dubstep', 'trap', 'downtempo', 'industrial', 'synthpop', 'idm',
                 'idm - intelligent dance music', '8-bit', 'ambient',
                 'dance and electronica', 'electronic']

# relevant metadata for all EM songs found in the MSD
all_song_data = {}
pitch_segs_data = []
count = 0
start_time = time.time()

for root, dirs, files in os.walk(basedir):
    files = glob.glob(os.path.join(root, '*' + ext))
    for f in files:
        h5 = hdf5_getters.open_h5_file_read(f)
        # if year unknown, throw out sample
        if hdf5_getters.get_year(h5) == 0:
            h5.close()
            continue
        if any(tag in str(hdf5_getters.get_artist_mbtags(h5)) for tag in target_genres):
            print 'found electronic music song at {0} seconds'.format(time.time()-start_time)
            count += 1
            print ('song count: {0}'.format(count+1))
            h5_subdict = dict()
            h5_subdict['title'] = hdf5_getters.get_title(h5).item()
            h5_subdict['artist_name'] = hdf5_getters.get_artist_name(h5).item()
            h5_subdict['year'] = hdf5_getters.get_year(h5).item()
            h5_subdict['duration'] = hdf5_getters.get_duration(h5).item()
            h5_subdict['timbre'] = hdf5_getters.get_segments_timbre(h5).tolist()
            h5_subdict['pitches'] = hdf5_getters.get_segments_pitches(h5).tolist()
            track_id = hdf5_getters.get_track_id(h5).item()
            all_song_data[track_id] = h5_subdict
            print('Song {0} finished processing. Total time elapsed: {1} seconds'.format(count, str(time.time() - start_time)))
        h5.close()

all_song_data_sorted = dict(sorted(all_song_data.items(), key=lambda k: k[1]['year']))
sortedpitchdata = '/scratch/network/mssilver/mssilver/msd_data/raw_' + re.sub('/', '', sys.argv[1]) + '.txt'
with open(sortedpitchdata, 'w') as text_file:
    text_file.write(str(all_song_data_sorted))
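Assuming the two-letter sharding of the MSD directory tree implied by the raw_AA.txt, raw_AB.txt, ... file names consumed in Appendix A.3, this script would be run once per shard with the shard path as its single argument, e.g. python pull_msd_data.py A/B (pull_msd_data.py is a placeholder name for the listing above). That invocation scans .../msd_data_full/data/A/B and, via the re.sub on the final line, writes its output to msd_data/raw_AB.txt.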

A.2 Calculating Most Likely Chords and Timbre Categories

from __future__ import division
import os
import sys
import re
import time
import json
import glob
import hdf5_getters  # not on adroit
import sklearn.mixture
import msd_utils  # not on adroit
import math
import numpy as np
import collections
import ast

# prevents output from showing ellipses when printed
np.set_printoptions(threshold=np.nan)

# column-wise mean of a list of lists (applied via map/zip below)
def mean(a):
    return sum(a) / len(a)

'''This code computes the frequency of chord changes in each electronic song
and runs the dirichlet process on it.'''

basedir = '/scratch/network/mssilver/mssilver/'
input_file = basedir + 'msd_data/raw_' + str(sys.argv[1]) + '.txt'
output_file = basedir + 'msd_data/preprocessed_' + str(sys.argv[1]) + '.txt'

json_contents = open(input_file, 'r').read()

all_song_data = []
time_start = time.time()
count = 0
# regex reconstructed from the extraction: matches one song's dict literal
for json_object_str in re.finditer(r"{'title'.*?}", json_contents):
    json_object_str = str(json_object_str.group(0))
    json_object = ast.literal_eval(json_object_str)
    json_object_new = {}

    json_object_new['title'] = json_object['title']
    json_object_new['artist_name'] = json_object['artist_name']
    json_object_new['year'] = json_object['year']
    json_object_new['duration'] = json_object['duration']

    segments_pitches_old = json_object['pitches']
    segments_timbre_old = json_object['timbre']
    segments_pitches_old_smoothed = []
    segments_timbre_old_smoothed = []
    chord_changes = [0 for i in range(0, 192)]
    smoothing_factor = 5
    for i in range(0, int(math.floor(len(segments_pitches_old)) / smoothing_factor)):
        segments = segments_pitches_old[(smoothing_factor*i):(smoothing_factor*i+smoothing_factor)]
        # calculate mean strength of each note over a block of 5 time segments
        segments_mean = map(mean, zip(*segments))
        segments_pitches_old_smoothed.append(segments_mean)
    most_likely_chords = [msd_utils.find_most_likely_chord(seg) for seg in segments_pitches_old_smoothed]
    print 'found most likely chords at {0} seconds'.format(time.time()-time_start)
    # calculate chord changes
    for i in range(0, len(most_likely_chords)-1):
        c1 = most_likely_chords[i]
        c2 = most_likely_chords[i+1]
        if (c1[1] == c2[1]):
            note_shift = 0
        elif (c1[1] < c2[1]):
            note_shift = c2[1] - c1[1]
        else:
            note_shift = 12 - c1[1] + c2[1]
        key_shift = 4*(c1[0]-1) + c2[0]
        # convert note_shift (0 through 11) and key_shift (1 to 16)
        # to one of 192 categories for a chord shift
        chord_shift = 12*(key_shift - 1) + note_shift
        chord_changes[chord_shift] += 1
    json_object_new['chord_changes'] = [c/json_object['duration'] for c in chord_changes]
    print 'calculated chord changes at {0} seconds'.format(time.time()-time_start)

    for i in range(0, int(math.floor(len(segments_timbre_old)) / smoothing_factor)):
        segments = segments_timbre_old[(smoothing_factor*i):(smoothing_factor*i+smoothing_factor)]
        # calculate mean timbre over a block of 5 time segments
        segments_mean = map(mean, zip(*segments))
        segments_timbre_old_smoothed.append(segments_mean)
    print 'found most likely timbre categories at {0} seconds'.format(time.time()-time_start)
    timbre_cats = [msd_utils.find_most_likely_timbre_category(seg) for seg in segments_timbre_old_smoothed]
    timbre_cat_counts = [timbre_cats.count(i) for i in xrange(0, 30)]
    json_object_new['timbre_cat_counts'] = [t/json_object['duration'] for t in timbre_cat_counts]
    all_song_data.append(json_object_new)
    count += 1

print 'preprocessing finished, writing results to file at time {0}'.format(time.time()-time_start)
with open(output_file, 'w') as f:
    f.write(str(all_song_data))

print 'file merging complete at time {0}'.format(time.time()-time_start)
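As a worked example of the chord-change encoding in the loop above: a move from an F major chord, stored as the tuple (1, 5) (chord type 1 = major, root index 5 = F), to a G minor chord, stored as (2, 7), yields note_shift = 7 - 5 = 2 and key_shift = 4*(1 - 1) + 2 = 2, so chord_shift = 12*(2 - 1) + 2 = 14. Because note_shift ranges over 0 through 11 and key_shift over 1 through 16, chord_shift indexes exactly the 192 bins of the chord_changes array.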

A.3 Code to Compute Timbre Categories

from __future__ import division
import os
import sys
import re
import time
import json
import glob
import hdf5_getters  # not on adroit
import sklearn.mixture
import msd_utils  # not on adroit
import math
import numpy as np
import collections
from string import ascii_uppercase
import ast
import matplotlib.pyplot as plt
import operator
from collections import defaultdict
import random

timbre_all = []
N = 20  # number of samples to get from each year
year_counts = dict({1956: 2, 1965: 4, 1968: 3, 1969: 5, 1970: 23, 1971: 25,
                    1972: 26, 1973: 37, 1974: 35, 1975: 29, 1976: 28, 1977: 64,
                    1978: 77, 1979: 111, 1980: 131, 1981: 171, 1982: 199,
                    1983: 272, 1984: 190, 1985: 189, 1986: 200, 1987: 224,
                    1988: 205, 1989: 272, 1990: 358, 1991: 348, 1992: 538,
                    1993: 610, 1994: 658, 1995: 764, 1996: 809, 1997: 930,
                    1998: 872, 1999: 983, 2000: 1031, 2001: 1230, 2002: 1323,
                    2003: 1563, 2004: 1508, 2005: 1995, 2006: 1892, 2007: 2175,
                    2008: 1950, 2009: 1782, 2010: 742})

time_start = time.time()
year_count = defaultdict(int)
orig_dir = '/scratch/network/mssilver/mssilver/'
orig_dir = ''
json_pattern = re.compile(r"{'title'.*?}", re.DOTALL)  # pattern reconstructed
N = 20  # number of songs to sample from each year
k = 20  # number of frames to select from each song
for l1 in ascii_uppercase:
    for l2 in ascii_uppercase:
        edm_textfile = orig_dir + 'msd_data/raw_' + l1 + l2 + '.txt'
        json_contents = open(edm_textfile, 'r').read()
        for json_object_str in re.findall(json_pattern, json_contents):
            json_object = ast.literal_eval(json_object_str)
            year = int(json_object['year'])
            prob = 1.0 if 1.0*N/year_counts[year] > 1.0 else 1.0*N/year_counts[year]
            if random.random() < prob:
                print 'getting timbre frames for song in directory {0}, {1} seconds after start of program'.format(edm_textfile, time.time()-time_start)
                duration = float(json_object['duration'])
                timbre = [[t/duration for t in l] for l in json_object['timbre']]
                try:
                    indices = random.sample(xrange(0, len(timbre)), k)
                except:
                    indices = xrange(0, len(timbre))
                timbre_frames = [timbre[i] for i in indices]
                appended_timbre = [timbre_all.append(l) for l in timbre_frames]
        print 'finished file {0}, {1} seconds after start of program'.format(edm_textfile, time.time()-time_start)

with(open('timbre_frames_all.txt', 'w')) as f:
    f.write(str(timbre_all))
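A note on the design of this sampler: the acceptance probability computed above is effectively min(1, N/year_counts[year]), so each release year contributes roughly N = 20 songs to timbre_frames_all.txt regardless of how many songs the MSD holds for that year. This keeps the heavily represented late-2000s years (compare the hard-coded year_counts with Figure 2.2) from dominating the Gaussian Mixture Model that defines the timbre categories.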

A.4 Helper Methods for Calculations

import os
import re
import json
import glob
import hdf5_getters
import time
import numpy as np

''' some static data used in conjunction with the helper methods '''

# each 12-element vector corresponds to the 12 pitches, starting with C
# natural and going up to B natural

CHORD_TEMPLATE_MAJOR = [[1,0,0,0,1,0,0,1,0,0,0,0],[0,1,0,0,0,1,0,0,1,0,0,0],
                        [0,0,1,0,0,0,1,0,0,1,0,0],[0,0,0,1,0,0,0,1,0,0,1,0],
                        [0,0,0,0,1,0,0,0,1,0,0,1],[1,0,0,0,0,1,0,0,0,1,0,0],
                        [0,1,0,0,0,0,1,0,0,0,1,0],[0,0,1,0,0,0,0,1,0,0,0,1],
                        [1,0,0,1,0,0,0,0,1,0,0,0],[0,1,0,0,1,0,0,0,0,1,0,0],
                        [0,0,1,0,0,1,0,0,0,0,1,0],[0,0,0,1,0,0,1,0,0,0,0,1]]
CHORD_TEMPLATE_MINOR = [[1,0,0,1,0,0,0,1,0,0,0,0],[0,1,0,0,1,0,0,0,1,0,0,0],
                        [0,0,1,0,0,1,0,0,0,1,0,0],[0,0,0,1,0,0,1,0,0,0,1,0],
                        [0,0,0,0,1,0,0,1,0,0,0,1],[1,0,0,0,0,1,0,0,1,0,0,0],
                        [0,1,0,0,0,0,1,0,0,1,0,0],[0,0,1,0,0,0,0,1,0,0,1,0],
                        [0,0,0,1,0,0,0,0,1,0,0,1],[1,0,0,0,1,0,0,0,0,1,0,0],
                        [0,1,0,0,0,1,0,0,0,0,1,0],[0,0,1,0,0,0,1,0,0,0,0,1]]
CHORD_TEMPLATE_DOM7 = [[1,0,0,0,1,0,0,1,0,0,1,0],[0,1,0,0,0,1,0,0,1,0,0,1],
                       [1,0,1,0,0,0,1,0,0,1,0,0],[0,1,0,1,0,0,0,1,0,0,1,0],
                       [0,0,1,0,1,0,0,0,1,0,0,1],[1,0,0,1,0,1,0,0,0,1,0,0],
                       [0,1,0,0,1,0,1,0,0,0,1,0],[0,0,1,0,0,1,0,1,0,0,0,1],
                       [1,0,0,1,0,0,1,0,1,0,0,0],[0,1,0,0,1,0,0,1,0,1,0,0],
                       [0,0,1,0,0,1,0,0,1,0,1,0],[0,0,0,1,0,0,1,0,0,1,0,1]]
CHORD_TEMPLATE_MIN7 = [[1,0,0,1,0,0,0,1,0,0,1,0],[0,1,0,0,1,0,0,0,1,0,0,1],
                       [1,0,1,0,0,1,0,0,0,1,0,0],[0,1,0,1,0,0,1,0,0,0,1,0],
                       [0,0,1,0,1,0,0,1,0,0,0,1],[1,0,0,1,0,1,0,0,1,0,0,0],
                       [0,1,0,0,1,0,1,0,0,1,0,0],[0,0,1,0,0,1,0,1,0,0,1,0],
                       [0,0,0,1,0,0,1,0,1,0,0,1],[1,0,0,0,1,0,0,1,0,1,0,0],
                       [0,1,0,0,0,1,0,0,1,0,1,0],[0,0,1,0,0,0,1,0,0,1,0,1]]

CHORD_TEMPLATE_MAJOR_means = [np.mean(chord) for chord in CHORD_TEMPLATE_MAJOR]
CHORD_TEMPLATE_MINOR_means = [np.mean(chord) for chord in CHORD_TEMPLATE_MINOR]
CHORD_TEMPLATE_DOM7_means = [np.mean(chord) for chord in CHORD_TEMPLATE_DOM7]
CHORD_TEMPLATE_MIN7_means = [np.mean(chord) for chord in CHORD_TEMPLATE_MIN7]

CHORD_TEMPLATE_MAJOR_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_MAJOR]
CHORD_TEMPLATE_MINOR_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_MINOR]
CHORD_TEMPLATE_DOM7_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_DOM7]
CHORD_TEMPLATE_MIN7_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_MIN7]

# mean timbre vectors of the 46 timbre categories (decimal points were lost
# in the PDF extraction and have been restored)
TIMBRE_CLUSTERS = [
    [1.38679881e-01, 3.95702571e-02, 2.65410235e-02, 7.38301998e-03, -1.75014636e-02, -5.51147732e-02,
     8.71851698e-03, -1.17595855e-02, 1.07227900e-02, 8.75951680e-03, 5.40391877e-03, 6.17638908e-03],
    [3.14344510e+00, 1.17405599e-01, 4.08053561e+00, -1.77934450e+00, 2.93367968e+00, -1.35597928e+00,
     -1.55129489e+00, 7.75743158e-01, 6.42796685e-01, 1.40794256e-01, 3.37716831e-01, -3.27103815e-01],
    [3.56548165e-01, 2.73288705e+00, 1.94355982e+00, 1.06892477e+00, 9.89739475e-01, -8.97330631e-02,
     8.73234495e-01, -2.00747009e-03, 3.44488367e-01, 9.93117800e-02, -2.43471766e-01, -1.90521726e-01],
    [4.22442037e-01, 4.14115783e-01, 1.43926557e-01, -1.16143322e-01, -5.95186216e-02, -2.36927188e-01,
     -6.83151409e-02, 9.86816882e-02, 2.43219098e-02, 6.93558977e-02, 6.80121418e-03, 3.97485360e-02],
    [1.94727799e-01, -1.39027782e+00, -2.39875671e-01, -2.84583677e-01, 1.92334219e-01, -2.83421048e-01,
     2.15787541e-01, 1.14840341e-01, -2.15631833e-01, -4.09496877e-02, -6.90838017e-03, -7.24394810e-03],
    [1.96565167e-01, 4.98702717e-02, -3.43697282e-01, 2.54170701e-01, 1.12441266e-02, 1.54740401e-01,
     -4.70447408e-02, 8.10868802e-02, 3.03736697e-03, 1.43974944e-03, -2.75044913e-02, 1.48634678e-02],
    [2.21364497e-01, -2.96205105e-01, 1.57754028e-01, -5.57641279e-02, -9.25625566e-02, -6.15316168e-02,
     -1.38139882e-01, -5.54936599e-02, 1.66886836e-01, 6.46238260e-02, 1.24093863e-02, -2.09274345e-02],
    [2.12823455e-01, -9.32652720e-02, -4.39611467e-01, -2.02814479e-01, 4.98638770e-02, -1.26572488e-01,
     -1.11181799e-01, 3.25075635e-02, 2.01416694e-02, -5.69216463e-02, 2.61922912e-02, 8.30817468e-02],
    [1.62304042e-01, -7.34813956e-03, -2.02552550e-01, 1.80106705e-01, -5.72110826e-02, -9.17148244e-02,
     -6.20429191e-03, -6.08892354e-02, 1.02883628e-02, 3.84878478e-02, -8.72920419e-03, 2.37291230e-02],
    [1.69023095e-01, 6.81311168e-02, -3.71039856e-02, -2.13139780e-02, -4.18752028e-03, 1.36407740e-01,
     2.58515825e-02, -4.10328777e-04, 2.93149920e-02, -1.97874734e-02, 2.01177066e-02, 4.29260690e-03],
    [4.16829358e-01, -1.28384095e+00, 8.86081556e-01, 9.13717416e-02, -3.19420208e-01, -1.82003637e-01,
     -3.19865507e-02, -1.71517045e-02, 3.47472066e-02, -3.53047665e-02, 5.58354602e-02, -5.06222122e-02],
    [3.83948137e-01, 1.06020034e-01, 4.01191058e-01, 1.49470482e-01, -9.58422411e-02, -4.94473336e-02,
     2.27589858e-02, -5.67352733e-02, 3.84666644e-02, -2.15828055e-02, -1.67817151e-02, 1.15426241e-01],
    [9.07946444e-01, 3.26120397e+00, 2.98472002e+00, -1.42615404e-01, 1.29886103e+00, -4.53380431e-01,
     1.54008478e-01, -3.55297093e-02, -2.95809181e-01, 1.57037690e-01, -7.29692046e-02, 1.15180285e-01],
    [1.60870896e+00, -2.32038235e+00, -7.96211044e-01, 1.55058968e+00, -2.19377663e+00, 5.01030526e-01,
     -1.71767279e+00, -1.36642470e+00, -2.42837527e-01, -4.14275615e-01, -7.33148530e-01, -4.56676578e-01],
    [6.42870687e-01, 1.34486839e+00, 2.16026845e-01, -2.13180345e-01, 3.10866747e-01, -3.97754955e-01,
     -3.54439151e-01, -5.95938041e-04, 4.95054274e-03, 4.67013422e-02, -1.80823854e-02, 1.25808320e-01],
    [1.16780496e+00, 2.28141229e+00, -3.29418720e+00, -1.54239912e+00, 2.12372153e-01, 2.51116768e+00,
     1.84273560e+00, -4.06183916e-01, 1.19175125e+00, -9.24407446e-01, 6.85444429e-01, -6.38729005e-01],
    [2.39097414e-01, -1.13382447e-02, 3.06327342e-01, 4.68182987e-03, -1.03107607e-01, -3.17661969e-02,
     3.46533705e-02, 1.46440386e-02, 6.88291154e-02, 1.72580481e-02, -6.23970238e-03, -6.52822380e-03],
    [1.74850329e-01, -1.86077411e-01, 2.69285838e-01, 5.22452803e-02, -3.71708289e-02, -6.42874319e-02,
     -5.01920042e-03, -1.14565540e-02, -2.61300268e-03, -6.94872458e-03, 1.20157063e-02, 2.01341977e-02],
    [1.93220674e-01, 1.62738332e-01, 1.72794061e-02, 7.89933755e-02, 1.58494767e-01, 9.04541006e-04,
     -3.33177052e-02, -1.42411500e-01, -1.90471155e-02, -2.41622739e-02, -2.57382438e-02, 2.84895062e-02],
    [3.31179197e+00, -1.56765268e-01, 4.42446188e+00, 2.05496297e+00, 5.07031622e+00, -3.52663849e-02,
     -5.68337901e+00, -1.17825301e+00, 5.41756637e-01, -3.15541339e-02, -1.58404846e+00, 7.37887234e-01],
    [2.36033237e-01, -5.01380019e-01, -7.01568834e-02, -2.14474169e-01, 5.58739133e-01, -3.45340886e-01,
     2.36469930e-01, -2.51770230e-02, -4.41670143e-01, -1.73364633e-01, 9.92353986e-03, 1.01775476e-01],
    [3.13672832e+00, 1.55128891e+00, 4.60139512e+00, 9.82477544e-01, -3.87108002e-01, -1.34239667e+00,
     -3.00065797e+00, -4.41556909e-01, -7.77546208e-01, -6.59017029e-01, -1.42596356e-01, -9.78935498e-01],
    [8.50714148e-01, 2.28658856e-01, -3.65260753e+00, 2.70626948e+00, -1.90441544e-01, 5.66625676e+00,
     1.77531510e+00, 2.39978921e+00, 1.10965660e+00, 1.58484130e+00, -1.51579214e-02, 8.64324026e-01],
    [1.14302559e+00, 1.18602811e+00, -3.88130412e+00, 8.69833825e-01, -8.23003310e-01, -4.23867795e-01,
     8.56022598e-01, -1.08015106e+00, 1.74840192e-01, -1.35493558e-02, -1.17012561e+00, 1.68572940e-01],
    [3.54117814e+00, 6.12714769e-01, 7.67585243e+00, 2.50391333e+00, 1.81374399e+00, -1.46363231e+00,
     -1.74027236e+00, -5.72924078e-01, -1.20787368e+00, -4.13954661e-01, -4.62561948e-01, 6.78297871e-01],
    [8.31843044e-01, 4.41635485e-01, 7.00724425e-02, -4.72159900e-02, 3.08326493e-01, -4.47009822e-01,
     3.27806057e-01, 6.52370380e-01, 3.28490360e-01, 1.28628172e-01, -7.78065861e-02, 6.91343399e-02],
    [4.90082031e-01, -9.53180204e-01, 1.76970476e-01, 1.57256960e-01, -5.26196238e-02, -3.19264458e-01,
     3.91808304e-01, 2.19368239e-01, -2.06483291e-01, -6.25044005e-02, -1.05547224e-01, 3.18934196e-01],
    [1.49899454e+00, -4.30708817e-01, 2.43770498e+00, 7.03149621e-01, -2.28827845e+00, 2.70195855e+00,
     -4.71484280e+00, -1.18700075e+00, -1.77431396e+00, -2.23190236e+00, 8.20855264e-01, -2.35859902e-01],
    [1.20322544e-01, -3.66300816e-01, -1.25699953e-01, -1.21914056e-01, 6.93277338e-02, -1.31034684e-01,
     -1.54955924e-03, 2.48094288e-02, -3.09576314e-02, -1.66369415e-03, 1.48904987e-04, -1.42151992e-02],
    [6.52394765e-01, -6.81024464e-01, 6.36868117e-01, 3.04950208e-01, 2.62178992e-01, -3.20457080e-01,
     -1.98576098e-01, -3.02173163e-01, 2.04399765e-01, 4.44513847e-02, -9.50111498e-03, -1.14198739e-02],
    [2.06762180e-01, -2.08101829e-01, 2.61977630e-01, -1.71672300e-01, 5.61794250e-02, 2.13660185e-01,
     3.90259585e-02, 4.78176392e-02, 1.72812607e-02, 3.44052067e-02, 6.26899067e-03, 2.48544728e-02],
    [7.39717363e-01, 4.37786285e+00, 2.54995502e+00, 1.13151212e+00, -3.58509503e-01, 2.20806129e-01,
     -2.20500355e-01, -7.22409824e-03, -2.70534083e-01, 1.07942098e-03, 2.70174668e-01, 1.87279353e-01],
    [1.25593809e+00, 6.71054880e-02, 8.70352571e-01, -4.32607959e+00, 2.30652217e+00, 5.47476105e+00,
     -6.11052479e-01, 1.07955720e+00, -2.16225471e+00, -7.95770149e-01, -7.31804973e-01, 9.68935954e-01],
    [1.17233757e-01, -1.23897829e-01, -4.88625265e-01, 1.42036530e-01, -7.23286756e-02, -6.99808763e-02,
     -1.17525019e-02, 5.70221674e-02, -7.67796123e-03, 4.17505873e-02, -2.33375716e-02, 1.94121001e-02],
    [1.67511025e+00, -2.75436700e+00, 1.45345593e+00, 1.32408871e+00, -1.66172505e+00, 1.00560074e+00,
     -8.82308160e-01, -5.95708043e-01, -7.27283590e-01, -1.03975499e+00, -1.86653334e-02, 1.39449745e+00],
    [3.20587677e+00, -2.84451104e+00, 8.54849957e+00, -4.44001235e-01, 1.04202144e+00, 7.35333682e-01,
     -2.48763292e+00, 7.38931361e-01, -1.74185596e+00, -1.07581842e+00, 2.05759299e-01, -8.20483513e-01],
    [3.31279737e+00, -5.08655734e-01, 6.61530870e+00, 1.16518280e+00, 4.74499155e+00, -2.31536191e+00,
     -1.34016130e+00, -7.15381712e-01, 2.78890594e+00, 2.04189275e+00, -3.80003033e-01, 1.16034914e+00],
    [1.79522019e+00, -8.13534697e-02, 4.37167420e-01, 2.26517020e+00, 8.85377295e-01, 1.07481514e+00,
     -7.25322296e-01, -2.19309506e+00, -7.59468916e-01, -1.37191387e+00, 2.60097913e-01, 9.34596450e-01],
    [3.50400906e-01, 8.17891485e-01, -8.63487084e-01, -7.31760701e-01, 9.70320805e-02, -3.60023996e-01,
     -2.91753495e-01, -8.03073817e-02, 6.65930095e-02, 1.60093340e-01, -1.29158086e-01, -5.18806100e-02],
    [2.25922929e-01, 2.78461593e-01, 5.39661393e-02, -2.37662670e-02, -2.70343295e-02, -1.23485570e-01,
     2.31027499e-03, 5.87465112e-05, 1.86127188e-02, 2.83074747e-02, -1.87198676e-04, 1.24761782e-02],
    [4.53615634e-01, 3.18976020e+00, -8.35029351e-01, 7.84124578e+00, -4.43906795e-01, -1.78945492e+00,
     -1.14521031e+00, 1.00044304e+00, -4.04084981e-01, -4.86030348e-01, 1.05412721e-01, 5.63666445e-02],
    [3.93714086e-01, -3.07226477e-01, -4.87366619e-01, -4.57481697e-01, -2.91133171e-04, -2.39881719e-01,
     -2.15591352e-01, -1.21332941e-01, 1.42245002e-01, 5.02984582e-02, -8.05878851e-03, 1.95534173e-01],
    [1.86913010e-01, -1.61000977e-01, 5.95612425e-01, 1.87804293e-01, 2.22064227e-01, -1.09008289e-01,
     7.83845058e-02, 5.15228647e-02, -8.18113578e-02, -2.37860551e-02, 3.41013800e-03, 3.64680417e-02],
    [3.32919314e+00, -2.14341251e+00, 7.20913997e+00, 1.76143734e+00, 1.64091808e+00, -2.66887649e+00,
     -9.26748006e-01, -2.78599285e-01, -7.39434005e-01, -3.87363085e-01, 8.00557250e-01, 1.15628886e+00],
    [4.76496444e-01, -1.19334793e-01, 3.09037235e-01, -3.45545294e-01, 1.30114716e-01, 5.06895559e-01,
     2.12176840e-01, -4.14296750e-03, 4.52439064e-02, -1.62163990e-02, 6.93683152e-03, -5.77607592e-03],
    [3.00019324e-01, 5.43432074e-02, -7.72732930e-01, 1.47263806e+00, -2.79012581e-02, -2.47864869e-01,
     -2.10011388e-01, 2.78202425e-01, 6.16957205e-02, -1.66924986e-01, -1.80102286e-01, -3.78872162e-03]]

TIMBRE_MEANS = [np.mean(t) for t in TIMBRE_CLUSTERS]
TIMBRE_STDEVS = [np.std(t) for t in TIMBRE_CLUSTERS]

'''helper methods to process raw msd data'''

def normalize_pitches(h5):
    key = int(hdf5_getters.get_key(h5))
    segments_pitches = hdf5_getters.get_segments_pitches(h5)
    segments_pitches_new = [transpose_by_key(pitch_seg, key) for pitch_seg in segments_pitches]
    return segments_pitches_new

def transpose_by_key(pitch_seg, key):
    pitch_seg_new = []
    for i in range(0, 12):
        idx = (i + key) % 12
        pitch_seg_new.append(pitch_seg[idx])
    return pitch_seg_new

''' given a time segment with distributions of the 12 pitches, find the most
likely chord played '''
def find_most_likely_chord(pitch_vector):
    rho_max = 0.0
    # chords are indexed as (chord type, root), with types 1-4 below
    most_likely_chord = (1, 1)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_MAJOR, CHORD_TEMPLATE_MAJOR_means, CHORD_TEMPLATE_MAJOR_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean)*(pitch_vector[i] - np.mean(pitch_vector))/((stdev+0.01)*(np.std(pitch_vector)+0.01))
        if (abs(rho) > abs(rho_max)):
            rho_max = rho
            most_likely_chord = (1, idx)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_MINOR, CHORD_TEMPLATE_MINOR_means, CHORD_TEMPLATE_MINOR_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean)*(pitch_vector[i] - np.mean(pitch_vector))/((stdev+0.01)*(np.std(pitch_vector)+0.01))
        if (abs(rho) > abs(rho_max)):
            rho_max = rho
            most_likely_chord = (2, idx)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_DOM7, CHORD_TEMPLATE_DOM7_means, CHORD_TEMPLATE_DOM7_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean)*(pitch_vector[i] - np.mean(pitch_vector))/((stdev+0.01)*(np.std(pitch_vector)+0.01))
        if (abs(rho) > abs(rho_max)):
            rho_max = rho
            most_likely_chord = (3, idx)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_MIN7, CHORD_TEMPLATE_MIN7_means, CHORD_TEMPLATE_MIN7_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean)*(pitch_vector[i] - np.mean(pitch_vector))/((stdev+0.01)*(np.std(pitch_vector)+0.01))
        if (abs(rho) > abs(rho_max)):
            rho_max = rho
            most_likely_chord = (4, idx)
    return most_likely_chord

def find_most_likely_timbre_category(timbre_vector):
    most_likely_timbre_cat = 0
    rho_max = 0.0
    for idx, (seg, mean, stdev) in enumerate(zip(TIMBRE_CLUSTERS, TIMBRE_MEANS, TIMBRE_STDEVS)):
        rho = 0.0
        for i in range(0, 12):
            rho += (seg[i] - mean)*(timbre_vector[i] - np.mean(seg))/((stdev+0.01)*(np.std(timbre_vector)+0.01))
        if (abs(rho) > abs(rho_max)):
            rho_max = rho
            most_likely_timbre_cat = idx
    return most_likely_timbre_cat
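A small, assumed usage example (not part of the original appendix) illustrating the template matcher: a pitch vector with all of its energy on C, E, and G correlates best with the first major template, so it should come back as (1, 0), i.e. chord type 1 (major) with root index 0 (C).

if __name__ == '__main__':
    # hypothetical smoke test for find_most_likely_chord
    c_major = [1.0, 0, 0, 0, 1.0, 0, 0, 1.0, 0, 0, 0, 0]  # energy on C, E, G
    print(find_most_likely_chord(c_major))  # expected: (1, 0)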

Bibliography

[1] Marck Bailey. Mark J. Butler publishes scholarly work on dance music. http://www.music.northwestern.edu/about/news/2012/mark-j-butler-publishes-scholarly-work-on-dance-music.html, March 2012.

[2] Kenneth Taylor. Ishkur's guide to EDM. http://techno.org/electronic-music-guide.

[3] Deal further strengthens Spotify's music discovery expertise. http://the.echonest.com/pressreleases/spotify-acquires-echo-nest, March 2014.

[4] Josh Constine. Inside the Spotify - Echo Nest skunkworks. http://techcrunch.com/2014/10/19/the-sonic-mad-scientists, October 2014.

[5] The future of music genres is here. http://blog.echonest.com/post/73516217273/the-future-of-music-genres-is-here, January 2014.

[6] About the Music Genome Project. http://www.pandora.com/about/mgp.

[7] Joan Serrà, Álvaro Corral, Marián Boguñá, Martín Haro, and Josep Ll. Arcos. Measuring the evolution of contemporary western popular music. Scientific Reports, 2, July 2012.

[8] Matthias Mauch, Robert M. MacCallum, Mark Levy, and Armand M. Leroi. The evolution of popular music: USA 1960–2010. Royal Society Open Science, 2(5), 2015.

[9] Thierry Bertin-Mahieux, Daniel P.W. Ellis, Brian Whitman, and Paul Lamere. The Million Song Dataset. In Proceedings of the 12th International Conference on Music Information Retrieval (ISMIR 2011), 2011.

[10] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay. Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12:2825–2830, 2011.

[11] Lars Buitinck, Gilles Louppe, Mathieu Blondel, Fabian Pedregosa, Andreas Mueller, Olivier Grisel, Vlad Niculae, Peter Prettenhofer, Alexandre Gramfort, Jaques Grobler, Robert Layton, Jake VanderPlas, Arnaud Joly, Brian Holt, and Gaël Varoquaux. API design for machine learning software: experiences from the scikit-learn project. In ECML PKDD Workshop: Languages for Data Mining and Machine Learning, pages 108–122, 2013.

[12] Edwin Chen. Infinite mixture models with nonparametric Bayes and the Dirichlet Process. http://blog.echen.me/2012/03/20/infinite-mixture-models-with-nonparametric-bayes-and-the-dirichlet-process, March 2012.

[13] Graham Massey. Roland TR-808: The drum machine that changed music. http://www.bbc.com/news/entertainment-arts-26682781, March 2014.

[14] Jean-Michel Jarre and Agencja Artystyczna MTJ. Jean Michel Jarre. Disques Dreyfus, 1999.

[15] Francois Pachet, Jean-Julien Aucouturier, and Mark Sandler. The way it sounds: Timbre models for analysis and retrieval of music signals. IEEE Transactions on Multimedia, 7(6):1028–1035, December 2005.

Page 7: Silver,Matthew final thesis

322 α=01 33

323 α=02 38

33 Analysis 46

4 Conclusion 53

41 Design Flaws in Experiment 53

42 Future Work 55

43 Closing Remarks 56

A Code 57

A1 Pulling Data from the Million Song Dataset 57

A2 Calculating Most Likely Chords and Timbre Categories 58

A3 Code to Compute Timbre Categories 60

A4 Helper Methods for Calculations 61

Bibliography 68

vii

List of Tables

31 Song cluster descriptions for α = 005 33

32 Song cluster descriptions for α = 01 38

33 Song cluster descriptions for α = 02 45

viii

List of Figures

11 A userrsquos taste profile generated by Spotify 4

12 Data processing pipeline for Mauchrsquos study illustrated with a segment

of Queenrsquos Bohemian Rhapsody 1975 8

21 scikit-learn example of GMM vs DPGMM and tuning of α 15

22 Number of Electronic Music Songs in Million Song Dataset from Each

Year 26

31 Song year distributions for α = 005 31

32 Timbre and pitch distributions for α = 005 32

33 Song year distributions for α = 01 35

34 Timbre and pitch distributions for α = 01 37

35 Song year distributions for α = 02 41

36 Timbre and pitch distributions for α = 02 44

ix

Chapter 1

Introduction

11 Background Information

Electronic Music (EM) is an increasingly popular genre of music with an immense

presence and influence on modern culture Because the genre is new as a whole and

is arguably more loosely structured than other genres - technology has enabled the

creation of a wide range of sounds and easy blending of existing and new sounds alike

- formal analysis especially mathematical analysis on the genre is fairly limited and

has only begun growing in the past few years As a fan of EM I am interested in

exploring how the genre has evolved over time More specifically my goal with this

project was to design some structure or model that could help me identify which EM

artists have contributed the most stylistically to the genre Oftentimes famous EM

artists do not create novel-sounding music but rather popularize an existing style

and the motivation of this study is to understand who has stylistically contributed

the most to the EM scene versus those who have merely popularized aspects of it

As the study progressed the manner in which I constructed my model lent to

a second goal of the thesis imagining new ways in which we can imagine EM genres

1

While there exists an extensive amount of research analyzing music trends from

a non-mathematical (cultural societal artistic) perspective the analysis of EM

from a mathematical perspective and especially with respect to any computationally

measurable trends in the genre is close to nonexistent EM has been analyzed to a

lesser extent than other common genres of music in the academic world most likely

due to existing for a shorter amount of time and being less rooted in prominant

social and cultural events In fact the first published reference work on EM did not

exist until 2012 when Professor Mark J Butler from Northwestern University edited

and published Electronica Dance and Club Music a collection of essays exploring

EM genres and culture [1] Furthermore there are very few comprehensive visual

guides that allow a user to relate every genre to each other and easily observe how

different genres converge and diverge While conducting research the best guide I

found was not a scholarly source but an online guide created by an EM enthusiast

Ishkurrsquos Guide to Electronic Music [2] This guide which includes over 100 specific

genres grouped by more general genres and represents chronological evolutions by

connecting each genre in a flowchart is the most exhaustive analysis of the EM scene

I could find However the guidersquos analysis is very qualitative While each subgenre

contains an explanation on typical rhythm and sounds and includes well-known

songs indicative of the style the guide is created by someone who used historical and

personal knowledge of EM My model which creates music genres by chronologically

ordering songs and then assigning them to clusters is a different approach towards

imagining the entire landscape of EM The results may confirm Ishkurrsquos Guidersquos

findings in which case his guide is given additional merit with mathematical evi-

dence or it may be different suggesting that there may be better ways to group EM

genres One advantage that guides such as Ishkurrsquos and historically-based scholarly

works have over my approach is that those models are history-sensitive and therefore

may group songs in a way that historically makes sense On the other hand my

2

model is history-agnostic and may not realize the historical context of songs when

clustering However I believe that there is still significant merit to my research

Instead of classifying genres of music by early genres that led to them my approach

gives the most credit to the artists and songs that were the most innovative for their

time and perhaps reveal different musical styles that are more similar to each other

than history would otherwise imply This way of thinking of music genres while

unconventional is another way of imagining EM

The practice of quantitatively analyzing music has exploded in the last decade

thanks to technological and algorithmic advances that allow data scientists to con-

structively sift through troves of music and listener information In the literature

review I will focus on two particular organizations that have contributed greatly to

the large-scale mathematical analysis of music Pandora a website that plays songs

similar to a songartistalbum inputted by a user and Echo Nest a music analytics

firm that was acquired by Spotify in 2014 and drives Spotifyrsquos Discover Weekly

feature [3] After evaluating the relevance of these sources to my thesis work I will

then look over the relevant academic research and evaluate what this research can

contribute

12 Literature Review

The analysis of quantitative music generally falls into two categories research con-

ducted by academics and academic organizations for scholarly purposes and research

conducted by companies and primarily targeted for consumers First looking at the

consumer-based research Spotify and Pandora are two of the most prominent based

groups and the two I decided to focus on Spotify is a music streaming service where

users can listen to albums and songs from a wide variety of artists or listen to weekly

3

playlists generated based on the music the user and userrsquos friends have listened to

The weekly playlist called Discover Weekly Playlist is a relatively new feature in

Spotify and is driven by music analysis algorithms created from Echo Nest Using

the Echo Nest code interface Spotify creates a ldquotaste profilerdquo for each user which

assesses attributes such as how often a user branches out to new styles of music how

closely the userrsquos music streamed follows popular Billboard music charts and so on

Spotify also looks at the artists and songs the user streamed and creates clusters

of different genres that the user likes (see figure 11) The taste profile and music

clusters can then be used to generate playlists geared to a specific user The genres

in the cluster come from a list of nearly 800 names which are derived by scraping

the Internet for trending terms in music as well as training various algorithms on a

regular basic by ldquolisteningrdquo to new songs [4][5]

Figure 11 A userrsquos taste profile generated by Spotify

4

Although Spotify and Echo Nestrsquos algorithms are very useful for mapping the land-

scape of established and emerging genres of music the methodology is limited to

pre-defined genres of music This may serve as a good starting point to compare my

final results to but my study aims to be as context-free as possible by attaching no

preconceived notions of music styles or genres instead looking at features that could

be measured in every song

While Spotifyrsquos approach to mapping music is very high-tech and based on ex-

isting genres Pandora takes a very low-tech and context-free approach to music

clustering Pandora created the Music Genome Project a multi-year undertaking

where skilled music theorists listened to a large number of songs and analyzed up to

450 characteristics in each song [6] Pandorarsquos approach is appealing to the aim of

my study since it does not take any preconceived notions of what a genre of music

is instead comparing songs on common characteristics such as pitch rhythm and

instrument patterns Unfortunately I do not have a cadre of skilled music theorists

at my disposal nor do I have 10 years to perform such calculations like the dedicated

workers at Pandora (tips the indestructible fedora) Additionally Pandorarsquos Music

Genome Project is intellectual property so at best I can only rely on the abstract

concepts of the Music Genome Project to drive my study

In the academic realm there are no existing studies analyzing quantifiable changes in

EM specifically but there exist a few studies that perform such analysis on popular

Western music in general One such study is Measuring the Evolution of Contem-

porary Western Popular Music which analyzes music from 1955-2010 spanning all

common genres Using the Million Song Dataset a free public database of songs

each containing metadata (see section 13) the study focuses on the attributes pitch

timbre and loudness Pitch is defined as the standard musical notes or frequency of

5

the sound waves Timbre is formally defined as the Mel frequency cepstral coefficients

(MFCC) of a transformed sound signal More informally it refers to the sound color

texture or tone quality and is associated with instrument types recording resources

and production techniques In other words two sounds that have the same pitch

but different tones (for example a bell and voice) are differentiated by their timbres

There are 12 MFCCs that define the timbre of a given sound Finally loudness

refers to intrinsically how loud the music sounds not loudness that a listener can

manipulate while listening to the music Loudness is the first MFCC of the timbre

of a sound [7] The study concluded that over time music has been becoming louder

and less diverse

The restriction of pitch sequences (with metrics showing less variety inpitch progressions) the homogenization of the timbral palette (with fre-quent timbres becoming more frequent) and growing average loudnesslevels (threatening a dynamic richness that has been conserved until to-day) This suggests that our perception of the new would be essentiallyrooted on identifying simpler pitch sequences fashionable timbral mix-tures and louder volumes Hence an old tune with slightly simpler chordprogressions new instrument sonorities that were in agreement with cur-rent tendencies and recorded with modern techniques that allowed forincreased loudness levels could be easily perceived as novel fashionableand groundbreaking

This study serves as a good starting point for mathematically analyzing music in

a few ways First it utilizes the Million Song Dataset which addresses the issue

of legally obtaining music metadata As mentioned in section 13 the only legal

way to obtain playable music for this study would have been to purchase all songs I

would include which is infeasible While the Million Song Dataset does not contain

the audio files in playable format it does contain audio features and metadata that

allow for in-depth analysis In addition working with the dataset takes out the

work of extracting features from raw audio files saving an extensive amount of time

and energy Second the study establishes specifics for what constitutes a trend

in music Pitch timbre and loudness are core features of music and examining the6

distributions of each among songs over time reveals a lot of information about how

the music industry and consumersrsquo tastes have evolved While these are not all of the

features contained in a song they serve as a good starting point Third the study

defines mathematical ways to capture music attributes and measure their change

over time For example pitches are transposed into the same tonal context with

binary discretized pitch descriptions based on a threshold so that each song can be

represented with vectors of pitches that are normalized and compared to other songs

While this study lays some solid groundwork for capturing and analyzing nu-

meric qualities of music it falls short of addressing my goals in a couple of ways

First it does not perform any analysis with respect to music genre While the

analysis performed in this paper could easily be applied to a list of songs in a specific

genre certain genres might have unique sounds and rhythms relative to other genres

that would be worth studying in greater detail Second the study only measures

general trends in music over time The models used to describe changes are simple

regressions that donrsquot look at more nuanced changes For example what styles of

music developed over certain periods of time How rapid were those changes Which

styles of music developed from which other styles

A more promising study led by music researcher Matthias Mauch [8] analyzes

contemporary popular Western Music from the 1960s to 2010s by comparing numer-

ical data on the pitches and timbre of a corpus of 17000 songs that appeared on the

Billboard Hot 100 Like the previously mentioned paper Measuring the Evolution

of Contemporary Western Popular Music Mauchrsquos study also creates abstractions

of pitch and timbre in order to provide a consistent and meaningful semantic inter-

pretation of musical data (see figure 12) However Mauchrsquos study takes this idea a

step further by using genre tags from Lastfm a music website and constructing a

7

hierarchy of music genres using hierarchical clustering Additionally the study takes

a crack at determining whether a particular band the Beatles was musically ground-

breaking for its time or merely playing off sounds that other bands had already used

Figure 12 Data processing pipeline for Mauchrsquos study illustrated with a segment ofQueenrsquos Bohemian Rhapsody 1975

While both Measuring the Evolution of Contemporary Western Popular Music

and Mauchrsquos study created abstractions of pitch and timbre Mauchrsquos study is more

appealing with respect to my goal because its end results align more closely with

mine Additionally the data processing pipeline offers several layers of abstraction

8

and depending on my progress I would be able to achieve at least one of the levels of

abstraction As shown in figure 12 each segment of a raw audio file is first broken

down into its 12 timbre MFCCs and pitch components Next the study constructs

ldquolexiconsrdquo or a dictionary of pitch and timbre terms that all songs can be compared

to For pitch the original data is in a N-by-12 matrix where N is the number of time

segments in the song and 12 the number of each of the notes found in an octave of

pitches Each time segment contains the relative strengths of each of the 12 pitches

However music sounds are not merely a collection of pitches but more precisely

chords Furthermore the similarity of two songs is not determined by the absolute

pitches of their chords but rather the progression of chords in the song all relative to

each other For example if all the notes in a song are transposed by one step the song

will sound different in terms of absolute pitch but the song will still be recognized

as the original because all of the relative movements from each chord to the next

are the same This phenomenon is captured in the pitch data by finding the most

likely chord played at each time segment then counting the change to the next chord

at each time step and generating a table of chord change frequencies for each song

Constructing the timbre lexcion is more complicated since there is no easy analogue

like chords for pitches to compare songs Mauchrsquos study utilizes a Gaussian Mixture

Model (GMM) by iterating over k=1 to k=N clusters where N is a large number

running the GMM on each prior assumption of k clusters and computing the Bayes

Information Criterion (BIC) for each model The lowest of the N BIC values is found

and that value of k is selected That model contains k different timbre clusters

and each cluster contains the mean timbre value for each of the 12 timbre components

For my research I decided that the pitch and timbre lexicons would be the most

realistic level of abstraction I could obtain Mauchrsquos study adds an addtional layer

to pitch and timbre by identifying the most common patterns of chord changes and

9

most common timbre rhythms and creating more general tags from these combined

terms such as ldquo stepwise changes indicating modal harmonyrdquo for a pitch topic and

ldquooh rounded mellowrdquo for a timbral topic There were two problems with using this

final layer of abstraction for my study First attaching semantic interpretations to

the pitch and timbral lexicons is a difficult task For timbre I would need to listen

to sound samples containing all of the different timbral categories I identified and

attaching user interpretations to them For the chords not only would I have to

perform the same analysis as on timbre but take careful attention to identify which

chords correspond to common sound progressions in popular music a task that I am

not qualified for an did not have the resources for this thesis to seek out Second

this final layer of abstraction was not necessary for the end goal of my paper In

fact consolidating my pitch and timbre lexicons into simpler phrases would run the

risk of pigeonholing my analysis and preventing me from discovering more nuanced

patterns in my final results Therefore I decided to focus on pitch and timbral

lexicon construction as the furthest levels of abstraction when processing songs for

my thesis Mathematical details on how I constructed the lexical and timbral lexicons

can be found in the Mathematical Modeling section of this paper

13 The Dataset

In order to successfully execute my thesis, I need access to an extensive database of music. Until recently, acquiring a substantial corpus of music data was a difficult and costly task. It is illegal to download music audio files from video- and music-sharing sites such as YouTube, Spotify, and Pandora. Some platforms, such as iTunes, offer 90-second previews of songs, but segments of songs, usually segments that showcase the chorus, are not reliable measures to capture the entire essence of a song. Even if I were to legally download entire audio files for free, I would run into additional issues. Obtaining a high-quality corpus of song data would be challenging: scripts that crawl music-sharing platforms may not capture all of the music I am looking for. And once I had the audio files, I would have to perform audio processing techniques to extract the relevant information from the songs.

Fortunately, there is an easy solution to the music data acquisition problem. The Million Song Dataset (MSD) is a collection of metadata for one million music tracks dating up to 2011. Various organizations such as The Echo Nest, Musicbrainz, 7digital, and Last.fm have contributed different pieces of metadata. Each song is represented as a Hierarchical Data Format file (HDF5), which can be loaded as a JSON object. The fields encompass topical features such as the song title, artist, and release date, as well as lower-level features such as the loudness, starting beat times, pitches, and timbre of several segments of the song [9]. While the MSD is the largest free and open-source music metadata dataset I could find, there is no guarantee that it adequately covers the entire spectrum of EM artists and songs. This quality limitation is important to consider throughout the study. A quick look through the songs, including the subset of data I worked with for this report, showed that there were several well-known artists and songs from the EM scene. Therefore, while the MSD may not contain all desired songs for this project, it contains an adequate number of relevant songs to produce some meaningful results. Additionally, the groundwork laid here for modeling the similarities between songs and identifying groundbreaking ones is the same regardless of the songs included, and the following methodologies can be implemented on any similarly formatted dataset, including one with songs that might currently be missing from the MSD.
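As a minimal sketch of how a single track's fields can be read, the code below uses the hdf5_getters module distributed with the MSD; the filename is a hypothetical placeholder, and only a few of the many available getters are shown.

import hdf5_getters  # helper module distributed with the MSD

# Hypothetical path: each MSD track is stored as one .h5 file.
h5 = hdf5_getters.open_h5_file_read('TRAXLZU12903D05F94.h5')
try:
    title = hdf5_getters.get_title(h5)               # song title
    year = hdf5_getters.get_year(h5)                 # release year (0 if unknown)
    pitches = hdf5_getters.get_segments_pitches(h5)  # (num_segments, 12) chroma matrix
    timbre = hdf5_getters.get_segments_timbre(h5)    # (num_segments, 12) MFCC-like matrix
finally:
    h5.close()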


Chapter 2

Mathematical Modeling

2.1 Determining Novelty of Songs

Finding a logical and implementable mathematical model was, and continues to be, an important aspect of my research. My problem, how to mathematically determine which songs were unique for their time, requires an algorithm in which each song is introduced in chronological order, either joining an existing category or starting a new category based on its musical similarity to the songs already introduced. Clustering algorithms like k-means or Gaussian Mixture Models (GMMs), which optimize the partitioning of a dataset into a predetermined number of clusters, assume a fixed number of clusters. While this process would work if we knew exactly how many genres of EM existed, if we guess wrong, our end results may end up with clusters that are wrongly grouped together or separated. It is much better to apply a clustering algorithm that does not make any assumptions about this number.

One particularly promising process that addresses the issue of the number of clusters is a family of algorithms known as Dirichlet Processes (DPs). DPs are useful for this particular application because (1) they assign clusters to a dataset with only an upper bound on the number of clusters, and (2) by sorting the songs in chronological order before running the algorithm and keeping track of which songs are categorized under each cluster, we can observe the earliest songs in each cluster and consequently infer which songs were responsible for creating new clusters. The DP is controlled by a concentration parameter α. The expected number of clusters formed is directly proportional to the value of α, so the higher the value of α, the more likely new clusters will be formed [10]. Regardless of the value of α, as the number of data points introduced increases, the probability of a new group being formed decreases. That is, a "rich get richer" policy is in place, and existing clusters tend to grow in size. Tweaking the tunable parameter α is an important part of the study, since it determines the flexibility given to forming a new cluster. If the value of α is too small, then the criteria for forming clusters will be too strict, and data that should be in different clusters will be assigned to the same cluster. On the other hand, if α is too large, the algorithm will be too sensitive and assign similar songs to different clusters.
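This "rich get richer" behavior can be seen in a minimal simulation of the Chinese Restaurant Process, one standard construction of the DP. The sketch below is illustrative only and is not part of the clustering pipeline; the function name and parameter choices are my own.

import random

def crp_cluster_sizes(n_points, alpha):
    """Simulate cluster assignments under a Chinese Restaurant Process."""
    sizes = []  # sizes[k] = number of points currently in cluster k
    for n in range(n_points):
        # New cluster with probability alpha / (n + alpha); otherwise join
        # existing cluster k with probability sizes[k] / (n + alpha).
        if random.random() < float(alpha) / (n + alpha):
            sizes.append(1)
        else:
            r = random.random() * n  # n points are already assigned
            running = 0
            for k in range(len(sizes)):
                running += sizes[k]
                if r < running:
                    sizes[k] += 1
                    break
    return sizes

# Higher alpha yields more clusters, but growth slows as points accumulate.
print('alpha = 0.5: %d clusters' % len(crp_cluster_sizes(10000, 0.5)))
print('alpha = 5.0: %d clusters' % len(crp_cluster_sizes(10000, 5.0)))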

The implementation of the DP was achieved using scikit-learn's library and API for the Dirichlet Process Gaussian Mixture Model (DPGMM). The DPGMM is the formal name of the Dirichlet Process model used to cluster the data. More specifically, scikit-learn's implementation of the DPGMM uses the stick-breaking method, one of several equally valid methods to assign songs to clusters [11]. While the mathematical details for this algorithm can be found at the following citation [12], the most important aspects of the DPGMM are the arguments that the user can specify and tune. The first of these tunable parameters is the value α, which is the same parameter as the α discussed in the previous paragraph. As seen on the right side of Figure 2.1, properly tuning α is key to obtaining meaningful clusters. The center image has α set to 0.01, which is too small and results in all of the data being placed under one cluster. On the other hand, the bottom-right image has the same data set and α set to 100, which does a better job of clustering. On a related note, the figure also demonstrates the effectiveness of the DPGMM over the GMM. On the left side, the dataset clearly contains 2 clusters, but the GMM in the top-left image assumes 5 clusters as a prior and consequently clusters the data incorrectly, while the DPGMM manages to limit the data to 2 clusters.

The second argument that the user inputs for the DPGMM is the data that will be clustered. The scikit-learn implementation takes the data in the format of a nested list (N lists, each of length m), where N is the number of data points and m the number of features. While the format of the data structure is relatively straightforward, choosing which numbers should be in the data was a challenge I faced. Selecting the relevant features of each song to be used in the algorithm will be expounded upon in the next section, "Feature Selection."

The last argument that a user inputs for the scikit-learn DPGMM implementation is an argument indicating the upper bound for the number of clusters. The Dirichlet Process then determines the best number of clusters for the data between 1 and the upper bound. Since the DPGMM is flexible enough to find the best value, I set an arbitrary upper bound of 50 clusters and focused more on the tuning of α to modify the number of clusters formed.
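Putting the three arguments together, a minimal sketch of the call looks like the following; it uses the mixture.DPGMM class from the scikit-learn version available at the time of writing (since replaced by BayesianGaussianMixture), and the feature matrix here is random placeholder data rather than real song features.

import numpy as np
from sklearn import mixture

# Placeholder feature matrix: N songs, m features each (random, for illustration).
X = np.random.rand(500, 238)

# n_components is the upper bound on clusters; alpha is the concentration parameter.
dpgmm = mixture.DPGMM(n_components=50, alpha=0.1, covariance_type='diag', n_iter=100)
dpgmm.fit(X)
labels = dpgmm.predict(X)  # cluster assignment for each song
print('clusters actually used: %d' % len(set(labels)))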

2.2 Feature Selection

Figure 2.1: scikit-learn example of GMM vs. DPGMM and tuning of α

One of the most difficult aspects of the Dirichlet method is choosing the features to be used for clustering. In other words, when we organize the songs into clusters, we need to ensure that each cluster is distinct in a way that is statistically and intuitively logical. In the Million Song Dataset [9], each song is represented as a JSON object containing several fields. These fields are candidate features to be used in the Dirichlet algorithm. Below is an example song, "Never Gonna Give You Up" by Rick Astley, and the corresponding features.

artist_mbid: db92a151-1ac2-438b-bc43-b82e149ddd50 (the musicbrainz.org ID for this artist)
artist_mbtags: shape = (4,) (this artist received 4 tags on musicbrainz.org)
artist_mbtags_count: shape = (4,) (raw tag count of the 4 tags this artist received on musicbrainz.org)
artist_name: Rick Astley (artist name)
artist_playmeid: 1338 (the ID of that artist on the service playme.com)
artist_terms: shape = (12,) (this artist has 12 terms (tags) from The Echo Nest)
artist_terms_freq: shape = (12,) (frequency of the 12 terms from The Echo Nest (number between 0 and 1))
artist_terms_weight: shape = (12,) (weight of the 12 terms from The Echo Nest (number between 0 and 1))
audio_md5: bf53f8113508a466cd2d3fda18b06368 (hash code of the audio used for the analysis by The Echo Nest)
bars_confidence: shape = (99,) (confidence value (between 0 and 1) associated with each bar by The Echo Nest)
bars_start: shape = (99,) (start time of each bar according to The Echo Nest; this song has 99 bars)
beats_confidence: shape = (397,) (confidence value (between 0 and 1) associated with each beat by The Echo Nest)
beats_start: shape = (397,) (start time of each beat according to The Echo Nest; this song has 397 beats)
danceability: 0.0 (danceability measure of this song according to The Echo Nest (between 0 and 1; 0 => not analyzed))
duration: 211.69587 (duration of the track in seconds)
end_of_fade_in: 0.139 (time of the end of the fade-in at the beginning of the song, according to The Echo Nest)
energy: 0.0 (energy measure (not in the signal processing sense) according to The Echo Nest (between 0 and 1; 0 => not analyzed))
key: 1 (estimation of the key the song is in by The Echo Nest)
key_confidence: 0.324 (confidence of the key estimation)
loudness: -7.75 (general loudness of the track)
mode: 1 (estimation of the mode the song is in by The Echo Nest)
mode_confidence: 0.434 (confidence of the mode estimation)
release: Big Tunes - Back 2 The 80s (album name from which the track was taken; some songs/tracks can come from many albums, we give only one)
release_7digitalid: 786795 (the ID of the release (album) on the service 7digital.com)
sections_confidence: shape = (10,) (confidence value (between 0 and 1) associated with each section by The Echo Nest)
sections_start: shape = (10,) (start time of each section according to The Echo Nest; this song has 10 sections)
segments_confidence: shape = (935,) (confidence value (between 0 and 1) associated with each segment by The Echo Nest)
segments_loudness_max: shape = (935,) (max loudness during each segment)
segments_loudness_max_time: shape = (935,) (time of the max loudness during each segment)
segments_loudness_start: shape = (935,) (loudness at the beginning of each segment)
segments_pitches: shape = (935, 12) (chroma features for each segment (normalized so max is 1))
segments_start: shape = (935,) (start time of each segment (~ musical event or onset) according to The Echo Nest; this song has 935 segments)
segments_timbre: shape = (935, 12) (MFCC-like features for each segment)
similar_artists: shape = (100,) (a list of 100 artists (their Echo Nest IDs) similar to Rick Astley according to The Echo Nest)
song_hotttnesss: 0.864248830588 (according to The Echo Nest, when downloaded (in December 2010), this song had a 'hotttnesss' of 0.8 (on a scale of 0 to 1))
song_id: SOCWJDB12A58A776AF (The Echo Nest song ID; note that a song can be associated with many tracks (with very slight audio differences))
start_of_fade_out: 198.536 (start time of the fade-out, in seconds, at the end of the song, according to The Echo Nest)
tatums_confidence: shape = (794,) (confidence value (between 0 and 1) associated with each tatum by The Echo Nest)
tatums_start: shape = (794,) (start time of each tatum according to The Echo Nest; this song has 794 tatums)
tempo: 113.359 (tempo in BPM according to The Echo Nest)
time_signature: 4 (time signature of the song according to The Echo Nest, i.e., usual number of beats per bar)
time_signature_confidence: 0.634 (confidence of the time signature estimation)
title: Never Gonna Give You Up (song title)
track_7digitalid: 8707738 (the ID of this song on the service 7digital.com)
track_id: TRAXLZU12903D05F94 (The Echo Nest ID of this particular track, on which the analysis was done)
year: 1987 (year when this song was released, according to musicbrainz.org)

When choosing features, my main goal was to use features that would most likely yield meaningful results, yet also be simple and make sense to the average person. The definition of "meaningful" results is arbitrary, as every music listener will have his or her own opinions about what constitutes different types of music, but some common features most people tend to differentiate songs by are pitch, rhythm, and the types of instruments used. The following specific fields provided in each song object fall under these three terms:

Pitch

• segments_pitches: a matrix of values indicating the strength of each pitch (or note) at each discernible time interval

Rhythm

• beats_start: a vector of values indicating the start time of each beat
• time_signature: the time signature of the song
• tempo: the speed of the song in Beats Per Minute (BPM)

Instruments

• segments_timbre: a matrix of values indicating the distribution of MFCC-like features (different types of tones) for each segment

The segments_pitches feature is a clear candidate as a differentiating factor for songs, since it reveals the patterns of notes that occur. Additionally, other research papers that quantitatively examine songs, like Mauch's, look at pitch and employ a procedure that allows all songs to be compared with the same metric. Likewise, timbre is intuitively a reliable differentiating feature, since it captures the presence of different tones, that is, sounds that sound different despite having the same pitch. Therefore, segments_timbre is another feature that is considered in each song.

Finally, we look at the candidate features for rhythm. At first glance, all of these features appear to be useful, as they indicate the rhythm of a song in one way or another. However, none of these features are as useful as the pitch and timbre features. While tempo is one factor in differentiating genres of EDM, and music in general, tempo alone is not a driving force of musical innovation. Certain genres of EDM, like drum 'n' bass and happy hardcore, stand out for having very fast tempos, but the tempo is supplemented with a sound unique to the genre. Conceiving new arrangements of pitches, combining instruments in new ways, and inventing new types of sounds are novel; speeding up or slowing down existing sounds is not. Including tempo as a feature could actually add noise to the model, since many genres overlap in their tempos. And finally, tempo is measured indirectly when the pitch and timbre features are normalized for each song: everything is measured in units of "per second," so faster songs will have higher quantities of pitch and timbre features each second. Time signature can be dismissed from the candidate features for the same reason as tempo: many genres share the same time signature, and including it in the feature set would only add more noise. beats_start looks like a more promising feature, since, like segments_pitches and segments_timbre, it consists of a vector of values. However, difficulties arise when we begin to think about how exactly we can utilize this information. Since songs vary in length, we need a way to compare songs of different durations on the same level. One approach could be to perform basic statistics on the distance between each beat, for example calculating the mean and standard deviation of this distance. However, the normalized pitch and timbre information already captures this data. Another possibility is detecting certain patterns of beats, which could differentiate the syncopated dubstep or glitch music beats from the steady pulse of electro-house. But once again, every beat is accompanied by a sound with a specific timbre and pitch, so this feature would not add any significantly new information.

2.3 Collecting Data and Preprocessing Selected Features

2.3.1 Collecting the Data

Upon deciding which features I wanted to use in my research, I first needed to collect all of the electronic songs in the Million Song Dataset. The easiest reliable way to achieve this was to iterate through each song in the database and save the information for the songs where any of the artist genre tags in artist_mbtags matched an electronic music genre. While this measure was not fully accurate, because it looks at the genre of the artist rather than of the song, specific genre information for each song was not as easily accessible, so this indicator was the best available substitute. To generate a list of the genres that electronic songs would fall under, I manually searched through a subset of the MSD to find all genres that seemed to be related to electronic music. In the case of genres that were sometimes, but not always, electronic in nature, such as disco or pop, I erred on the side of caution and did not include them in the list of electronic genres. In these cases, false positives, such as primarily rock songs that happen to have the disco label attached to the artist, could inadvertently be included in the dataset. The final list of genres is as follows:

target_genres = ['house', 'techno', 'drum and bass', 'drum n bass',
    "drum'n'bass", 'drumnbass', "drum 'n' bass", 'jungle', 'breakbeat',
    'trance', 'dubstep', 'trap', 'downtempo', 'industrial', 'synthpop',
    'idm', 'idm - intelligent dance music', '8-bit', 'ambient',
    'dance and electronica', 'electronic']

2.3.2 Pitch Preprocessing

A study conducted by music researcher Matthias Mauch [8] analyzes pitch in a musically informed manner. The study first takes the raw sound data and converts it into a distribution over each pitch, where 0 is no detection of the pitch and 1 the strongest amount. Then it computes the most likely chord by comparing the 4 most common types of chords in popular music (major, minor, dominant 7th, and minor 7th) to the observed chord. The most common chords are represented as "template chords" containing 0's and 1's, where the 1's represent the notes played in the chord. For example, using the note C as the first index, the C major chord is represented as

$$CT_{CM} = (1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0).$$

For a given chroma frame c observed in the song, Spearman's rho coefficient is computed against every template chord:

$$\rho_{CT,c} = \frac{\sum_{i=1}^{12} (CT_i - \overline{CT})(c_i - \bar{c})}{\sigma_{CT}\,\sigma_c}$$

where $\overline{CT}$ is the mean of the values in the template chord, $\sigma_{CT}$ is the standard deviation of the values in the chord, and the operations on c are analogous. Note that the summation runs over each of the 12 pitch classes. The chord template with the highest value of ρ is selected as the chord for the time frame. After this is performed for each time frame, the values are smoothed, and then the change between adjacent chords is observed. The reasoning behind this step is that, by measuring the relative distance between chords rather than the chords themselves, all songs can be compared in the same manner even though they may have different key signatures. Finally, the study takes the types of chord changes and classifies them under 8 possible categories called "H-topics." These topics are more abstracted versions of the chord changes that make more sense to a human, such as "changes involving dominant 7th chords."
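The appendix code (A.2) calls a helper, msd_utils.find_most_likely_chord, whose source is not listed there. The sketch below shows one way such a helper could work under the template-matching scheme just described, scoring a 12-dimensional chroma frame against the 48 transposed templates (4 chord types × 12 roots); it substitutes a Pearson correlation for the rank correlation above, and the exact template encodings are my assumption.

import numpy as np

# Binary templates for the 4 chord types, rooted at C (index 0).
# The chord type codes (1-4) follow the convention used in Appendix A.2.
BASE_TEMPLATES = {
    1: [1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0],  # major
    2: [1, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0],  # minor
    3: [1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0],  # dominant 7th
    4: [1, 0, 0, 1, 0, 0, 0, 1, 0, 0, 1, 0],  # minor 7th
}

def find_most_likely_chord(chroma):
    """Return the (chord_type, root) pair whose template best matches the frame.

    Assumes the chroma frame is not constant (otherwise the correlation is undefined).
    """
    c = np.asarray(chroma, dtype=float)
    best, best_score = None, -np.inf
    for ctype, base in BASE_TEMPLATES.items():
        for root in range(12):
            template = np.roll(base, root)          # transpose the template to this root
            score = np.corrcoef(template, c)[0, 1]  # correlation of template vs. frame
            if score > best_score:
                best, best_score = (ctype, root), score
    return best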

In my preliminary implementation of this method on an electronic dance music corpus, I made a few modifications to Mauch's study. First, I smoothed the time frames before computing the most probable chords, rather than smoothing the most probable chords. I did this to save time and to reduce volatility in the chord measurements. Using Rick Astley's "Never Gonna Give You Up" as a reference, which contains 935 time frames and lasts 212 seconds, 5 time frames is slightly under 1 second and, in preliminary testing, appeared to be a good interval for each time block. Second, as mentioned in the literature section, I did not abstract the chord changes into H-topics. This decision also stemmed from time constraints, since deriving semantic chord meaning from EDM songs would require careful research into the types of harmonies and sounds common in that genre of music. Below is a high-level visualization of the pitch metadata found in a sample song, "Firestarter" by The Prodigy, and of how I converted the metadata into a chord change vector that I could then feed into the Dirichlet Process algorithm.

[Figure: pipeline for converting a song's pitch metadata into a chord change vector, illustrated on "Firestarter" by The Prodigy. (1) Start with the raw pitch data, an N×12 matrix, where N is the number of time frames in the song and 12 the number of pitch classes. (2) Average the distribution of pitches over every 5 time frames. (3) Calculate the most likely chord for each block using Spearman's rho (e.g., F♯ major, (0,1,0,0,0,0,1,0,0,0,1,0)). (4) For each pair of adjacent chords (e.g., F♯ major → G major), compute the change between them and increment the corresponding count in a table of chord change frequencies (192 possible chord changes). (5) The result is a 192-element vector in which chord_changes[i] is the number of times the chord change with code i occurred in the song.]

As a final step to normalize the chord change data, I divided the counts by the length of the song, so that each song's number of chord changes was measured per second.
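Consistent with the appendix code (A.2), each transition between two (chord type, root) pairs maps to one of 192 codes: 16 possible type transitions (4 types × 4 types) times 12 possible root movements. A minimal sketch of that encoding:

def chord_shift_code(c1, c2):
    """Encode a transition between two (chord_type, root) chords as a code 0-191.

    chord_type is 1-4 (major, minor, dominant 7th, minor 7th); root is 0-11.
    The modulo here is equivalent to the case analysis in Appendix A.2.
    """
    note_shift = (c2[1] - c1[1]) % 12         # root movement in semitones, 0-11
    key_shift = 4 * (c1[0] - 1) + c2[0]       # type transition, 1-16
    return 12 * (key_shift - 1) + note_shift  # 16 * 12 = 192 possible codes

# Example: C major (1, 0) -> A minor (2, 9) gives 12 * (2 - 1) + 9 = 21.
code = chord_shift_code((1, 0), (2, 9))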

2.3.3 Timbre Preprocessing

For timbre, I also used Mauch's model to find a meaningful way to compare timbre uniformly across all songs [8]. After collecting all song metadata, I took a random sample of 20 songs from each year, starting at 1970. The reason I forced the sampling to 20 randomly sampled songs from each year, and did not take a random sample of songs from all years at once, was to prevent bias towards any type of sound. As seen in Figure 2.2, there are significantly more songs from 2000-2011 than from before 2000. The mean year is x̄ = 2001.052, the median year is 2003, and the standard deviation of the years is σ = 7.060. A "random sample" over all songs would almost certainly include a disproportionate number of more recent songs. In order not to miss out on sounds that may be more prevalent in older songs, I required a set number of songs from each year. Next, from each randomly selected song, I selected 20 random timbre frames, in order to prevent any biases in data collection within each song. In total, there were 42 × 20 × 20 = 16,800 timbre frames collected. Next, I clustered the timbre frames using a Gaussian Mixture Model (GMM), varying the number of clusters from 10 to 100 and selecting the number of clusters with the lowest Bayes Information Criterion (BIC), a statistical measure commonly used to select the best-fitting model. The BIC was minimized at 46 timbre clusters. I then re-ran the GMM with 46 clusters and saved the mean values of each of the 12 timbre segments for each cluster formed.

Figure 2.2: Number of Electronic Music Songs in the Million Song Dataset from Each Year

In the same way that every song had the same 192 chord changes whose frequencies could be compared between songs, each song now had the same 46 timbre clusters but different frequencies for each. When reading in the metadata from each song, I calculated the most likely timbre cluster each timbre frame belonged to and kept a frequency count of all of the possible timbre clusters observed in the song. Finally, as with the pitch data, I divided all observed counts by the duration of the song in order to normalize each song's timbre counts.
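A sketch of this cluster-count selection, using the scikit-learn GMM class of that era (since renamed GaussianMixture); the arrays here are random placeholders standing in for the 16,800 sampled timbre frames and for one song's frames.

import numpy as np
from sklearn import mixture

frames = np.random.rand(16800, 12)  # placeholder for the sampled 12-dim timbre frames

# Fit GMMs with 10 to 100 clusters and keep the model with the lowest BIC.
best_k, best_bic, best_model = None, np.inf, None
for k in range(10, 101):
    gmm = mixture.GMM(n_components=k, covariance_type='diag')
    gmm.fit(frames)
    bic = gmm.bic(frames)
    if bic < best_bic:
        best_k, best_bic, best_model = k, bic, gmm

# Per-song histogram: assign each frame to its most likely cluster and count.
song_frames = np.random.rand(935, 12)  # placeholder frames for a single song
counts = np.bincount(best_model.predict(song_frames), minlength=best_k)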


Chapter 3

Results

3.1 Methodology

After the pitch and timbre data were processed, I ran the Dirichlet Process on the data. For each song, I concatenated the 192-element chord change frequency list and the 46-element timbre category frequency list, giving each song a total of 238 features. However, there is a problem with this setup: the pitch data will inherently dominate the clustering process, since it contains more than four times as many features as timbre. While there is no built-in function in scikit-learn's DPGMM process to give different weights to each feature, I considered another possibility to remedy this discrepancy: duplicating the timbre vector a certain number of times and concatenating that to the feature set of each song. While this strategy runs the risk of corrupting the feature set and turning it into something that does not accurately represent each song, it is important to keep in mind that, even without duplicating the timbre vector, the feature set consists of two separate feature sets concatenated to each other. Therefore, timbre duplication appears to be a reasonable strategy to weigh pitch and timbre more evenly.
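A minimal sketch of this per-song feature assembly: the timbre histogram is tiled before being concatenated to the chord-change vector, and everything is multiplied by a constant to bring the per-second frequencies into a workable range (the scaling factor k = 10 is discussed below; the duplication count of 4 is illustrative, as the text does not state the count used).

import numpy as np

def song_features(chord_changes, timbre_counts, timbre_copies=4, scale=10):
    """Concatenate pitch and timbre features, re-weighting timbre and rescaling.

    timbre_copies = 4 is an illustrative choice; scale corresponds to k = 10.
    """
    chords = np.asarray(chord_changes)  # 192 chord-change frequencies (per second)
    timbre = np.asarray(timbre_counts)  # 46 timbre-cluster frequencies (per second)
    return scale * np.concatenate([chords, np.tile(timbre, timbre_copies)])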


After this modification, I tweaked a few more parameters before obtaining my final results. Dividing the pitch and timbre frequencies by the duration of the song normalized every song to frequency per second, but it also had the undesired effect of making the data too small. Timbre and pitch frequencies per second were almost always less than 1.0 and often hovered as low as 0.002 for nonzero values. Because all of the values were very close to each other, using common values of α in the range of 0.1 to 1000-2000 was insufficient to push the songs into different clusters; as a result, every song fell into the same cluster. Increasing the value of α by several orders of magnitude, to well over 10 million, fixed the problem, but this solution presented two problems. First, tuning α to experiment with different ways to cluster the music would be problematic, since I would have to work with an enormous range of possible values for α. Second, pushing α to such high values is not appropriate for the Dirichlet Process. Extremely high values of α indicate a Dirichlet Process that will try to disperse the data into different clusters, but a value of α that high is in principle always assigning each new song to a new cluster. On the other hand, varying α between 0.1 and 1000, for example, presents a much wider range of flexibility when assigning clusters. While clustering may be possible by varying α an extreme amount with the data as it currently is, we would be using the Dirichlet Process in a way it should mathematically not be used. Therefore, multiplying all of the data by a constant value, so that we can work in the appropriate range of α, is the ideal approach. After some experimentation, I found that k = 10 was an appropriate scaling factor.

After initial runs of the Dirichlet Process, I found that there was a slight issue with some of the earlier songs. Since I had only artist genre tags, not specific song tags, for each song, I chose songs based on whether any of the tags associated with the artist fell under any electronic music genre, including the generic term 'electronic'. There were some bands, mostly older ones from the 1960s and 1970s, like Electric Light Orchestra, which had some electronic music but mostly featured rock, funk, disco, or another genre. Given that these artists featured mostly non-electronic songs, I decided to exclude them from my study and generate a blacklist of these music artists. While it was infeasible to look through every single song and determine whether it was electronic or not, I was able to look over the earliest songs in each cluster. These songs were the most important to verify as electronic, because early non-electronic songs could end up forming new clusters and inadvertently create clusters with non-electronic sounds that I was not looking for.

The goal of this thesis is to identify the different groups into which EM songs are clustered and to identify the most unique artists and genres. While the second task is very simple, because it requires looking at the earliest songs in each cluster, the effectiveness of the first is difficult to gauge. While I can look at the average chord change and timbre category frequencies in each cluster, as well as other metadata, putting more semantic interpretations to what the music actually sounds like, and determining whether the music is clustered properly, is a very subjective process. For this reason, I ran the Dirichlet Process on the feature set with values of α = (0.05, 0.1, 0.2) and compared the clustering for each value, examining similarities and differences in the clusters formed in each scenario in the Discussion section. For each value of α, I set the upper limit of components, or clusters, allowed to 50. The values of α I used resulted in 9, 14, and 19 clusters formed, respectively.

3.2 Findings

3.2.1 α = 0.05

When I set α to 0.05, the Dirichlet Process split the songs into 9 clusters. Below are the distributions of the years of the songs in each cluster (note that the Dirichlet Process does not number the clusters exactly sequentially, so cluster numbers 5, 7, and 10 are skipped).


Figure 3.1: Song year distributions for α = 0.05

For each value of α, I also calculated the average frequency of each chord change category and timbre category for each cluster and plotted the results. The green lines correspond to timbre and the blue lines to pitch.


Figure 3.2: Timbre and pitch distributions for α = 0.05

A table of each cluster formed, the number of songs in that cluster, and descriptions of the pitch, timbre, and rhythmic qualities characteristic of songs in that cluster is shown below.


Cluster | Song Count | Characteristic Sounds
0 | 6481 | Minimalist, industrial, space sounds, dissonant chords
1 | 5482 | Soft, New Age, ethereal
2 | 2405 | Defined sounds; electronic and non-electronic instruments played in standard rock rhythms
3 | 360 | Very dense and complex synths, slightly darker tone
4 | 4550 | Heavily distorted rock and synthesizer
6 | 2854 | Faster-paced 80s synth rock, acid house
8 | 798 | Aggressive beats, dense house music
9 | 1464 | Ambient house, trancelike, strong beats, mysterious tone
11 | 1597 | Melancholy tones; New Wave rock in the 80s, then, starting in the 90s, downtempo, trip-hop, nu-metal

Table 3.1: Song cluster descriptions for α = 0.05

3.2.2 α = 0.1

A total of 14 clusters were formed (16 were formed, but 2 clusters contained only one song each; I listened to both of these songs, and they did not sound unique, so I discarded them from the clusters). Again, the song year distributions, timbre and pitch distributions, and cluster descriptions are shown below.


Figure 3.3: Song year distributions for α = 0.1


Figure 3.4: Timbre and pitch distributions for α = 0.1


Cluster | Song Count | Characteristic Sounds
0 | 1339 | Instrumental and disco with 80s synth
1 | 2109 | Simultaneous quarter-note and sixteenth-note rhythms
2 | 4048 | Upbeat, chill; simultaneous quarter-note and eighth-note rhythms
3 | 1353 | Strong repetitive beats, ambient
4 | 2446 | Strong simultaneous beat and synths; synths defined but echoing
5 | 2672 | Calm, New Age
6 | 542 | Hi-hat cymbals, dissonant chord progressions
7 | 2725 | Aggressive punk and alternative rock
9 | 1647 | Latin; rhythmic emphasis on the first and third beats
11 | 835 | Standard medium-fast rock instruments/chords
16 | 1152 | Orchestral, especially violins
18 | 40 | "Martian alien" sounds, no vocals
20 | 1590 | Alternating strong kick and strong high-pitched clap
28 | 528 | Roland TR-like beats; kick and clap stand out but fuzzy

Table 3.2: Song cluster descriptions for α = 0.1

3.2.3 α = 0.2

With α set to 0.2, there were a total of 22 clusters formed. Three of the clusters consisted of 1 song each, none of which were particularly unique-sounding, so I discarded them, for a total of 19 significant clusters. Again, the song year distributions, timbre and pitch distributions, and cluster descriptions are shown below.


Figure 3.5: Song year distributions for α = 0.2


Figure 3.6: Timbre and pitch distributions for α = 0.2


Cluster | Song Count | Characteristic Sounds
0 | 4075 | Nostalgic and sad-sounding synths and string instruments
1 | 2068 | Intense, sad, cavernous (mix of industrial metal and ambient)
2 | 1546 | Jazz/funk tones
3 | 1691 | Orchestral with heavy 80s synths, atmospheric
4 | 343 | Arpeggios
5 | 304 | Electro, ambient
6 | 2405 | Alien synths, eerie
7 | 1264 | Punchy kicks and claps, 80s/90s tilt
8 | 1561 | Medium tempo, 4/4 time signature, synths with intense guitar
9 | 1796 | Disco rhythms and instruments
10 | 2158 | Standard rock with few (if any) synths added on
12 | 791 | Cavernous, minimalist, ambient (non-electronic instruments)
14 | 765 | Downtempo, classic guitar riffs, fewer synths
16 | 865 | Classic acid house sounds and beats
17 | 682 | Heavy Roland TR sounds
22 | 14 | Fast, ambient, classic orchestral
23 | 578 | Acid house with funk tones
30 | 31 | Very repetitive rhythms, one or two tones
34 | 88 | Very dense sound (strong vocals and synths)

Table 3.3: Song cluster descriptions for α = 0.2


3.3 Analysis

Each of the different values of α revealed different insights about the Dirichlet Process applied to the Million Song Dataset, mainly which artists and songs were unique and how traditional groupings of EM genres could be thought of differently. Not surprisingly, the distributions of the years of songs in most of the clusters were skewed to the left, because the distribution of all of the EM songs in the Million Song Dataset is left-skewed (see Figure 2.2). However, some of the distributions vary significantly for individual clusters, and these differences provide important insights into which types of music styles were popular at certain points in time and how unique the earliest artists and songs in the clusters were. For example, for α = 0.1, cluster 28's musical style (with sounds characteristic of the Roland TR-808 and TR-909, two programmable drum machines and synthesizers that became extremely popular in 1980s and 90s dance tracks [13]) coincides with when the instruments were first manufactured, in 1980. Not surprisingly, this cluster contained mostly songs from the 80s and 90s and declined slightly in the 2000s. However, there were a few songs in that cluster that came out before 1980. While these songs did not clearly use the Roland TR machines, they may have contained similar sounds that predated the machines and were truly novel.

First, looking at α = 0.05, we see that all of the clusters contain a significant number of songs, although clusters 3 and 8 are notably smaller. Cluster 3 has a heavier left tail, indicating a larger number of songs from the 70s, 80s, and 90s. Inside the cluster, the genres of music varied significantly through a traditional music lens. That is, the cluster contained some songs with nearly all traditional rock instruments, others with purely synths, and others somewhere in between, all of which would normally be classified as different EM genres. However, under the Dirichlet Process, these songs were lumped together by the common theme of dense, melodic sounds (as opposed to minimalistic, repetitive, or dissonant sounds). The most prominent artists among the earlier songs are Ashra and Jon Hassell, who composed several melodic songs combining traditional instruments with synthesizers for a modern feel. The other small cluster, number 8, has a more normal year distribution relative to the entire MSD distribution and consists of denser beats. Another artist, Cabaret Voltaire, leads this cluster. Cluster 9 sticks out significantly because it contains virtually no songs before 1990 but then increases rapidly in popularity. This cluster contains songs with hypnotically repetitive rhythms, strong and ethereal synths, and an equally strong drum-like beat. Given the emergence of trance in the 1990s, and the fact that house music in the 1980s contained more minimalistic synths than house music in the 1990s, this distribution of years makes sense. Looking at the earliest artists in this cluster, one that accurately predates the later music in the cluster is Jean-Michel Jarre. A French composer and pioneer of ambient and electronic music [14], one of his songs, Les Chants Magnétiques IV, contains very sharp and modulated synths along with a repetitive hi-hat cymbal rhythm and more drawn-out, ethereal synths. While the song sounds ambient at its normal speed, playing the song at 1.5 times normal speed results in a thumping, fast-paced 16th-note rhythm that, combined with the ethereal synths and their chord progressions, sounds very similar to trance music. In fact, I found that, stylistically, trance music was comparable to house and ambient music increased in speed. Trance was a term not used extensively until the early 1990s, but ambient and house music were already mainstream by the 1980s, so it would make sense that trance evolved in this manner. However, this insight could serve as an argument that trance is not an innovative genre in and of itself but is rather a clever combination of two older genres.

Lastly, we look at the timbre category and chord change distributions for each cluster. In theory, these clusters should have significantly different peaks of chord changes and timbre categories, reflecting different pitch arrangements and

instruments in each cluster. The type 0 chord change corresponds to major → major with no note change, type 60 to minor → minor with no note change, type 120 to dominant 7th major → dominant 7th major with no note change, and type 180 to dominant 7th minor → dominant 7th minor with no note change. It makes sense that type 0, 60, 120, and 180 chord changes are frequently observed, because it implies that adjacent chords in a song remain in the same key for the majority of the song. The timbre categories, on the other hand, are more difficult to interpret intuitively. Mauch's study addresses this issue by sampling the songs and sounds that are closest to each timbre category and then playing the sounds and attaching user-based interpretations gathered from several listeners [8]. While this strategy worked in Mauch's study, given the time and resources at my disposal it was not practical in mine. I ended up comparing my subjective summaries of each cluster against the charts to see whether certain peaks in the timbre categories corresponded to specific tones. Strangely, for α = 0.05, the timbre and chord change data is very similar for each cluster. This problem does not occur for α = 0.1 or 0.2, where the graphs vary significantly and correspond to some of the observed differences in the music. In summary, below are the most influential artists I found in the clusters formed and the types of music they created that were novel for their time:

• Jean-Michel Jarre: ambient and house music, complicated synthesizer arrangements
• Cabaret Voltaire: orchestral electronic music
• Paul Horn: new age
• Brian Eno: ambient music
• Manuel Göttsching (Ashra): synth-heavy ambient music
• Killing Joke: industrial metal
• John Foxx: minimalist and dark electronic music
• Fad Gadget: house and industrial music

While these conclusions were formed mainly from the data in the MSD and the resulting clusters, I checked outside sources and biographies of these artists to see whether they were groundbreaking contributors to electronic music. Some research revealed that these artists were indeed groundbreaking for their time, so my findings are consistent with the existing literature. The difference, however, between existing accounts and mine is that, from a quantitatively computed perspective, I found some new connections (like observing that one of Jarre's works, when sped up, sounded very similar to trance music).

For larger values of α, it is worth not only looking at interesting phenomena in the clusters formed for that specific value, but also comparing the clusters formed to those of other values of α. Since we are increasing the value of α, more clusters will be formed, and the distinctions between clusters will be more nuanced. With α = 0.1, the Dirichlet Process formed 16 clusters; 2 of these clusters consisted of only one song each, and upon listening, neither of these songs sounded particularly unique, so I threw those two clusters out and analyzed the remaining 14. Comparing these clusters to the ones formed with α = 0.05, I found that some of the clusters mapped over nicely, while others were more difficult to interpret. For example, cluster 3_0.1 (cluster 3 when α = 0.1) contained a similar number of songs, and a similar distribution of release years, to cluster 9_0.05. Both contain virtually no songs before the 1990s and then steadily rise in popularity through the 2000s. Both clusters also contain similar types of music: house beats and ethereal synths reminiscent of ambient or trance music. However, when I looked at the earliest artists in cluster 3_0.1, they were different from the earliest artists in cluster 9_0.05. One particular artist, Bill Nelson, stood out for having a particularly novel song, "Birds of Tin," for the year it was released (1980). This song features a sharp and twangy synth beat that, when sped up, sounds like minimalist acid house music.

While the α = 0.05 run differentiated mostly on general moods and classes of instruments (like rock vs. non-electronic vs. electronic), the α = 0.1 run picked up more nuanced differences in instrumentation and mood. For example, cluster 16_0.1 contained songs that featured orchestral string instruments, especially violin. The songs themselves varied significantly according to traditional genres, from Brian Eno arrangements with classical orchestra to a remix of a song by Linkin Park, a nu-metal band, which contained violin interludes. This clustering raises an interesting point: music that sounds very different based on traditional genres could be grouped together by certain instruments or sounds. Another cluster, 28_0.1, features 90s sounds characteristic of the Roland TR-909 drum machine (which would explain why the cluster's songs increase drastically starting in the 1990s and steadily decline through the 2000s). Yet another cluster, 6_0.1, has a particularly heavy left tail, indicating a style more popular in the 1980s, and its characteristic sound, hi-hat cymbals, is also a specialized instrument. This specialization does not match up particularly strongly with the clusters from α = 0.05. That is, a single cluster with α = 0.05 does not easily map to one or more clusters in the α = 0.1 run, although many of the clusters appear to share characteristics based on the qualitative descriptions in the tables. The timbre/chord change charts for each cluster appeared to at least somewhat corroborate the general characteristics I attached to each cluster. For example, the last timbre category is significantly pronounced for clusters 5 and 18, and especially so for 18. Cluster 18 was vocal-free, ethereal space-synth sounds, so it would make sense that cluster 5, which was mainly calm New Age, also contained vocal-free, ethereal, space-y sounds. It was also interesting to note that certain clusters, like 28, contained one timbre category that completely dominated all the others. Many songs in this cluster were marked by strong and repetitive beats reminiscent of the Roland TR drum machines, which matches the graph. Likewise, clusters 3, 7, 9, and 20, which appear to share the same peak timbre category, were noted for containing strong and repetitive beats. For this value of α, I added the following artists and their contributions to the general list of novel artists:

• Bill Nelson: minimalist house music
• Vangelis: orchestral compositions with electronic notes
• Rick Wakeman: rock compositions with spacy-sounding synths
• Kraftwerk: synth-based pop music

Finally, we look at α = 0.2. With this parameter value, the Dirichlet Process resulted in 22 clusters. Three of these clusters contained only one song each and, upon listening to each of these songs, I determined they were not particularly unique and discarded them, for a total of 19 remaining clusters. Unlike with the previous two values of α, where the clusters were relatively easy to differentiate subjectively, here differentiation was quite difficult. Slightly more than half of the clusters, 10 out of 19, contained under 1000 songs, and 3 contained under 100. Some of the clusters mapped easily to clusters from the other two α values, like cluster 17_0.2, which contains Roland TR drum machine sounds and is comparable to cluster 28_0.1. However, many of the other classifications seemed more dubious. Not only did the songs within each cluster often vary significantly, but the differences between many clusters appeared nearly indistinguishable. The chord change and timbre charts also reflect the difficulty of distinguishing different clusters. The y-axis values on all of the charts are quite small, implying that many of the timbre values averaged out because the songs in each cluster were quite different. Essentially, the observations are quite noisy and do not have features that stand out as saliently as those of cluster 28_0.1, for example. The only exceptions were clusters 30 and 34, but there were so few songs in each of these clusters that they represent only a small portion of the dataset. Therefore, I concluded that the Dirichlet Process with α = 0.2 did an inadequate job of clustering the songs. Overall, the clusters formed when α = 0.1 were the most meaningful, picking up nuanced moods and instruments without splitting hairs and producing clusters that a minimally trained ear could not differentiate. From this analysis, the most appropriate genre classifications of the electronic music in the MSD are the clusters described in the table for α = 0.1, and the most novel artists, along with their contributions, are summarized in the findings for α = 0.05 and α = 0.1.


Chapter 4

Conclusion

In this chapter, I first address weaknesses in my experiment and strategies to remedy those weaknesses; I then offer potential paths for researchers to build upon my experiment, and I close with some final words regarding this thesis.

4.1 Design Flaws in Experiment

While I made every effort possible to ensure the integrity of this experiment, there were various limiting factors, some beyond my control and others within my control but unrealistic to address given the time and resources I had. The largest issue was the dataset I was working with. While the MSD contained roughly 23,000 electronic music songs according to my classifications, these songs did not come close to all of the electronic music available. From looking through the tracks, I did see many important artists, meaning that there was some credibility to the dataset. However, there were several other artists I was surprised to see missing, and the artists included were represented by only a limited number of popular songs. Some traditionally defined genres, like dubstep, were missing entirely from the dataset, and the most recent songs came from the year 2010, which meant that the past 5 years of rapid expansion in EM were not accounted for. Building a sufficient corpus of EM data is very difficult, arguably more so than for other genres, because songs may be remixed by multiple artists, further blurring the line between original content and modifications. For this reason, I considered my thesis to be a proof of concept. Although the data I used may not be ideal, I was able to show that the Dirichlet Process could be used with some amount of success to cluster songs based on their metadata.

With respect to how I implemented the Dirichlet Process and constructed the features, my methodology could have been more extensive with additional time and resources. Interpreting the sounds in each song and establishing common threads is a difficult task, and unlike Pandora, which used trained music theory experts to analyze each song, I relied on my own ears and anecdotal knowledge of EM. Given the lack of formal literature quantitatively analyzing EM, and the resources I had, this was my best realistic option, but it was also not ideal. The second notable weakness, which was more controllable, was determining what exactly constitutes an EM song. My criterion involved iterating through every song and selecting those whose artist carried a tag that fell inside a list of predetermined EM genres. However, this strategy is not always effective, since some artists have only a small selection of EM songs and have produced much more music involving rock or other non-EM genres. To prevent these songs from appearing in the dataset, I would need to load another dataset, from a group called Last.fm, which contains user-generated tags at the song level. Another, more addressable weakness in my experiment was the graphical analysis of the timbre categories. While the average chord changes were easy to interpret on the graphs for each cluster and had easy semantic interpretations, the timbre categories were never formally defined. That is, while I knew the Bayes Information Criterion was lowest when there were 46 categories, I did not associate each timbre category with a sound. Mauch's study addressed this issue by randomly selecting songs with sounds that fell in each timbre category and asking users to listen to the sounds and classify what they heard. Implementing this system would be an additional way of ensuring that the clusters formed were nontrivial: I could not only eyeball the timbre measurements on each graph, as I did in this thesis, but also use them to confirm the sounds I observed in each cluster. Finally, while my feature selection involved careful preprocessing, based on other studies, that normalized measurements across all songs, there are additional ways I could have improved the feature set. For example, one study looks at more advanced ways to isolate specific timbre segments in a song, identify repeating patterns, and compare songs to each other in terms of the similarity of their timbres [15]. More advanced methods like these would allow me to analyze more quantitatively how successfully the Dirichlet Process clusters songs into distinct categories.

4.2 Future Work

Future work in this area, quantitatively analyzing EM metadata to determine what constitutes different genres and novel artists, would involve tighter definitions and procedures, evaluations of whether clustering was effective, and closer musical scrutiny. All of the weaknesses mentioned in the previous section, barring perhaps the songs available in the Million Song Dataset, can be addressed with extensions and modifications to the code base I created. Addressing the greater issue of building an effective corpus of music data for the MSD, and constantly updating it, might be accomplished by soliciting such data from an organization like Spotify, but such an endeavor is very ambitious and beyond the scope of any individual or small-group research project without extensive funding and influence. Once these problems are resolved, and the dataset, the songs accessed from the dataset, and the methods for comparing songs to each other are settled, the next steps would be to further analyze the results. How do the most unique artists for their time compare to the most popular artists? Is there considerable overlap? How long does it take for a style to grow in popularity, if it does at all? And lastly, how can these findings be used to compose new genres of music and envision who and what will become popular in the future? All of these questions may require supplementary information sources, with respect to the popularity of songs and artists, for example, and many of these additional pieces of information can be found on the website of the MSD.

4.3 Closing Remarks

While this thesis is an ambitious endeavor and can be improved in many respects, it does show that the methods implemented yield nontrivial results and could serve as a foundation for future quantitative analysis of electronic music. As data analytics grows even further, and groups such as Spotify amass greater amounts of information and deeper insights into that information, this relatively new field of study will hopefully grow. EM is a dynamic, energizing, and incredibly expressive type of music, and understanding it from a quantitative perspective pays respect to what has, up until now, mostly been analyzed from a curious outsider's perspective: qualitatively described, but not examined as thoroughly from a mathematical angle.


Appendix A

Code

A.1 Pulling Data from the Million Song Dataset

from __future__ import division
import os
import sys
import time
import glob
import re
import numpy as np
import hdf5_getters  # not on adroit

# prevents output from showing ellipses when printed
np.set_printoptions(threshold=np.nan)

'''This code computes the frequency of chord changes in each electronic song
and runs the dirichlet process on it'''

# path separators reconstructed; they were lost in extraction
basedir = '/scratch/network/mssilver/mssilver/msd_data_full/data/' + str(sys.argv[1])
ext = '.h5'

target_genres = ['house', 'techno', 'drum and bass', 'drum n bass',
    "drum'n'bass", 'drumnbass', "drum 'n' bass", 'jungle', 'breakbeat',
    'trance', 'dubstep', 'trap', 'downtempo', 'industrial', 'synthpop',
    'idm', 'idm - intelligent dance music', '8-bit', 'ambient',
    'dance and electronica', 'electronic']

# relevant metadata for all EM songs found in the MSD
all_song_data = {}
pitch_segs_data = []
count = 0
start_time = time.time()

for root, dirs, files in os.walk(basedir):
    files = glob.glob(os.path.join(root, '*' + ext))
    for f in files:
        h5 = hdf5_getters.open_h5_file_read(f)
        # if year unknown, throw out sample
        if hdf5_getters.get_year(h5) == 0:
            h5.close()
            continue
        if any(tag in str(hdf5_getters.get_artist_mbtags(h5)) for tag in target_genres):
            print 'found electronic music song at {0} seconds'.format(time.time() - start_time)
            count += 1
            print ('song count: {0}'.format(count + 1))
            h5_subdict = dict()
            h5_subdict['title'] = hdf5_getters.get_title(h5).item()
            h5_subdict['artist_name'] = hdf5_getters.get_artist_name(h5).item()
            h5_subdict['year'] = hdf5_getters.get_year(h5).item()
            h5_subdict['duration'] = hdf5_getters.get_duration(h5).item()
            h5_subdict['timbre'] = hdf5_getters.get_segments_timbre(h5).tolist()
            h5_subdict['pitches'] = hdf5_getters.get_segments_pitches(h5).tolist()
            track_id = hdf5_getters.get_track_id(h5).item()
            all_song_data[track_id] = h5_subdict
            print 'Song {0} finished processing. Total time elapsed: {1} seconds'.format(count, str(time.time() - start_time))
        h5.close()

all_song_data_sorted = dict(sorted(all_song_data.items(), key=lambda k: k[1]['year']))
# strip path separators from the argument when building the output name
sortedpitchdata = '/scratch/network/mssilver/mssilver/msd_data/raw_' + re.sub('/', '', sys.argv[1]) + '.txt'
with open(sortedpitchdata, 'w') as text_file:
    text_file.write(str(all_song_data_sorted))

A.2 Calculating Most Likely Chords and Timbre Categories

from __future__ import division
import os
import sys
import re
import time
import json
import glob
import hdf5_getters  # not on adroit
import sklearn.mixture
import msd_utils  # not on adroit
import math
import numpy as np
import collections
import ast

# prevents output from showing ellipses when printed
np.set_printoptions(threshold=np.nan)

# column-wise mean of a list of lists
def mean(a):
    return sum(a) / len(a)

'''This code computes the frequency of chord changes in each electronic song
and runs the dirichlet process on it'''

basedir = '/scratch/network/mssilver/mssilver/'
input_file = basedir + 'msd_data/raw_' + str(sys.argv[1]) + '.txt'
output_file = basedir + 'msd_data/preprocessed_' + str(sys.argv[1]) + '.txt'

json_contents = open(input_file, 'r').read()

all_song_data = []
time_start = time.time()
count = 0
# regex reconstructed: match each song's dict literal in the input file
for json_object_str in re.finditer(r"\{'title'.*?\}", json_contents):
    json_object_str = str(json_object_str.group(0))
    json_object = ast.literal_eval(json_object_str)
    json_object_new = {}

    json_object_new['title'] = json_object['title']
    json_object_new['artist_name'] = json_object['artist_name']
    json_object_new['year'] = json_object['year']
    json_object_new['duration'] = json_object['duration']

    segments_pitches_old = json_object['pitches']
    segments_timbre_old = json_object['timbre']
    segments_pitches_old_smoothed = []
    segments_timbre_old_smoothed = []
    chord_changes = [0 for i in range(0, 192)]
    smoothing_factor = 5
    for i in range(0, int(math.floor(len(segments_pitches_old) / smoothing_factor))):
        segments = segments_pitches_old[(smoothing_factor * i):(smoothing_factor * i + smoothing_factor)]
        # calculate mean frequency of each note over a block of 5 time segments
        segments_mean = map(mean, zip(*segments))
        segments_pitches_old_smoothed.append(segments_mean)
    most_likely_chords = [msd_utils.find_most_likely_chord(seg) for seg in segments_pitches_old_smoothed]
    print 'found most likely chords at {0} seconds'.format(time.time() - time_start)
    # calculate chord changes
    for i in range(0, len(most_likely_chords) - 1):
        c1 = most_likely_chords[i]
        c2 = most_likely_chords[i + 1]
        if c1[1] == c2[1]:
            note_shift = 0
        elif c1[1] < c2[1]:
            note_shift = c2[1] - c1[1]
        else:
            note_shift = 12 - c1[1] + c2[1]
        key_shift = 4 * (c1[0] - 1) + c2[0]
        # convert note_shift (0 through 11) and key_shift (1 to 16)
        # to one of 192 categories for a chord shift
        chord_shift = 12 * (key_shift - 1) + note_shift
        chord_changes[chord_shift] += 1
    json_object_new['chord_changes'] = [c / json_object['duration'] for c in chord_changes]
    print 'calculated chord changes at {0} seconds'.format(time.time() - time_start)

    for i in range(0, int(math.floor(len(segments_timbre_old) / smoothing_factor))):
        segments = segments_timbre_old[(smoothing_factor * i):(smoothing_factor * i + smoothing_factor)]
        # calculate mean value of each timbre dimension over a block of 5 time segments
        segments_mean = map(mean, zip(*segments))
        segments_timbre_old_smoothed.append(segments_mean)
    print 'found most likely timbre categories at {0} seconds'.format(time.time() - time_start)
    timbre_cats = [msd_utils.find_most_likely_timbre_category(seg) for seg in segments_timbre_old_smoothed]
    timbre_cat_counts = [timbre_cats.count(i) for i in xrange(0, 30)]
    json_object_new['timbre_cat_counts'] = [t / json_object['duration'] for t in timbre_cat_counts]
    all_song_data.append(json_object_new)
    count += 1

print 'preprocessing finished; writing results to file at time {0}'.format(time.time() - time_start)
with open(output_file, 'w') as f:
    f.write(str(all_song_data))

print 'file merging complete at time {0}'.format(time.time() - time_start)

A3 Code to Compute Timbre Categories

1 from __future__ import division2 import os3 import sys4 import re5 import time6 import json7 import glob8 import hdf5_getters not on adroit9 import sklearnmixture

10 import msd_utils not on adroit11 import math12 import numpy as np13 import collections14 from string import ascii_uppercase15 import ast16 import matplotlibpyplot as plt17 import operator18 from collections import defaultdict19 import random20

21 timbre_all = []22 N = 20 number of samples to get from each year23 year_counts = dict(1956 2 1965 4 1968 3 1969 5 1970 23 1971 25

1972 26 1973 37 1974 35 1975 29 1976 28 1977 64 1978 771979 111 1980 131 1981 171 1982 199 1983 272 1984 190 1985

60

189 1986 200 1987 224 1988 205 1989 272 1990 358 1991 3481992 538 1993 610 1994 658 1995 764 1996 809 1997 930 1998872 1999 983 2000 1031 2001 1230 2002 1323 2003 1563 20041508 2005 1995 2006 1892 2007 2175 2008 1950 2009 1782 2010742)

24

25 time_start = timetime()26 year_count = defaultdict(int)27 orig_dir = rsquoscratchnetworkmssilvermssilverrsquo28 orig_dir = rsquorsquo29 json_pattern = recompile(rsquorsquotitlersquoreDOTALL)30 N = 20 number of songs to sample from each year31 k = 20 number of frames to select from each song32 for l1 in ascii_uppercase33 for l2 in ascii_uppercase34 edm_textfile = orig_dir + rsquomsd_dataraw_rsquo + l1 + l2 + rsquotxtrsquo35 json_contents = open(edm_textfilersquorrsquo)read()36 for json_object_str in refindall(json_patternjson_contents)37 json_object = astliteral_eval(json_object_str)38 year = int(json_object[rsquoyearrsquo])39 prob = 10 if 10Nyear_counts[year] gt 10 else 10N

year_counts[year]40 if randomrandom() lt prob41 print rsquogetting timbre frames for song in directory 0 1

seconds after start of programrsquoformat(edm_textfiletimetime()-time_start)

42 duration = float(json_object[rsquodurationrsquo])43 timbre = [[tduration for t in l] for l in json_object[rsquo

timbrersquo]]44 try45 indices = randomsample(xrange(0len(timbre))k)46 except47 indices = xrange(0len(timbre))48 timbre_frames = [timbre[i] for i in indices]49 appended_timbre = [timbre_allappend(l) for l in

timbre_frames]50 print rsquofinished file 0 1 seconds after start of programrsquoformat(

edm_textfiletimetime()-time_start)51

52 with(open(rsquotimbre_frames_alltxtrsquorsquowrsquo)) as f53 fwrite(str(timbre_all))

A4 Helper Methods for Calculations

1 import os2 import re3 import json4 import glob5 import hdf5_getters6 import time7 import numpy as np8

9 rsquorsquorsquo some static data used in conjunction with the helper methods rsquorsquorsquo10

11 each 12-element vector corresponds to the 12 pitches starting with Cnatural and going up to B natural

12

61

13 CHORD_TEMPLATE_MAJOR = [[100010010000][010001001000]14 [001000100100][000100010010]15 [000010001001][100001000100]16 [010000100010][001000010001]17 [100100001000][010010000100]18 [001001000010][000100100001]]19 CHORD_TEMPLATE_MINOR =[[100100010000][010010001000]20 [001001000100][000100100010]21 [000010010001][100001001000]22 [010000100100][001000010010]23 [000100001001][100010000100]24 [010001000010][001000100001]]25 CHORD_TEMPLATE_DOM7 = [[100010010010][010001001001]26 [101000100100][010100010010]27 [001010001001][100101000100]28 [010010100010][001001010001]29 [100100101000][010010010100]30 [001001001010][000100100101]]31 CHORD_TEMPLATE_MIN7 = [[100100010010][010010001001]32 [101001000100][010100100010]33 [001010010001][100101001000]34 [010010100100][001001010010]35 [000100101001][100010010100]36 [010001001010][001000100101]]37

38 CHORD_TEMPLATE_MAJOR_means = [npmean(chord) for chord inCHORD_TEMPLATE_MAJOR]

39 CHORD_TEMPLATE_MINOR_means = [npmean(chord) for chord inCHORD_TEMPLATE_MINOR]

40 CHORD_TEMPLATE_DOM7_means = [npmean(chord) for chord in CHORD_TEMPLATE_DOM7]

41 CHORD_TEMPLATE_MIN7_means = [npmean(chord) for chord in CHORD_TEMPLATE_MIN7]

42

43 CHORD_TEMPLATE_MAJOR_stdevs = [npstd(chord) for chord inCHORD_TEMPLATE_MAJOR]

44 CHORD_TEMPLATE_MINOR_stdevs = [npstd(chord) for chord inCHORD_TEMPLATE_MINOR]

45 CHORD_TEMPLATE_DOM7_stdevs = [npstd(chord) for chord in CHORD_TEMPLATE_DOM7]

46 CHORD_TEMPLATE_MIN7_stdevs = [npstd(chord) for chord in CHORD_TEMPLATE_MIN7]

47

48 TIMBRE_CLUSTERS = [[ 138679881e-01 395702571e-02 265410235e-0249 738301998e-03 -175014636e-02 -551147732e-0250 871851698e-03 -117595855e-02 107227900e-0251 875951680e-03 540391877e-03 617638908e-03]52 [ 314344510e+00 117405599e-01 408053561e+0053 -177934450e+00 293367968e+00 -135597928e+0054 -155129489e+00 775743158e-01 642796685e-0155 140794256e-01 337716831e-01 -327103815e-01]56 [ 356548165e-01 273288705e+00 194355982e+0057 106892477e+00 989739475e-01 -897330631e-0258 873234495e-01 -200747009e-03 344488367e-0159 993117800e-02 -243471766e-01 -190521726e-01]60 [ 422442037e-01 414115783e-01 143926557e-0161 -116143322e-01 -595186216e-02 -236927188e-0162 -683151409e-02 986816882e-02 243219098e-0263 693558977e-02 680121418e-03 397485360e-02]64 [ 194727799e-01 -139027782e+00 -239875671e-01

62

65 -284583677e-01 192334219e-01 -283421048e-0166 215787541e-01 114840341e-01 -215631833e-0167 -409496877e-02 -690838017e-03 -724394810e-03]68 [ 196565167e-01 498702717e-02 -343697282e-0169 254170701e-01 112441266e-02 154740401e-0170 -470447408e-02 810868802e-02 303736697e-0371 143974944e-03 -275044913e-02 148634678e-02]72 [ 221364497e-01 -296205105e-01 157754028e-0173 -557641279e-02 -925625566e-02 -615316168e-0274 -138139882e-01 -554936599e-02 166886836e-0175 646238260e-02 124093863e-02 -209274345e-02]76 [ 212823455e-01 -932652720e-02 -439611467e-0177 -202814479e-01 498638770e-02 -126572488e-0178 -111181799e-01 325075635e-02 201416694e-0279 -569216463e-02 261922912e-02 830817468e-02]80 [ 162304042e-01 -734813956e-03 -202552550e-0181 180106705e-01 -572110826e-02 -917148244e-0282 -620429191e-03 -608892354e-02 102883628e-0283 384878478e-02 -872920419e-03 237291230e-02]84 [ 169023095e-01 681311168e-02 -371039856e-0285 -213139780e-02 -418752028e-03 136407740e-0186 258515825e-02 -410328777e-04 293149920e-0287 -197874734e-02 201177066e-02 429260690e-03]88 [ 416829358e-01 -128384095e+00 886081556e-0189 913717416e-02 -319420208e-01 -182003637e-0190 -319865507e-02 -171517045e-02 347472066e-0291 -353047665e-02 558354602e-02 -506222122e-02]92 [ 383948137e-01 106020034e-01 401191058e-0193 149470482e-01 -958422411e-02 -494473336e-0294 227589858e-02 -567352733e-02 384666644e-0295 -215828055e-02 -167817151e-02 115426241e-01]96 [ 907946444e-01 326120397e+00 298472002e+0097 -142615404e-01 129886103e+00 -453380431e-0198 154008478e-01 -355297093e-02 -295809181e-0199 157037690e-01 -729692046e-02 115180285e-01]

100 [ 160870896e+00 -232038235e+00 -796211044e-01101 155058968e+00 -219377663e+00 501030526e-01102 -171767279e+00 -136642470e+00 -242837527e-01103 -414275615e-01 -733148530e-01 -456676578e-01]104 [ 642870687e-01 134486839e+00 216026845e-01105 -213180345e-01 310866747e-01 -397754955e-01106 -354439151e-01 -595938041e-04 495054274e-03107 467013422e-02 -180823854e-02 125808320e-01]108 [ 116780496e+00 228141229e+00 -329418720e+00109 -154239912e+00 212372153e-01 251116768e+00110 184273560e+00 -406183916e-01 119175125e+00111 -924407446e-01 685444429e-01 -638729005e-01]112 [ 239097414e-01 -113382447e-02 306327342e-01113 468182987e-03 -103107607e-01 -317661969e-02114 346533705e-02 146440386e-02 688291154e-02115 172580481e-02 -623970238e-03 -652822380e-03]116 [ 174850329e-01 -186077411e-01 269285838e-01117 522452803e-02 -371708289e-02 -642874319e-02118 -501920042e-03 -114565540e-02 -261300268e-03119 -694872458e-03 120157063e-02 201341977e-02]120 [ 193220674e-01 162738332e-01 172794061e-02121 789933755e-02 158494767e-01 904541006e-04122 -333177052e-02 -142411500e-01 -190471155e-02123 -241622739e-02 -257382438e-02 284895062e-02]124 [ 331179197e+00 -156765268e-01 442446188e+00

63

125 205496297e+00 507031622e+00 -352663849e-02126 -568337901e+00 -117825301e+00 541756637e-01127 -315541339e-02 -158404846e+00 737887234e-01]128 [ 236033237e-01 -501380019e-01 -701568834e-02129 -214474169e-01 558739133e-01 -345340886e-01130 236469930e-01 -251770230e-02 -441670143e-01131 -173364633e-01 992353986e-03 101775476e-01]132 [ 313672832e+00 155128891e+00 460139512e+00133 982477544e-01 -387108002e-01 -134239667e+00134 -300065797e+00 -441556909e-01 -777546208e-01135 -659017029e-01 -142596356e-01 -978935498e-01]136 [ 850714148e-01 228658856e-01 -365260753e+00137 270626948e+00 -190441544e-01 566625676e+00138 177531510e+00 239978921e+00 110965660e+00139 158484130e+00 -151579214e-02 864324026e-01]140 [ 114302559e+00 118602811e+00 -388130412e+00141 869833825e-01 -823003310e-01 -423867795e-01142 856022598e-01 -108015106e+00 174840192e-01143 -135493558e-02 -117012561e+00 168572940e-01]144 [ 354117814e+00 612714769e-01 767585243e+00145 250391333e+00 181374399e+00 -146363231e+00146 -174027236e+00 -572924078e-01 -120787368e+00147 -413954661e-01 -462561948e-01 678297871e-01]148 [ 831843044e-01 441635485e-01 700724425e-02149 -472159900e-02 308326493e-01 -447009822e-01150 327806057e-01 652370380e-01 328490360e-01151 128628172e-01 -778065861e-02 691343399e-02]152 [ 490082031e-01 -953180204e-01 176970476e-01153 157256960e-01 -526196238e-02 -319264458e-01154 391808304e-01 219368239e-01 -206483291e-01155 -625044005e-02 -105547224e-01 318934196e-01]156 [ 149899454e+00 -430708817e-01 243770498e+00157 703149621e-01 -228827845e+00 270195855e+00158 -471484280e+00 -118700075e+00 -177431396e+00159 -223190236e+00 820855264e-01 -235859902e-01]160 [ 120322544e-01 -366300816e-01 -125699953e-01161 -121914056e-01 693277338e-02 -131034684e-01162 -154955924e-03 248094288e-02 -309576314e-02163 -166369415e-03 148904987e-04 -142151992e-02]164 [ 652394765e-01 -681024464e-01 636868117e-01165 304950208e-01 262178992e-01 -320457080e-01166 -198576098e-01 -302173163e-01 204399765e-01167 444513847e-02 -950111498e-02 -114198739e-02]168 [ 206762180e-01 -208101829e-01 261977630e-01169 -171672300e-01 561794250e-02 213660185e-01170 390259585e-02 478176392e-02 172812607e-02171 344052067e-02 626899067e-03 248544728e-02]172 [ 739717363e-01 437786285e+00 254995502e+00173 113151212e+00 -358509503e-01 220806129e-01174 -220500355e-01 -722409824e-02 -270534083e-01175 107942098e-03 270174668e-01 187279353e-01]176 [ 125593809e+00 671054880e-02 870352571e-01177 -432607959e+00 230652217e+00 547476105e+00178 -611052479e-01 107955720e+00 -216225471e+00179 -795770149e-01 -731804973e-01 968935954e-01]180 [ 117233757e-01 -123897829e-01 -488625265e-01181 142036530e-01 -723286756e-02 -699808763e-02182 -117525019e-02 570221674e-02 -767796123e-03183 417505873e-02 -233375716e-02 194121001e-02]184 [ 167511025e+00 -275436700e+00 145345593e+00

64

185 132408871e+00 -166172505e+00 100560074e+00186 -882308160e-01 -595708043e-01 -727283590e-01187 -103975499e+00 -186653334e-02 139449745e+00]188 [ 320587677e+00 -284451104e+00 854849957e+00189 -444001235e-01 104202144e+00 735333682e-01190 -248763292e+00 738931361e-01 -174185596e+00191 -107581842e+00 205759299e-01 -820483513e-01]192 [ 331279737e+00 -508655734e-01 661530870e+00193 116518280e+00 474499155e+00 -231536191e+00194 -134016130e+00 -715381712e-01 278890594e+00195 204189275e+00 -380003033e-01 116034914e+00]196 [ 179522019e+00 -813534697e-02 437167420e-01197 226517020e+00 885377295e-01 107481514e+00198 -725322296e-01 -219309506e+00 -759468916e-01199 -137191387e+00 260097913e-01 934596450e-01]200 [ 350400906e-01 817891485e-01 -863487084e-01201 -731760701e-01 970320805e-02 -360023996e-01202 -291753495e-01 -803073817e-02 665930095e-02203 160093340e-01 -129158086e-01 -518806100e-02]204 [ 225922929e-01 278461593e-01 539661393e-02205 -237662670e-02 -270343295e-02 -123485570e-01206 231027499e-03 587465112e-05 186127188e-02207 283074747e-02 -187198676e-04 124761782e-02]208 [ 453615634e-01 318976020e+00 -835029351e-01209 784124578e+00 -443906795e-01 -178945492e+00210 -114521031e+00 100044304e+00 -404084981e-01211 -486030348e-01 105412721e-01 563666445e-02]212 [ 393714086e-01 -307226477e-01 -487366619e-01213 -457481697e-01 -291133171e-04 -239881719e-01214 -215591352e-01 -121332941e-01 142245002e-01215 502984582e-02 -805878851e-03 195534173e-01]216 [ 186913010e-01 -161000977e-01 595612425e-01217 187804293e-01 222064227e-01 -109008289e-01218 783845058e-02 515228647e-02 -818113578e-02219 -237860551e-02 341013800e-03 364680417e-02]220 [ 332919314e+00 -214341251e+00 720913997e+00221 176143734e+00 164091808e+00 -266887649e+00222 -926748006e-01 -278599285e-01 -739434005e-01223 -387363085e-01 800557250e-01 115628886e+00]224 [ 476496444e-01 -119334793e-01 309037235e-01225 -345545294e-01 130114716e-01 506895559e-01226 212176840e-01 -414296750e-03 452439064e-02227 -162163990e-02 693683152e-02 -577607592e-03]228 [ 300019324e-01 543432074e-02 -772732930e-01229 147263806e+00 -279012581e-02 -247864869e-01230 -210011388e-01 278202425e-01 616957205e-02231 -166924986e-01 -180102286e-01 -378872162e-03]]232

233 TIMBRE_MEANS = [npmean(t) for t in TIMBRE_CLUSTERS]234 TIMBRE_STDEVS = [npstd(t) for t in TIMBRE_CLUSTERS]235

236 rsquorsquorsquohelper methods to process raw msd datarsquorsquorsquo237

238 def normalize_pitches(h5)239 key = int(hdf5_gettersget_key(h5))240 segments_pitches = hdf5_gettersget_segments_pitches(h5)241 segments_pitches_new = [transpose_by_key(pitch_segkey) for pitch_seg in

segments_pitches]242 return segments_pitches_new243

65

244 def transpose_by_key(pitch_segkey)245 pitch_seg_new = []246 for i in range(012)247 idx = (i + key) 12248 pitch_seg_newappend(pitch_seg[idx])249 return pitch_seg_new250

251 rsquorsquorsquo given a time segment with distributions of the 12 pitches find the mostlikely chord playedrsquorsquorsquo

252 def find_most_likely_chord(pitch_vector)253 rho_max = 00254 index each chord255 most_likely_chord = (11)256 for idx (chordmeanstdev) in enumerate(zip(CHORD_TEMPLATE_MAJOR

CHORD_TEMPLATE_MAJOR_meansCHORD_TEMPLATE_MAJOR_stdevs))257 rho = 00258 for i in range(012)259 rho += (chord[i] - mean)(pitch_vector[i] - npmean(pitch_vector))((

stdev+001)(npstd(pitch_vector)+001))260 if (abs(rho) gt abs(rho_max))261 rho_max = rho262 most_likely_chord = (1idx)263 for idx (chordmeanstdev) in enumerate(zip(CHORD_TEMPLATE_MINOR

CHORD_TEMPLATE_MINOR_meansCHORD_TEMPLATE_MINOR_stdevs))264 rho = 00265 for i in range(012)266 rho += (chord[i] - mean)(pitch_vector[i] - npmean(pitch_vector))((

stdev+001)(npstd(pitch_vector)+001))267 if (abs(rho) gt abs(rho_max))268 rho_max = rho269 most_likely_chord = (2idx)270 for idx (chordmeanstdev) in enumerate(zip(CHORD_TEMPLATE_DOM7

CHORD_TEMPLATE_DOM7_meansCHORD_TEMPLATE_DOM7_stdevs))271 rho = 00272 for i in range(012)273 rho += (chord[i] - mean)(pitch_vector[i] - npmean(pitch_vector))((

stdev+001)(npstd(pitch_vector)+001))274 if (abs(rho) gt abs(rho_max))275 rho_max = rho276 most_likely_chord = (3idx)277 for idx (chordmeanstdev) in enumerate(zip(CHORD_TEMPLATE_MIN7

CHORD_TEMPLATE_MIN7_meansCHORD_TEMPLATE_MIN7_stdevs))278 rho = 00279 for i in range(012)280 rho += (chord[i] - mean)(pitch_vector[i] - npmean(pitch_vector))((

stdev+001)(npstd(pitch_vector)+001))281 if (abs(rho) gt abs(rho_max))282 rho_max = rho283 most_likely_chord = (4idx)284 return most_likely_chord285

286 def find_most_likely_timbre_category(timbre_vector)287 most_likely_timbre_cat = 0288 rho_max = 00289 for idx (segmeanstdev) in enumerate(zip(TIMBRE_CLUSTERSTIMBRE_MEANS

TIMBRE_STDEVS))290 rho = 00291 for i in range(012)292 rho += (seg[i] - mean)(timbre_vector[i] - npmean(seg))((stdev+001)

(npstd(timbre_vector)+001))

66

293 if (abs(rho) gt abs(rho_max))294 rho_max = rho295 most_likely_timbre_cat = idx296 return most_likely_timbre_cat

67

Bibliography

[1] Marck Bailey Mark j butler publishes scholarly work on dance mu-sic httpwwwmusicnorthwesterneduaboutnews2012mark-j-butler-publishes-scholarly-work-on-dance-musichtml mar 2012

[2] Kenneth Taylor Ishkurrsquos guide to edm httptechnoorgelectronic-music-guide

[3] Deal further strengthens spotifyrsquos music discovery expertise httptheechonestcompressreleasesspotify-acquires-echo-nest mar 2014

[4] Josh Constine Inside the spotify - echo nest skunkworks httptechcrunchcom20141019the-sonic-mad-scientists oct 2014

[5] The future of music genres is here httpblogechonestcompost73516217273the-future-of-music-genres-is-here jan 2014

[6] About the music genome project httpwwwpandoracomaboutmgp

[7] Joan Serragrave Aacutelvaro Corral Mariaacuten Boguntildeaacute Martiacuten Haro and Josep Ll ArcosMeasuring the evolution of contemporary western popular music Sci Rep 2jul 2012

[8] Matthias Mauch Robert M MacCallum Mark Levy and Armand M Leroi Theevolution of popular music Usa 1960ndash2010 Royal Society Open Science 2(5)2015

[9] Thierry Bertin-Mahieux Daniel PW Ellis Brian Whitman and Paul LamereThe million song dataset In Proceedings of the 12th International Conferenceon Music Information Retrieval (ISMIR 2011) 2011

[10] F Pedregosa G Varoquaux A Gramfort V Michel B Thirion O GriselM Blondel P Prettenhofer R Weiss V Dubourg J Vanderplas A PassosD Cournapeau M Brucher M Perrot and E Duchesnay Scikit-learn Machinelearning in Python Journal of Machine Learning Research 122825ndash2830 2011

[11] Lars Buitinck Gilles Louppe Mathieu Blondel Fabian Pedregosa AndreasMueller Olivier Grisel Vlad Niculae Peter Prettenhofer Alexandre GramfortJaques Grobler Robert Layton Jake VanderPlas Arnaud Joly Brian Holt andGaeumll Varoquaux API design for machine learning software experiences from

68

the scikit-learn project In ECML PKDD Workshop Languages for Data Miningand Machine Learning pages 108ndash122 2013

[12] Edwin Chen Infinite mixture models with nonparametric bayesand the dirichlet process httpblogechenme20120320infinite-mixture-models-with-nonparametric-bayes-and-the-dirichlet-processmar 2012

[13] Graham Massey Roland tr-808 The drum machine that changed music httpwwwbbccomnewsentertainment-arts-26682781 mar 2014

[14] Jean-Michel Jarre and Agencja Artystyczna MTJ Jean Michel Jarre DisquesDreyfus 1999

[15] Francois Pachet Jean-Julien Aucouturier and Mark Sandler The way it soundsTimbre models for analysis and retrieval of music signals IEEE TRANSAC-TIONS ON MULTIMEDIA 7(6)1028ndash35 dec 2005

69

  • Abstract
  • Acknowledgements
  • Contents
  • List of Tables
  • List of Figures
  • 1 Introduction
    • 11 Background Information
    • 12 Literature Review
    • 13 The Dataset
      • 2 Mathematical Modeling
        • 21 Determining Novelty of Songs
        • 22 Feature Selection
        • 23 Collecting Data and Preprocessing Selected Features
          • 231 Collecting the Data
          • 232 Pitch Preprocessing
          • 233 Timbre Preprocessing
              • 3 Results
                • 31 Methodology
                • 32 Findings
                  • 321 =005
                  • 322 =01
                  • 323 =02
                    • 33 Analysis
                      • 4 Conclusion
                        • 41 Design Flaws in Experiment
                        • 42 Future Work
                        • 43 Closing Remarks
                          • A Code
                            • A1 Pulling Data from the Million Song Dataset
                            • A2 Calculating Most Likely Chords and Timbre Categories
                            • A3 Code to Compute Timbre Categories
                            • A4 Helper Methods for Calculations
                              • Bibliography
Page 8: Silver,Matthew final thesis

List of Tables

31 Song cluster descriptions for α = 005 33

32 Song cluster descriptions for α = 01 38

33 Song cluster descriptions for α = 02 45

viii

List of Figures

11 A userrsquos taste profile generated by Spotify 4

12 Data processing pipeline for Mauchrsquos study illustrated with a segment

of Queenrsquos Bohemian Rhapsody 1975 8

21 scikit-learn example of GMM vs DPGMM and tuning of α 15

22 Number of Electronic Music Songs in Million Song Dataset from Each

Year 26

31 Song year distributions for α = 005 31

32 Timbre and pitch distributions for α = 005 32

33 Song year distributions for α = 01 35

34 Timbre and pitch distributions for α = 01 37

35 Song year distributions for α = 02 41

36 Timbre and pitch distributions for α = 02 44

ix

Chapter 1

Introduction

11 Background Information

Electronic Music (EM) is an increasingly popular genre of music with an immense

presence and influence on modern culture Because the genre is new as a whole and

is arguably more loosely structured than other genres - technology has enabled the

creation of a wide range of sounds and easy blending of existing and new sounds alike

- formal analysis especially mathematical analysis on the genre is fairly limited and

has only begun growing in the past few years As a fan of EM I am interested in

exploring how the genre has evolved over time More specifically my goal with this

project was to design some structure or model that could help me identify which EM

artists have contributed the most stylistically to the genre Oftentimes famous EM

artists do not create novel-sounding music but rather popularize an existing style

and the motivation of this study is to understand who has stylistically contributed

the most to the EM scene versus those who have merely popularized aspects of it

As the study progressed the manner in which I constructed my model lent to

a second goal of the thesis imagining new ways in which we can imagine EM genres

1

While there exists an extensive amount of research analyzing music trends from

a non-mathematical (cultural societal artistic) perspective the analysis of EM

from a mathematical perspective and especially with respect to any computationally

measurable trends in the genre is close to nonexistent EM has been analyzed to a

lesser extent than other common genres of music in the academic world most likely

due to existing for a shorter amount of time and being less rooted in prominant

social and cultural events In fact the first published reference work on EM did not

exist until 2012 when Professor Mark J Butler from Northwestern University edited

and published Electronica Dance and Club Music a collection of essays exploring

EM genres and culture [1] Furthermore there are very few comprehensive visual

guides that allow a user to relate every genre to each other and easily observe how

different genres converge and diverge While conducting research the best guide I

found was not a scholarly source but an online guide created by an EM enthusiast

Ishkurrsquos Guide to Electronic Music [2] This guide which includes over 100 specific

genres grouped by more general genres and represents chronological evolutions by

connecting each genre in a flowchart is the most exhaustive analysis of the EM scene

I could find However the guidersquos analysis is very qualitative While each subgenre

contains an explanation on typical rhythm and sounds and includes well-known

songs indicative of the style the guide is created by someone who used historical and

personal knowledge of EM My model which creates music genres by chronologically

ordering songs and then assigning them to clusters is a different approach towards

imagining the entire landscape of EM The results may confirm Ishkurrsquos Guidersquos

findings in which case his guide is given additional merit with mathematical evi-

dence or it may be different suggesting that there may be better ways to group EM

genres One advantage that guides such as Ishkurrsquos and historically-based scholarly

works have over my approach is that those models are history-sensitive and therefore

may group songs in a way that historically makes sense On the other hand my

2

model is history-agnostic and may not realize the historical context of songs when

clustering However I believe that there is still significant merit to my research

Instead of classifying genres of music by early genres that led to them my approach

gives the most credit to the artists and songs that were the most innovative for their

time and perhaps reveal different musical styles that are more similar to each other

than history would otherwise imply This way of thinking of music genres while

unconventional is another way of imagining EM

The practice of quantitatively analyzing music has exploded in the last decade

thanks to technological and algorithmic advances that allow data scientists to con-

structively sift through troves of music and listener information In the literature

review I will focus on two particular organizations that have contributed greatly to

the large-scale mathematical analysis of music Pandora a website that plays songs

similar to a songartistalbum inputted by a user and Echo Nest a music analytics

firm that was acquired by Spotify in 2014 and drives Spotifyrsquos Discover Weekly

feature [3] After evaluating the relevance of these sources to my thesis work I will

then look over the relevant academic research and evaluate what this research can

contribute

12 Literature Review

The analysis of quantitative music generally falls into two categories research con-

ducted by academics and academic organizations for scholarly purposes and research

conducted by companies and primarily targeted for consumers First looking at the

consumer-based research Spotify and Pandora are two of the most prominent based

groups and the two I decided to focus on Spotify is a music streaming service where

users can listen to albums and songs from a wide variety of artists or listen to weekly

3

playlists generated based on the music the user and userrsquos friends have listened to

The weekly playlist called Discover Weekly Playlist is a relatively new feature in

Spotify and is driven by music analysis algorithms created from Echo Nest Using

the Echo Nest code interface Spotify creates a ldquotaste profilerdquo for each user which

assesses attributes such as how often a user branches out to new styles of music how

closely the userrsquos music streamed follows popular Billboard music charts and so on

Spotify also looks at the artists and songs the user streamed and creates clusters

of different genres that the user likes (see figure 11) The taste profile and music

clusters can then be used to generate playlists geared to a specific user The genres

in the cluster come from a list of nearly 800 names which are derived by scraping

the Internet for trending terms in music as well as training various algorithms on a

regular basic by ldquolisteningrdquo to new songs [4][5]

Figure 11 A userrsquos taste profile generated by Spotify

4

Although Spotify and Echo Nestrsquos algorithms are very useful for mapping the land-

scape of established and emerging genres of music the methodology is limited to

pre-defined genres of music This may serve as a good starting point to compare my

final results to but my study aims to be as context-free as possible by attaching no

preconceived notions of music styles or genres instead looking at features that could

be measured in every song

While Spotifyrsquos approach to mapping music is very high-tech and based on ex-

isting genres Pandora takes a very low-tech and context-free approach to music

clustering Pandora created the Music Genome Project a multi-year undertaking

where skilled music theorists listened to a large number of songs and analyzed up to

450 characteristics in each song [6] Pandorarsquos approach is appealing to the aim of

my study since it does not take any preconceived notions of what a genre of music

is instead comparing songs on common characteristics such as pitch rhythm and

instrument patterns Unfortunately I do not have a cadre of skilled music theorists

at my disposal nor do I have 10 years to perform such calculations like the dedicated

workers at Pandora (tips the indestructible fedora) Additionally Pandorarsquos Music

Genome Project is intellectual property so at best I can only rely on the abstract

concepts of the Music Genome Project to drive my study

In the academic realm there are no existing studies analyzing quantifiable changes in

EM specifically but there exist a few studies that perform such analysis on popular

Western music in general One such study is Measuring the Evolution of Contem-

porary Western Popular Music which analyzes music from 1955-2010 spanning all

common genres Using the Million Song Dataset a free public database of songs

each containing metadata (see section 13) the study focuses on the attributes pitch

timbre and loudness Pitch is defined as the standard musical notes or frequency of

5

the sound waves Timbre is formally defined as the Mel frequency cepstral coefficients

(MFCC) of a transformed sound signal More informally it refers to the sound color

texture or tone quality and is associated with instrument types recording resources

and production techniques In other words two sounds that have the same pitch

but different tones (for example a bell and voice) are differentiated by their timbres

There are 12 MFCCs that define the timbre of a given sound Finally loudness

refers to intrinsically how loud the music sounds not loudness that a listener can

manipulate while listening to the music Loudness is the first MFCC of the timbre

of a sound [7] The study concluded that over time music has been becoming louder

and less diverse

The restriction of pitch sequences (with metrics showing less variety inpitch progressions) the homogenization of the timbral palette (with fre-quent timbres becoming more frequent) and growing average loudnesslevels (threatening a dynamic richness that has been conserved until to-day) This suggests that our perception of the new would be essentiallyrooted on identifying simpler pitch sequences fashionable timbral mix-tures and louder volumes Hence an old tune with slightly simpler chordprogressions new instrument sonorities that were in agreement with cur-rent tendencies and recorded with modern techniques that allowed forincreased loudness levels could be easily perceived as novel fashionableand groundbreaking

This study serves as a good starting point for mathematically analyzing music in

a few ways First it utilizes the Million Song Dataset which addresses the issue

of legally obtaining music metadata As mentioned in section 13 the only legal

way to obtain playable music for this study would have been to purchase all songs I

would include which is infeasible While the Million Song Dataset does not contain

the audio files in playable format it does contain audio features and metadata that

allow for in-depth analysis In addition working with the dataset takes out the

work of extracting features from raw audio files saving an extensive amount of time

and energy Second the study establishes specifics for what constitutes a trend

in music Pitch timbre and loudness are core features of music and examining the6

distributions of each among songs over time reveals a lot of information about how

the music industry and consumersrsquo tastes have evolved While these are not all of the

features contained in a song they serve as a good starting point Third the study

defines mathematical ways to capture music attributes and measure their change

over time For example pitches are transposed into the same tonal context with

binary discretized pitch descriptions based on a threshold so that each song can be

represented with vectors of pitches that are normalized and compared to other songs

While this study lays some solid groundwork for capturing and analyzing nu-

meric qualities of music it falls short of addressing my goals in a couple of ways

First it does not perform any analysis with respect to music genre While the

analysis performed in this paper could easily be applied to a list of songs in a specific

genre certain genres might have unique sounds and rhythms relative to other genres

that would be worth studying in greater detail Second the study only measures

general trends in music over time The models used to describe changes are simple

regressions that donrsquot look at more nuanced changes For example what styles of

music developed over certain periods of time How rapid were those changes Which

styles of music developed from which other styles

A more promising study led by music researcher Matthias Mauch [8] analyzes

contemporary popular Western Music from the 1960s to 2010s by comparing numer-

ical data on the pitches and timbre of a corpus of 17000 songs that appeared on the

Billboard Hot 100 Like the previously mentioned paper Measuring the Evolution

of Contemporary Western Popular Music Mauchrsquos study also creates abstractions

of pitch and timbre in order to provide a consistent and meaningful semantic inter-

pretation of musical data (see figure 12) However Mauchrsquos study takes this idea a

step further by using genre tags from Lastfm a music website and constructing a

7

hierarchy of music genres using hierarchical clustering Additionally the study takes

a crack at determining whether a particular band the Beatles was musically ground-

breaking for its time or merely playing off sounds that other bands had already used

Figure 12 Data processing pipeline for Mauchrsquos study illustrated with a segment ofQueenrsquos Bohemian Rhapsody 1975

While both Measuring the Evolution of Contemporary Western Popular Music

and Mauchrsquos study created abstractions of pitch and timbre Mauchrsquos study is more

appealing with respect to my goal because its end results align more closely with

mine Additionally the data processing pipeline offers several layers of abstraction

8

and depending on my progress I would be able to achieve at least one of the levels of

abstraction As shown in figure 12 each segment of a raw audio file is first broken

down into its 12 timbre MFCCs and pitch components Next the study constructs

ldquolexiconsrdquo or a dictionary of pitch and timbre terms that all songs can be compared

to For pitch the original data is in a N-by-12 matrix where N is the number of time

segments in the song and 12 the number of each of the notes found in an octave of

pitches Each time segment contains the relative strengths of each of the 12 pitches

However music sounds are not merely a collection of pitches but more precisely

chords Furthermore the similarity of two songs is not determined by the absolute

pitches of their chords but rather the progression of chords in the song all relative to

each other For example if all the notes in a song are transposed by one step the song

will sound different in terms of absolute pitch but the song will still be recognized

as the original because all of the relative movements from each chord to the next

are the same This phenomenon is captured in the pitch data by finding the most

likely chord played at each time segment then counting the change to the next chord

at each time step and generating a table of chord change frequencies for each song

Constructing the timbre lexcion is more complicated since there is no easy analogue

like chords for pitches to compare songs Mauchrsquos study utilizes a Gaussian Mixture

Model (GMM) by iterating over k=1 to k=N clusters where N is a large number

running the GMM on each prior assumption of k clusters and computing the Bayes

Information Criterion (BIC) for each model The lowest of the N BIC values is found

and that value of k is selected That model contains k different timbre clusters

and each cluster contains the mean timbre value for each of the 12 timbre components

For my research I decided that the pitch and timbre lexicons would be the most

realistic level of abstraction I could obtain Mauchrsquos study adds an addtional layer

to pitch and timbre by identifying the most common patterns of chord changes and

9

most common timbre rhythms and creating more general tags from these combined

terms such as ldquo stepwise changes indicating modal harmonyrdquo for a pitch topic and

ldquooh rounded mellowrdquo for a timbral topic There were two problems with using this

final layer of abstraction for my study First attaching semantic interpretations to

the pitch and timbral lexicons is a difficult task For timbre I would need to listen

to sound samples containing all of the different timbral categories I identified and

attaching user interpretations to them For the chords not only would I have to

perform the same analysis as on timbre but take careful attention to identify which

chords correspond to common sound progressions in popular music a task that I am

not qualified for an did not have the resources for this thesis to seek out Second

this final layer of abstraction was not necessary for the end goal of my paper In

fact consolidating my pitch and timbre lexicons into simpler phrases would run the

risk of pigeonholing my analysis and preventing me from discovering more nuanced

patterns in my final results Therefore I decided to focus on pitch and timbral

lexicon construction as the furthest levels of abstraction when processing songs for

my thesis Mathematical details on how I constructed the lexical and timbral lexicons

can be found in the Mathematical Modeling section of this paper

13 The Dataset

In order to successfully execute my thesis I need access to an extensive database of

music Until recently acquiring a substantial corpus of music data was a difficult and

costly task It is illegal to download music audio files from video and music-sharing

sites such as YouTube Spotify and Pandora Some platforms such as iTunes offer

90-second previews of songs but using only segments of songs and usually segments

that showcase the chorus of the song are not reliable measures to capture the entire

essence of a song Even if I were to legally download entire audio files for free I would

10

run into additional issues Obtaining a high-quality corpora of song data would be

challenging writing scripts that crawl music sharing platforms may not capture all of

the music I am looking for And once I have the audio files I would have to perform

audio processing techniques to extract the relevant information from the songs

Fortunately there is an easy solution to the music data acquisition problem

The Million Song Dataset (MSD) is a collection of metadata for one million music

tracks dating up to 2011 Various organizations such as The Echo Nest Musicbrainz

7digital and Lastfm have contributed different pieces of metadata Each song is

represented as a Hierarchical Data Format file (HDF5) which can be loaded as a

JSON object The fields encompass topical features such as the song title artist

and release date as well as lower-level features such as the loudness starting beat

time pitches and timbre of several segments of the song [9] While the MSD is

the largest free and open source music metadata dataset I could find there is no

guarantee that it adequately covers the entire spectrum of EM artists and songs

This quality limitation is important to consider throughout the study A quick look

through the songs including the subset of data I worked with for this report showed

that there were several well-known artists and songs in the EM scene Therefore

while the MSD may not contain all desired songs for this project it contains an

adequate number of relevant songs to produce some meaningful results Additionally

laying the groundwork for modeling the similarities between songs and identifying

groundbreaking ones is the same regardless of the songs included and the following

methodologies can be implemented on any similarly-formatted dataset including one

with songs that might currently be missing in the MSD

11

Chapter 2

Mathematical Modeling

21 Determining Novelty of Songs

Finding an logical and implementable mathematical model was and continues to be

an important aspect of my research My problem how to mathematically determine

which songs were unique for their time requires an algorithm in which each song is

introduced in chronological order either joining an existing category or starting a

new category based on its musical similarity to songs already introduced Clustering

algorithms like k-means or Gaussian Mixture Models (GMM) which have a prede-

termined number of clusters and optimize the partitioning of a dataset into those

cluster assume a fixed number of clusters While this process would work if we knew

exactly how many genres of EM existed if we guess wrong our end results may end

up with clusters that are wrongly grouped together or separated It is much better to

apply a clustering algorithm that does not make any assumptions about this number

One particularly promising process that addresses the issue of tthe number of

clusters is a family of algorithms known as Dirichlet Proccesses (DPs) DPs are

useful for this particular application because (1) they assign clusters to a dataset

12

with only an upper bound on the number of clusters and (2) by sorting the songs

in chronological order before running the algorithm and keeping track of which

songs are categorized under each cluster we can observe the earliest songs in each

cluster and consequentially infer which songs were responsible for creating new

clusters The arguments for the DP The DP is controlled by a parameter α which

is the concentration parameter The expected number of clusters formed is directly

proportional to the value of α so the higher the value of α the more likely new

clusters will be formed [10] Regardless of the value of α as the number of data

points introduced increases the probability of a new group being formed decreases

That is a ldquorich get richerrdquo policy is in place and existing clusters tend to grow in

size Tweaking the value of the tunable parameter α is an important part of the

study since it determines the flexibility given to forming a new cluster If the value

of α is too small then the criteria for forming clusters will be too strict and data

that should be in different clusters will be assigned to the same cluster On the other

hand if α is too large the algorithm will be too sensitive and assign similar songs to

different clusters

The implementation of the DP was achieved using scikit-learnrsquos library and API for

Dirichlet Process Gaussian Mixture Model (DPGMM) The DPGMM is the formal

name of the Dirichlet Process model used to cluster the data More specifically

scikit-learnrsquos implementation of the DPGMM uses the Stick Breaking method

one of several equally valid methods to assign songs to clusters [11] While the

mathematical details for this algorithm can be found at the following citation [12]

the most important aspects of the DPGMM are the arguments that the user can

specify and tune The first of these tunable parameters is the value α which is the

same parameter as the α discussed in the previous paragraph As seen in Figure 21

on the right side properly tuning α is key to obtaining meaningful clusters The

13

center image has α set to 001 which is too small and results in all of the data being

formed under one cluster On the other hand the bottom-right image has the same

data set and α set to 100 which does a better job of clustering On a related note

the figure also demonstrates the effectiveness of the DPGMM over the GMM On the

left side clearly the dataset contains 2 clusters but the GMM on the top-left image

assumes 5 clusters as a prior and consequentially clusters the data incorrectly while

the DPGMM manages to limit the data to 2 clusters

The second argument that the user inputs for the DPGMM is the data that

will be clustered The scikit-learn implementation takes the data in the format

of a nested list (N lists each of length m) where N is the number of data points

and m the number of features While the format of the data structure is relatively

straightforward choosing which numbers should be in the data was a challenge I

faced Selecting the relevant features of each song to be used in the algorithm will

be expounded upon in the next section ldquoFeature Selectionrdquo

The last argument that a user inputs for the scikit-learn DGPMM implementa-

tion is an argument indicating the upper bound for the number of clusters The

Dirichlet Process then determines the best number of clusters for the data between

1 and the upper bound Since the DPGMM is flexible enough to find the best value

I set an arbitrary upper bound of 50 clusters and focused more on the tuning of α to

modify the number of clusters formed

22 Feature Selection

One of the most difficult aspects of the Dirichlet method is choosing the features

to be used for clustering In other words when we organize the songs into clusters

14

Figure 21 scikit-learn example of GMM vs DPGMM and tuning of α

we need to ensure that each cluster is distinct in a way that is statistically and

intuitively logical In the Million Song Dataset [9] each song is represented as a

JSON object containing several fields These fields are candidate features to be used

in the Dirichlet algorithm Below is an example song ldquoNever Gonna Give You Uprdquo

by Rick Astley and the corresponding features

artist_mbid db92a151-1ac2-438b-bc43-b82e149ddd50 (the musicbrainzorg ID

for this artists is db9)

artist_mbtags shape = (4) (this artist received 4 tags on musicbrainzorg)

artist_mbtags_count shape = (4)

(raw tag count of the 4 tags this artist received on musicbrainzorg)

artist_name Rick Astley (artist name)

artist_playmeid 1338 (the ID of that artist on the service playmecom)

artist_terms shape = (12) (this artist has 12 terms (tags) from The Echo Nest)

artist_terms_freq shape = (12) (frequency of the 12 terms from The Echo Nest

(number between 0 and 1))

artist_terms_weight shape = (12) (weight of the 12 terms from The Echo Nest

(number between 0 and 1))

15

audio_md5 bf53f8113508a466cd2d3fda18b06368 (hash code of the audio used for

the analysis by The Echo Nest)

bars_confidence shape = (99) (confidence value (between 0 and 1) associated

with each bar by The Echo Nest)

bars_start shape = (99) (start time of each bar according to The Echo Nest this

song has 99 bars)

beats_confidence shape = (397))

confidence value (between 0 and 1) associated with each beat by The Echo Nest

beats_start shape = (397) (start time of each beat according to The Echo Nest

this song has 397 beats)

danceability 00 (danceability measure of this song according to The Echo Nest

(between 0 and 1 0 =gt not analyzed))

duration 21169587 (duration of the track in seconds)

end_of_fade_in 0139 (time of the end of the fade in at the beginning of the

song according to The Echo Nest)

energy 00 (energy measure (not in the signal processing sense) according to The

Echo Nest (between 0 and 1 0 = not analyzed))

key 1 (estimation of the key the song is in by The Echo Nest)

key_confidence 0324 (confidence of the key estimation)

loudness -775 (general loudness of the track)

mode 1 (estimation of the mode the song is in by The Echo Nest)

mode_confidence 0434 (confidence of the mode estimation)

release Big Tunes - Back 2 The 80s (album name from which the track was taken

some songs tracks can come from many albums we give only one)

release_7digitalid 786795 (the ID of the release (album) on the service 7digi-

talcom)

sections_confidence shape = (10) (confidence value (between 0 and 1) associated

16

with each section by The Echo Nest)

sections_start shape = (10) (start time of each section according to The Echo

Nest this song has 10 sections)

segments_confidence shape = (935) (confidence value (between 0 and 1) asso-

ciated with each segment by The Echo Nest)

segments_loudness_max shape = (935) (max loudness during each segment)

segments_loudness_max_time shape = (935) (time of the max loudness

during each segment)

segments_loudness_start shape = (935) (loudness at the beginning of each

segment)

segments_pitches shape = (935 12) (chroma features for each segment (normal-

ized so max is 1))

segments_start shape = (935) (start time of each segment ( musical event or

onset) according to The Echo Nest this song has 935 segments)

segments_timbre shape = (935 12) (MFCC-like features for each segment)

similar_artists shape = (100) (a list of 100 artists (their Echo Nest ID) similar

to Rick Astley according to The Echo Nest)

song_hotttnesss 0864248830588 (according to The Echo Nest when downloaded

(in December 2010) this song had a rsquohotttnesssrsquo of 08 (on a scale of 0 and 1))

song_id SOCWJDB12A58A776AF (The Echo Nest song ID note that a song can

be associated with many tracks (with very slight audio differences))

start_of _fade _out 198536 (start time of the fade out in seconds at the end

of the song according to The Echo Nest)

tatums_confidence shape = (794) (confidence value (between 0 and 1) associated

with each tatum by The Echo Nest)

tatums_start shape = (794) (start time of each tatum according to The Echo

Nest this song has 794 tatums)

17

tempo 113359 (tempo in BPM according to The Echo Nest)

time_signature 4 (time signature of the song according to The Echo Nest ie

usual number of beats per bar)

time_signature_confidence 0634 (confidence of the time signature estimation)

title Never Gonna Give You Up (song title)

track_7digitalid 8707738 (the ID of this song on the service 7digitalcom)

track_id TRAXLZU12903D05F94 (The Echo Nest ID of this particular track

on which the analysis was done) year 1987 (year when this song was released

according to musicbrainzorg)

When choosing features my main goal was to use features that would most

likely yield meaningful results yet also be simple and make sense to the average

person The definition of ldquomeaningfulrdquo results is arbitrary as every music listener

will have his or her opinions to what constitutes different types of music but some

common features most people tend to differentiate songs by are pitch rhythm and

the types of instruments used The following specific fields provided in each song

object fall under these three terms

Pitch

bull segments_pitches a matrix of values indicating the strength of each pitch (or

note) at each discernible time interval

Rhythm

bull beats_start a vector of values indicating the start time of each beat

bull time_signature the time signature of the song

bull tempo the speed of the song in Beats Per Minute (BPM)

Instruments

18

bull segments_timbre a matrix of values indicating the distribution of MFCC-like

features (different types of tones) for each segments

The segments_pitches feature is a clear candidate for a differentiating factor for

songs since it reveals patterns of notes that occur Additionally other research

papers that quantitatively examine songs like Mauchrsquos look at pitch and employ a

procedure that allows all songs to be compared with the same metric Likewise

timbre is intuitively a reliable differentiating feature since it reveals the amount

that different tones or sounds that sound different despite having the same pitch

Therefore segments_timbre is another feature that is considered in each song

Finally we look at the candidate features for rhythm At first glance all of these

features appear to be useful as they indicate the rhythm of a song in one way or

another However none of these features are as useful as the pitch and timbre

features While tempo is one factor in differentiating genres of EDM and music in

general tempo alone is not a driving force of musical innovation Certain genres

of EDM like drum nrsquo bass and happycore stand out for having very fast tempos

but the tempo is supplemented with a sound unique to the genre Conceiving new

arrangements of pitches combining instruments in new ways and inventing new

types of sounds are novel but speeding up or slowing down existing sounds is not

Including tempo as a feature could actually add noise to the model since many genres

overlap in their tempos And finally tempo is measured indirectly when the pitch

and timbre features are normalized for each song everything is measured in units of

ldquoper secondrdquo so faster songs will have higher quantities of pitch and timbre features

each second Time signature can be dismissed from the candidates features for the

same reason as tempo many genres contain the same time signature and including

it in the feature set would only add more noise beats_start looks like a more

promising feature since like segments_pitches and segments_timbre it consists of

a vector of values However difficulties arise when we begin to think how exactly

19

we can utilize this information Since each song varies in length we need a way to

compare songs of different durations on the same level One approach could be to

perform basic statistics on the distance between each beat for example calculating

the mean and standard deviation of this distance However the normalized pitch

and timbre information already capture this data Another possibility is detecting

certain patterns of beats which could differentiate the syncopated dubstep or glitch

music beats from the steady pulse of electro-house But once again every beat is

accompanied by a sound with a specific timbre and pitch so this feature would not

add any significantly new information

23 Collecting Data and Preprocessing Selected Fea-

tures

231 Collecting the Data

Upon deciding the features I wanted to use in my research I first needed to collect

all of the electronic songs in the Million Song Dataset The easiest reliable way to

achieve this was to iterate through each song in the database and save the information

for the songs where any of the artist genre tags in artist_mbtags matched with an

electronic music genre While this measure was not fully accurate because it looks at

the genre of the artist not the song specific genre information for each song was not

as easily accessible so this indicator was nearly as good a substitute To generate a

list of the genres that electronic songs would fall under I manually searched through

a subset of the MSD to find all genres that seemed to be releated to electronic music

In the case of genres that were sometimes but not always electronic in nature such

20

as disco or pop I erred on the side of caution and did not include them in the list

of electronic genres In these cases false positives such as primarily rock songs that

happen to have the disco label attached to the artist could inadvertantly be included

in the dataset The final list of genres is as follows

target_genres = ['house', 'techno', 'drum and bass', 'drum n bass',
                 "drum'n'bass", 'drumnbass', "drum 'n' bass", 'jungle',
                 'breakbeat', 'trance', 'dubstep', 'trap', 'downtempo',
                 'industrial', 'synthpop', 'idm',
                 'idm - intelligent dance music', '8-bit', 'ambient',
                 'dance and electronica', 'electronic']
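The filter itself reduces to a single membership test over the artist-level tags. Below is a condensed sketch of the check performed in the full collection script (Appendix A.1); it assumes an open MSD song file h5 and the dataset's hdf5_getters accessor module:

import hdf5_getters

def is_electronic(h5, target_genres):
    # keep the song if any artist-level MusicBrainz tag contains an EM genre
    tags = str(hdf5_getters.get_artist_mbtags(h5))
    return any(genre in tags for genre in target_genres)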

2.3.2 Pitch Preprocessing

A study conducted by music researcher Matthias Mauch [8] analyzes pitch in a musically informed manner. The study first takes the raw sound data and converts it into a distribution over each pitch, where 0 is no detection of the pitch and 1 the strongest amount. It then computes the most likely chord by comparing the 4 most common types of chords in popular music (major, minor, dominant 7, and minor 7) to the observed chord. The most common chords are represented as "template chords" containing 0's and 1's, where the 1's represent the notes played in the chord. For example, using the note C as the first index, the C major chord is represented as

C_{T,CM} = (1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0)

For a given chroma frame c observed in the song, Spearman's rho coefficient is computed against every template chord:

ρ_{CT,c} = Σ_{i=1}^{12} (CT_i − C̄T)(c_i − c̄) / (σ_{CT} σ_c)


where C̄T is the mean of the values in the template chord, σ_{CT} is the standard deviation of the values in the chord, and the operations on c are analogous. Note that the summation is over each of the 12 pitch classes. The chord template with the highest value of ρ is selected as the chord for the time frame. After this is performed for each time frame, the values are smoothed, and then the change between adjacent chords is observed. The reasoning behind this step is that, by measuring the relative distance between chords rather than the chords themselves, all songs can be compared in the same manner even though they may have different key signatures. Finally, the study takes the types of chord changes and classifies them under 8 possible categories called "H-topics." These topics are more abstracted versions of the chord changes that make more sense to a human, such as "changes involving dominant 7th chords."
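As a minimal sketch of this template-matching step (the thesis implementation over all four chord families appears in Appendix A.4), the correlation above can be computed directly; the small 0.01 constants, taken from the appendix code, guard against zero variance in flat frames:

import numpy as np

def rho(template, chroma):
    t_bar, c_bar = np.mean(template), np.mean(chroma)
    num = sum((t - t_bar) * (c - c_bar) for t, c in zip(template, chroma))
    return num / ((np.std(template) + 0.01) * (np.std(chroma) + 0.01))

def best_chord(chroma, templates):
    # index of the template chord with the largest |rho| for this frame
    return max(range(len(templates)), key=lambda i: abs(rho(templates[i], chroma)))

c_major = [1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0]   # template from above
a_minor = [1, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0]   # a second candidate
frame = [0.9, 0.1, 0.0, 0.1, 0.8, 0.0, 0.1, 0.7, 0.0, 0.2, 0.1, 0.0]
print(best_chord(frame, [c_major, a_minor]))      # 0, i.e. C major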

In my preliminary implementation of this method on an electronic dance music corpus, I made a few modifications to Mauch's study. First, I smoothed out time frames before computing the most probable chords, rather than smoothing the most probable chords. I did this to save time and to reduce volatility in the chord measurements. Using Rick Astley's "Never Gonna Give You Up" as a reference, which contains 935 time frames and lasts 212 seconds, 5 time frames is slightly under 1 second, and for preliminary testing this appeared to be a good interval for each time block. Second, as mentioned in the literature section, I did not abstract the chord changes into H-topics. This decision also stemmed from time constraints, since deriving semantic chord meaning from EDM songs would require careful research into the types of harmonies and sounds common in that genre of music. Below is a high-level visualization of the pitch metadata found in a sample song, "Firestarter" by The Prodigy, and of how I converted the metadata into a chord change vector that I could then feed into the Dirichlet Process algorithm.

[Figure: the pitch-preprocessing pipeline, illustrated on "Firestarter" by The Prodigy. (1) Start with the raw pitch data, an N x 12 matrix, where N is the number of time frames in the song and 12 the number of pitch classes; the figure shows the first 5 time frames. (2) Average the distribution of pitches over every block of 5 time frames. (3) Calculate the most likely chord for each block using Spearman's rho (here, F major). (4) For two adjacent chords (here, F major to G major, chord shift code 6), calculate the change between them and increment the count in a table of chord change frequencies (192 possible chord changes). The result is a 192-element vector in which chord_changes[i] is the number of times the chord change with code i was observed in the song.]

A final step I took to normalize the chord change data was to divide the counts by the length of the song, so that each song's number of chord changes was measured per second.
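This mapping from a pair of adjacent chords to one of the 192 chord change codes (12 possible root shifts times 16 quality-to-quality transitions), plus the per-second normalization, can be sketched as follows. Chords are (quality, root) pairs with quality 1-4 (major, minor, dominant 7, minor 7) and root 0-11, as in Appendix A; the example chords and duration are hypothetical:

def chord_change_code(c1, c2):
    # root shift is taken modulo 12 so songs in different keys compare equally
    note_shift = (c2[1] - c1[1]) % 12
    key_shift = 4 * (c1[0] - 1) + c2[0]        # quality transition, 1..16
    return 12 * (key_shift - 1) + note_shift   # final code, 0..191

chord_changes = [0] * 192
chords = [(1, 5), (1, 7), (1, 5)]              # three smoothed chord estimates
for a, b in zip(chords, chords[1:]):
    chord_changes[chord_change_code(a, b)] += 1
duration = 212.0                               # song length in seconds
chord_changes = [c / duration for c in chord_changes]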

2.3.3 Timbre Preprocessing

For timbre, I also used Mauch's model to find a meaningful way to compare timbre uniformly across all songs [8]. After collecting all song metadata, I took a random sample of 20 songs from each year, starting at 1970. The reason I forced the sampling to 20 randomly sampled songs from each year, rather than taking a random sample of songs from all years at once, was to prevent bias towards any type of sounds. As seen in Figure 2.2, there are significantly more songs from 2000-2011 than from before 2000. The mean year is x̄ = 2001.052, the median year is 2003, and the standard deviation of the years is σ = 7.060. A "random sample" over all songs would almost certainly include a disproportionate number of more recent songs. In order not to miss out on sounds that may be more prevalent in older songs, I required a set number of songs from each year. Next, from each randomly selected song, I selected 20 random timbre frames, in order to prevent any biases in data collection within each song. In total, there were 42 x 20 x 20 = 16,800 timbre frames collected. Next, I clustered the timbre frames using a Gaussian Mixture Model (GMM), varying the number of clusters from 10 to 100 and selecting the number of clusters with the lowest Bayesian Information Criterion (BIC), a statistical measure commonly used to select the best-fitting model. The BIC was minimized at 46 timbre clusters. I then re-ran the GMM with 46 clusters and saved the mean values of the 12 timbre components for each cluster formed.

Figure 2.2: Number of Electronic Music Songs in Million Song Dataset from Each Year

In the same way that every song had the same 192 chord changes whose frequencies could be compared between songs, each song now had the same 46 timbre clusters, with different frequencies in each song. When reading in the metadata from each song, I calculated the most likely timbre cluster that each timbre frame belonged to and kept a frequency count of all of the possible timbre clusters observed in the song. Finally, as with the pitch data, I divided all observed counts by the duration of the song in order to normalize each song's timbre counts.
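This model-selection loop is short to express with scikit-learn. The sketch below uses the current GaussianMixture API (the thesis code from 2016 used the sklearn.mixture module of that era) and reads the frames written by the script in Appendix A.3:

import ast
import numpy as np
from sklearn.mixture import GaussianMixture

timbre_frames = np.array(ast.literal_eval(open('timbre_frames_all.txt').read()))

best_bic, best_gmm = float('inf'), None
for k in range(10, 101):
    gmm = GaussianMixture(n_components=k, random_state=0).fit(timbre_frames)
    bic = gmm.bic(timbre_frames)   # lower BIC = better fit/complexity trade-off
    if bic < best_bic:
        best_bic, best_gmm = bic, gmm

cluster_means = best_gmm.means_    # one 12-dimensional mean per timbre cluster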


Chapter 3

Results

3.1 Methodology

After the pitch and timbre data was processed, I ran the Dirichlet Process on the data. For each song, I concatenated the 192-element chord change frequency list and the 46-element timbre category frequency list, giving each song a total of 238 features. However, there is a problem with this setup: the pitch data will inherently dominate the clustering process, since it contains almost 3 times as many features as timbre. While there is no built-in function in scikit-learn's DPGMM implementation to give different weights to each feature, I considered another possibility to remedy this discrepancy: duplicating the timbre vector a certain number of times and concatenating that to the feature set of each song. While this strategy runs the risk of corrupting the feature set and turning it into something that does not accurately represent each song, it is important to keep in mind that, even without duplicating the timbre vector, the feature set consists of two separate feature sets concatenated to each other. Therefore, timbre duplication appears to be a reasonable strategy to weight pitch and timbre more evenly.
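A sketch of this weighting-by-duplication is below; the choice of four copies is my own illustrative assumption (the count is a tunable choice), picked because 4 x 46 = 184 roughly balances the 192 chord change features:

import numpy as np

def build_features(chord_changes, timbre_counts, timbre_copies=4):
    # chord_changes: 192 per-second chord change frequencies
    # timbre_counts: 46 per-second timbre cluster frequencies
    return np.concatenate([chord_changes, np.tile(timbre_counts, timbre_copies)])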


After this modification, I tweaked a few more parameters before obtaining my final results. Dividing the pitch and timbre frequencies by the duration of the song normalized every song to frequency per second, but it also had the undesired effect of making the data too small. Timbre and pitch frequencies per second were almost always less than 1.0 and many times hovered as low as 0.002 for nonzero values. Because all of the values were very close to each other, using common values of α in the range of 0.1 up to 1000-2000 was insufficient to push the songs into different clusters; as a result, every song fell into the same cluster. Increasing the value of α by several orders of magnitude, to well over 10 million, fixed the problem, but this solution presented two new issues. First, tuning α to experiment with different ways to cluster the music would be problematic, since I would have to work with an enormous range of possible values for α. Second, pushing α to such high values is not appropriate for the Dirichlet Process. Extremely high values of α indicate a Dirichlet Process that will try to disperse the data into different clusters, but a value of α that high is in principle always assigning each new song to a new cluster. On the other hand, varying α between 0.1 and 1000, for example, presents a much wider range of flexibility when assigning clusters. While clustering may be possible by varying α an extreme amount with the data as it currently is, we would be using the Dirichlet Process in a way it mathematically should not be used. Therefore, multiplying all of the data by a constant value, so that we can work in the appropriate range of α, is the ideal approach. After some experimentation, I found that k = 10 was an appropriate scaling factor.

After initial runs of the Dirichlet Process, I found that there was a slight issue with some of the earlier songs. Since I had only artist genre tags, not song-specific tags, I chose songs based on whether any of the tags associated with the artist fell under any electronic music genre, including the generic term 'electronic'. There were some bands, mostly older ones from the 1960s and 1970s like Electric Light Orchestra, which had some electronic music but mostly featured rock, funk, disco, or another genre. Given that these artists featured mostly non-electronic songs, I decided to exclude them from my study and generate a blacklist of these music artists. While it was infeasible to look through every single song and determine whether it was electronic or not, I was able to look over the earliest songs in each cluster. These songs were the most important to verify as electronic, because early non-electronic songs could end up forming new clusters and inadvertently create clusters with non-electronic sounds that I was not looking for.

The goal of this thesis is to identify the different groups into which EM songs cluster and to identify the most unique artists and genres. While the second task is fairly simple, because it requires looking at the earliest songs in each cluster, the first is difficult to gauge the effectiveness of. While I can look at the average chord change and timbre category frequencies in each cluster, as well as other metadata, attaching semantic interpretations to what the music actually sounds like, and determining whether the music is clustered properly, is a very subjective process. For this reason, I ran the Dirichlet Process on the feature set with values of α = (0.05, 0.1, 0.2) and compared the clusterings, examining similarities and differences between the clusters formed in each scenario in the Discussion section. For each value of α, I set the upper limit of components, or clusters, allowed to 50. The values of α I used resulted in 9, 14, and 19 clusters, respectively.
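A sketch of these runs follows. The thesis used scikit-learn's since-removed DPGMM class; its modern replacement is BayesianGaussianMixture with a Dirichlet-process prior, where weight_concentration_prior plays the role of α. The feature file name is hypothetical:

import numpy as np
from sklearn.mixture import BayesianGaussianMixture

features = np.load('em_features.npy')   # one row of concatenated features per song
X = 10.0 * features                     # scale by k = 10 as described above

for alpha in (0.05, 0.1, 0.2):
    dpgmm = BayesianGaussianMixture(
        n_components=50,                # upper limit on clusters
        weight_concentration_prior_type='dirichlet_process',
        weight_concentration_prior=alpha,
        max_iter=500, random_state=0).fit(X)
    labels = dpgmm.predict(X)
    print(alpha, len(np.unique(labels)))   # clusters actually populated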

3.2 Findings

3.2.1 α = 0.05

When I set α to 0.05, the Dirichlet Process split the songs into 9 clusters. Below are the distributions of the years of the songs in each cluster (note that the Dirichlet Process does not number the clusters exactly sequentially, so cluster numbers 5, 7, and 10 are skipped).


Figure 3.1: Song year distributions for α = 0.05

For each value of α, I also calculated the average frequency of each chord change category and timbre category for each cluster and plotted the results. The green lines correspond to timbre and the blue lines to pitch.


Figure 3.2: Timbre and pitch distributions for α = 0.05

A table of each cluster formed, the number of songs in that cluster, and descriptions of the pitch, timbre, and rhythmic qualities characteristic of songs in that cluster is shown below.


Cluster   Song Count   Characteristic Sounds
0         6481         Minimalist, industrial, space sounds, dissonant chords
1         5482         Soft New Age, ethereal
2         2405         Defined sounds; electronic and non-electronic instruments played in standard rock rhythms
3         360          Very dense and complex synths, slightly darker tone
4         4550         Heavily distorted rock and synthesizer
6         2854         Faster-paced 80s synth rock, acid house
8         798          Aggressive beats, dense house music
9         1464         Ambient house, trancelike, strong beats, mysterious tone
11        1597         Melancholy tones; new wave rock in the 80s, then, starting in the 90s, downtempo, trip-hop, nu-metal

Table 3.1: Song cluster descriptions for α = 0.05

3.2.2 α = 0.1

A total of 14 clusters were formed. (16 were formed, but 2 clusters contained only one song each; I listened to both of these songs, and they did not sound unique, so I discarded them from the clusters.) Again, the song year distributions, timbre and pitch distributions, and cluster descriptions are shown below.


Figure 3.3: Song year distributions for α = 0.1


Figure 3.4: Timbre and pitch distributions for α = 0.1


Cluster   Song Count   Characteristic Sounds
0         1339         Instrumental and disco with 80s synth
1         2109         Simultaneous quarter-note and sixteenth-note rhythms
2         4048         Upbeat, chill; simultaneous quarter-note and eighth-note rhythms
3         1353         Strong repetitive beats, ambient
4         2446         Strong simultaneous beat and synths; synths defined but echoing
5         2672         Calm New Age
6         542          Hi-hat cymbals, dissonant chord progressions
7         2725         Aggressive punk and alternative rock
9         1647         Latin; rhythmic emphasis on first and third beats
11        835          Standard medium-fast rock instruments/chords
16        1152         Orchestral, especially violins
18        40           "Martian alien" sounds, no vocals
20        1590         Alternating strong kick and strong high-pitched clap
28        528          Roland TR-like beats; kick and clap stand out but fuzzy

Table 3.2: Song cluster descriptions for α = 0.1

3.2.3 α = 0.2

With α set to 0.2, a total of 22 clusters were formed. 3 of the clusters consisted of 1 song each, none of which were particularly unique-sounding, so I discarded them, for a total of 19 significant clusters. Again, the song year distributions, timbre and pitch distributions, and cluster descriptions are shown below.


Figure 3.5: Song year distributions for α = 0.2


Figure 3.6: Timbre and pitch distributions for α = 0.2


Cluster   Song Count   Characteristic Sounds
0         4075         Nostalgic and sad-sounding synths and string instruments
1         2068         Intense, sad, cavernous (mix of industrial metal and ambient)
2         1546         Jazz/funk tones
3         1691         Orchestral with heavy 80s synths, atmospheric
4         343          Arpeggios
5         304          Electro, ambient
6         2405         Alien synths, eerie
7         1264         Punchy kicks and claps, 80s/90s tilt
8         1561         Medium tempo, 4/4 time signature, synths with intense guitar
9         1796         Disco rhythms and instruments
10        2158         Standard rock with few (if any) synths added on
12        791          Cavernous, minimalist ambient (non-electronic instruments)
14        765          Downtempo, classic guitar riffs, fewer synths
16        865          Classic acid house sounds and beats
17        682          Heavy Roland TR sounds
22        14           Fast ambient, classic orchestral
23        578          Acid house with funk tones
30        31           Very repetitive rhythms, one or two tones
34        88           Very dense sound (strong vocals and synths)

Table 3.3: Song cluster descriptions for α = 0.2


3.3 Analysis

Each of the different values of α revealed different insights about the Dirichlet Process applied to the Million Song Dataset: mainly, which artists and songs were unique, and how traditional groupings of EM genres could be thought of differently. Not surprisingly, the distributions of the years of songs in most of the clusters were skewed to the left, because the distribution of all of the EM songs in the Million Song Dataset is left-skewed (see Figure 2.2). However, some of the distributions vary significantly for individual clusters, and these differences provide important insights into which types of music styles were popular at certain points in time and how unique the earliest artists and songs in the clusters were. For example, for α = 0.1, Cluster 28's musical style (with sounds characteristic of the Roland TR-808 and TR-909, two programmable drum machine/synthesizers that became extremely popular in 1980s and 90s dance tracks [13]) coincides with when the instruments were first manufactured, in 1980. Not surprisingly, this cluster contained mostly songs from the 80s and 90s and declined slightly in the 2000s. However, there were a few songs in that cluster that came out before 1980. While these songs did not clearly use the Roland TR machines, they may have contained similar sounds that predated the machines and were truly novel.

First, looking at α = 0.05, we see that all of the clusters contain a significant number of songs, although clusters 3 and 8 are notably smaller. Cluster 3 has a heavier left tail, indicating a larger share of songs from the 70s, 80s, and 90s. Inside the cluster, the genres of music varied significantly from a traditional music lens. That is, the cluster contained some songs with nearly all traditional rock instruments, others with purely synths, and others somewhere in between, all of which would normally be classified as different EM genres. Under the Dirichlet Process, however, these songs were lumped together by the common theme of dense, melodic textures (as opposed to minimalistic, repetitive, or dissonant sounds). The most prominent artists among the earlier songs are Ashra and Jon Hassell, who composed several melodic songs combining traditional instruments with synthesizers for a modern feel. The other small cluster, number 8, has a year distribution closer to that of the entire MSD and consists of denser beats. Another artist, Cabaret Voltaire, leads this cluster. Cluster 9 sticks out significantly because it contains virtually no songs before 1990 but then rises rapidly in popularity. This cluster contains songs with hypnotically repetitive rhythms, strong and ethereal synths, and an equally strong drum-like beat. Given the emergence of trance in the 1990s, and the fact that house music in the 1980s contained more minimalistic synths than house music in the 1990s, this distribution of years makes sense. Looking at the earliest artists in this cluster, one that accurately predates the later music in the cluster is Jean-Michel Jarre. A French composer and pioneer of ambient and electronic music [14], one of his songs, "Les Chants Magnétiques IV," contains very sharp and modulated synths along with a repetitive hi-hat cymbal rhythm and more drawn-out, ethereal synths. While the song sounds ambient at its normal speed, playing it at 1.5 times normal speed produced a thumping, fast-paced 16th-note rhythm that, combined with the ethereal synths and their chord progressions, sounded very similar to trance music. In fact, I found that, stylistically, trance music was comparable to house and ambient music increased in speed. Trance was a term not used extensively until the early 1990s, but ambient and house music were already mainstream by the 1980s, so it would make sense that trance evolved in this manner. This insight could also serve as an argument that trance is not an innovative genre in and of itself, but rather a clever combination of two older genres.

Lastly, we look at the timbre category and chord change distributions for each cluster. In theory, these clusters should have significantly different peaks of chord changes and timbre categories, reflecting different pitch arrangements and instruments in each cluster. The type 0 chord change corresponds to major → major with no note change; type 60, minor → minor with no note change; type 120, dominant 7th → dominant 7th with no note change; and type 180, minor 7th → minor 7th with no note change. It makes sense that type 0, 60, 120, and 180 chord changes are frequently observed, because it implies that adjacent chords remain in the same key for the majority of the song. The timbre categories, on the other hand, are more difficult to interpret intuitively. Mauch's study addresses this issue by sampling the songs and sounds that are closest to each timbre category and then playing the sounds and attaching user-based interpretations from several listeners [8]. While this strategy worked in Mauch's study, given the time and resources at my disposal it was not practical in mine. I ended up comparing my subjective summaries of each cluster against the charts to see whether certain peaks in the timbre categories corresponded to specific tones. Strangely, for α = 0.05 the timbre and chord change data is very similar for each cluster. This problem does not occur for α = 0.1 or 0.2, where the graphs vary significantly and correspond to some of the observed differences in the music. In summary, below are the most influential artists I found in the clusters formed, and the types of music they created that were novel for their time:

• Jean-Michel Jarre: ambient and house music, complicated synthesizer arrangements

• Cabaret Voltaire: orchestral electronic music

• Paul Horn: new age

• Brian Eno: ambient music

• Manuel Göttsching (Ashra): synth-heavy ambient music

• Killing Joke: industrial metal

• John Foxx: minimalist and dark electronic music

• Fad Gadget: house and industrial music

While these conclusions were formed mainly from the MSD data and the resulting clusters, I checked outside sources and biographies of these artists to see whether they were groundbreaking contributors to electronic music. Some research revealed that these artists were indeed groundbreaking for their time, so my findings are consistent with existing literature. The difference, however, between existing accounts and mine is that, from a quantitatively computed perspective, I found some new connections (like observing that one of Jarre's works, when sped up, sounded very similar to trance music).

For larger values of α, it is worth not only looking at interesting phenomena in the clusters formed for that specific value, but also comparing the clusters formed to those of the other values of α. Since we are increasing the value of α, more clusters will be formed, and the distinctions between clusters will be more nuanced. With α = 0.1, the Dirichlet Process formed 16 clusters; 2 of these clusters consisted of only one song each, and upon listening, neither of these songs sounded particularly unique, so I threw those two clusters out and analyzed the remaining 14. Comparing these clusters to the ones formed with α = 0.05, I found that some of the clusters mapped over nicely, while others were more difficult to interpret. For example, cluster 3_{0.1} (cluster 3 when α = 0.1) contained a similar number of songs, and a similar distribution of release years, to cluster 9_{0.05}. Both contain virtually no songs before the 1990s and then steadily rise in popularity through the 2000s. Both clusters also contain similar types of music: house beats and ethereal synths reminiscent of ambient or trance music. However, when I looked at the earliest artists in cluster 3_{0.1}, they were different from the earliest artists in cluster 9_{0.05}. One particular artist, Bill Nelson, stood out for having a particularly novel song, "Birds of Tin," for the year it was released (1980). This song features a sharp and twangy synth beat that, when sped up, sounded like minimalist acid house music.

While the α = 0.05 grouping differentiated mostly on general moods and classes of instruments (like rock vs. non-electronic vs. electronic), the α = 0.1 grouping picked up more nuanced differences in instrumentation and mood. For example, cluster 16_{0.1} contained songs that featured orchestral string instruments, especially violin. The songs themselves varied significantly according to traditional genres, from Brian Eno arrangements with classical orchestra to a remix of a song by Linkin Park, a nu-metal band, which contained violin interludes. This clustering raises an interesting point: music that sounds very different by traditional genre can be grouped together on certain instruments or sounds. Another cluster, 28_{0.1}, features 90s sounds characteristic of the Roland TR-909 drum machine (which would explain why the cluster's songs increase drastically starting in the 1990s and steadily decline through the 2000s). Yet another cluster, 6_{0.1}, has a particularly heavy left tail, indicating a style more popular in the 1980s, and its characteristic sound, hi-hat cymbals, is also a specialized instrument. This specialization does not match up particularly strongly with the clusters for α = 0.05. That is, a single cluster with α = 0.05 does not map easily to one or more clusters in the α = 0.1 run, although many of the clusters appear to share characteristics based on the qualitative descriptions in the tables. The timbre/chord change charts for each cluster appeared to at least somewhat corroborate the general characteristics I attached to each cluster. For example, the last timbre category is significantly pronounced for clusters 5 and 18, and especially so for 18. Cluster 18 was vocal-free, ethereal, space-synth sounds, so it would make sense that cluster 5, which was mainly calm New Age, also contained vocal-free, ethereal, and space-y sounds. It was also interesting to note that certain clusters, like 28, contained one timbre category that completely dominated all the others. Many songs in this cluster were marked by strong, repetitive beats reminiscent of the Roland TR drum machines, which matches the graph. Likewise, clusters 3, 7, 9, and 20, which appear to contain the same peak timbre category, were noted for containing strong and repetitive beats. From this run, I added the following artists and their contributions to the general list of novel artists:

• Bill Nelson: minimalist house music

• Vangelis: orchestral compositions with electronic notes

• Rick Wakeman: rock compositions with spacy-sounding synths

• Kraftwerk: synth-based pop music

Finally, we look at α = 0.2. With this parameter value, the Dirichlet Process produced 22 clusters; 3 of these contained only one song each, and upon listening to each of these songs, I determined they were not particularly unique and discarded them, for a total of 19 remaining clusters. Unlike with the previous two values of α, where the clusters were relatively easy to subjectively differentiate, this run was quite difficult. Slightly more than half of the clusters, 10 out of 19, contained under 1,000 songs, and 3 contained under 100. Some of the clusters were easily mapped to clusters in the other two α values, like cluster 17_{0.2}, which contains Roland TR drum machine sounds and is comparable to cluster 28_{0.1}. However, many of the other classifications seemed more dubious. Not only did the songs within each cluster often vary significantly, but the differences between many clusters appeared nearly indistinguishable. The chord change and timbre charts also reflect the difficulty in distinguishing clusters: the y-axis scales on the charts are quite small, implying that many of the timbre values averaged out because the songs in each cluster were quite different. Essentially, the observations are quite noisy and do not have features that stand out as saliently as those of cluster 28_{0.1}, for example. The only exceptions were clusters 30 and 34, but there were so few songs in each of these clusters that they represent only a small portion of the dataset. Therefore, I concluded that the Dirichlet Process with α = 0.2 did an insufficient job of clustering the songs. Overall, the clusters formed when α = 0.1 were the most meaningful in terms of picking up nuanced moods and instruments without splitting hairs and producing clusters that a minimally trained ear could not differentiate. From this analysis, the most appropriate genre classifications of the electronic music from the MSD are the clusters described in the table for α = 0.1, and the most novel artists, along with their contributions, are summarized in the findings for α = 0.05 and α = 0.1.


Chapter 4

Conclusion

In this chapter, I first address weaknesses in my experiment and strategies to remedy them; I then offer potential paths for researchers to build upon my experiment, and close with final words regarding this thesis.

4.1 Design Flaws in Experiment

While I made every effort possible to ensure the integrity of this experiment, there were various complicating factors, some beyond my control and others within my control but unrealistic to address given the time and resources I had. The largest issue was the dataset I was working with. While the MSD contained roughly 23,000 electronic music songs according to my classifications, these songs did not come close to covering all of the electronic music available. From looking through the tracks, I did see many important artists, meaning that the dataset had some credibility. However, there were several other artists I was surprised to see missing, and the artists included contained only a limited number of popular songs. Some traditionally defined genres, like dubstep, were missing entirely from the dataset, and the most recent songs came from the year 2010, which meant that the past 5 years of rapid expansion in EM were not accounted for. Building a sufficient corpus of EM data is very difficult, arguably more so than for other genres, because songs may be remixed by multiple artists, further blurring the line between original content and modifications. For this reason, I consider my thesis a proof of concept. Although the data I used may not be ideal, I was able to show that the Dirichlet Process can be used with some amount of success to cluster songs based on their metadata.

With respect to how I implemented the Dirichlet Process and constructed the features, my methodology could have been more extensive with additional time and resources. Interpreting the sounds in each song and establishing common threads is a difficult task, and unlike Pandora, which used trained music theory experts to analyze each song, I relied on my own ears and anecdotal knowledge of EM. Given the lack of formal literature quantitatively analyzing EM, and the resources I had, this was my best realistic option, but it was also not ideal. The second notable weakness, which was more controllable, was determining what exactly constitutes an EM song. My criteria involved iterating through every song and selecting those whose artist carried a tag that fell inside a list of predetermined EM genres. However, this strategy is not always effective, since some artists have only a small selection of EM songs and have produced much more music in rock or other non-EM genres. To prevent these songs from appearing in the dataset, I would need to load another dataset, from a group called Last.fm, which contains user-generated tags at the song level. Another, more addressable weakness in my experiment was in graphically analyzing the timbre categories. While the average chord changes were easy to interpret on the graphs for each cluster and had easy semantic interpretations, the timbre categories were never formally defined. That is, while I knew the Bayesian Information Criterion was lowest when there were 46 categories, I did not associate each timbre category with a sound. Mauch's study addressed this issue by randomly selecting songs with sounds that fell in each timbre category and asking users to listen to the sounds and classify what they heard. Implementing this system would be an additional way of ensuring that the clusters formed were nontrivial: I could not only eyeball the measurements on each graph for timbre, as I did in this thesis, but also use them to confirm the sounds I observed for each cluster. Finally, while my feature selection involved careful preprocessing, based on other studies, that normalized measurements between all songs, there are additional ways I could have improved the feature set. For example, one study looks at more advanced ways to isolate specific timbre segments in a song, identify repeating patterns, and compare songs to each other in terms of the similarity of their timbres [15]. More advanced methods like these would allow me to analyze more quantitatively how successful the Dirichlet Process is at clustering songs into distinct categories.

4.2 Future Work

Future work in this area, quantitatively analyzing EM metadata to determine what constitutes different genres and novel artists, would involve tighter definitions and procedures, evaluations of whether the clustering was effective, and closer musical scrutiny. All of the weaknesses mentioned in the previous section, barring perhaps the songs available in the Million Song Dataset, can be addressed with extensions and modifications to the code base I created. The greater issue, of building an effective corpus of music data for the MSD and constantly updating it, might be addressed by soliciting such data from an organization like Spotify, but such an endeavor is very ambitious and beyond the scope of any individual or small-group research project without extensive funding and influence. Once these problems are resolved, and both the songs accessed from the dataset and the methods for comparing songs to each other are improved, the next steps would be to further analyze the results. How do the most unique artists for their time compare to the most popular artists? Is there considerable overlap? How long does it take for a style to grow in popularity, if it does at all? And lastly, how can these findings be used to compose new genres of music and envision who and what will become popular in the future? All of these questions may require supplementary information sources, with respect to the popularity of songs and artists for example, and many of these additional pieces of information can be found on the website of the MSD.

4.3 Closing Remarks

While this thesis is an ambitious endeavor and can be improved in many respects, it does show that the methods implemented yield nontrivial results and could serve as a foundation for future quantitative analysis of electronic music. As data analytics grows and groups such as Spotify amass greater amounts of information, and deeper insights from that information, this relatively new field of study will hopefully grow with them. EM is a dynamic, energizing, and incredibly expressive type of music, and understanding it from a quantitative perspective pays respect to what has, up until now, mostly been analyzed from a curious outsider's perspective: qualitatively described, but not examined as thoroughly from a mathematical angle.


Appendix A

Code

A.1 Pulling Data from the Million Song Dataset

from __future__ import division
import os
import sys
import re
import time
import glob
import numpy as np
import hdf5_getters  # not on adroit

# prevents output from showing ellipses when printed
np.set_printoptions(threshold=np.nan)

'''This code collects the relevant metadata of each electronic song found
in the MSD, for later chord-change and Dirichlet Process analysis.'''

basedir = '/scratch/network/mssilver/mssilver/msd_data_full/data/' + str(sys.argv[1])
ext = '.h5'

target_genres = ['house', 'techno', 'drum and bass', 'drum n bass',
                 "drum'n'bass", 'drumnbass', "drum 'n' bass", 'jungle',
                 'breakbeat', 'trance', 'dubstep', 'trap', 'downtempo',
                 'industrial', 'synthpop', 'idm',
                 'idm - intelligent dance music', '8-bit', 'ambient',
                 'dance and electronica', 'electronic']

# relevant metadata for all EM songs found in the MSD
all_song_data = {}
pitch_segs_data = []
count = 0
start_time = time.time()

for root, dirs, files in os.walk(basedir):
    files = glob.glob(os.path.join(root, '*' + ext))
    for f in files:
        h5 = hdf5_getters.open_h5_file_read(f)
        # if year unknown, throw out sample
        if hdf5_getters.get_year(h5) == 0:
            h5.close()
            continue
        if any(tag in str(hdf5_getters.get_artist_mbtags(h5)) for tag in target_genres):
            print 'found electronic music song at {0} seconds'.format(time.time() - start_time)
            count += 1
            print('song count: {0}'.format(count + 1))
            h5_subdict = dict()
            h5_subdict['title'] = hdf5_getters.get_title(h5).item()
            h5_subdict['artist_name'] = hdf5_getters.get_artist_name(h5).item()
            h5_subdict['year'] = hdf5_getters.get_year(h5).item()
            h5_subdict['duration'] = hdf5_getters.get_duration(h5).item()
            h5_subdict['timbre'] = hdf5_getters.get_segments_timbre(h5).tolist()
            h5_subdict['pitches'] = hdf5_getters.get_segments_pitches(h5).tolist()
            track_id = hdf5_getters.get_track_id(h5).item()
            all_song_data[track_id] = h5_subdict
            print('Song {0} finished processing. Total time elapsed: {1} seconds'.format(count, str(time.time() - start_time)))
        h5.close()

all_song_data_sorted = dict(sorted(all_song_data.items(), key=lambda k: k[1]['year']))
sortedpitchdata = '/scratch/network/mssilver/mssilver/msd_data/raw_' + re.sub(r'/', '', sys.argv[1]) + '.txt'
with open(sortedpitchdata, 'w') as text_file:
    text_file.write(str(all_song_data_sorted))

A.2 Calculating Most Likely Chords and Timbre Categories

from __future__ import division
import os
import sys
import re
import time
import json
import glob
import hdf5_getters  # not on adroit
import sklearn.mixture
import msd_utils  # not on adroit
import math
import numpy as np
import collections
import ast

# prevents output from showing ellipses when printed
np.set_printoptions(threshold=np.nan)

# column-wise mean of list of lists
def mean(a):
    return sum(a) / len(a)

'''This code computes the frequency of chord changes and timbre categories
in each electronic song, ready for the Dirichlet Process to run on.'''

basedir = '/scratch/network/mssilver/mssilver/'
input_file = basedir + 'msd_data/raw_' + str(sys.argv[1]) + '.txt'
output_file = basedir + 'msd_data/preprocessed_' + str(sys.argv[1]) + '.txt'

json_contents = open(input_file, 'r').read()

all_song_data = []
time_start = time.time()
count = 0
for json_object_str in re.finditer(r"\{'title'.*?\}", json_contents):
    json_object_str = str(json_object_str.group(0))
    json_object = ast.literal_eval(json_object_str)
    json_object_new = {}

    json_object_new['title'] = json_object['title']
    json_object_new['artist_name'] = json_object['artist_name']
    json_object_new['year'] = json_object['year']
    json_object_new['duration'] = json_object['duration']

    segments_pitches_old = json_object['pitches']
    segments_timbre_old = json_object['timbre']
    segments_pitches_old_smoothed = []
    segments_timbre_old_smoothed = []
    chord_changes = [0 for i in range(0, 192)]
    smoothing_factor = 5
    for i in range(0, int(math.floor(len(segments_pitches_old)) / smoothing_factor)):
        segments = segments_pitches_old[(smoothing_factor * i):(smoothing_factor * i + smoothing_factor)]
        # calculate mean frequency of each note over a block of 5 time segments
        segments_mean = map(mean, zip(*segments))
        segments_pitches_old_smoothed.append(segments_mean)
    most_likely_chords = [msd_utils.find_most_likely_chord(seg) for seg in segments_pitches_old_smoothed]
    print 'found most likely chords at {0} seconds'.format(time.time() - time_start)
    # calculate chord changes
    for i in range(0, len(most_likely_chords) - 1):
        c1 = most_likely_chords[i]
        c2 = most_likely_chords[i + 1]
        if (c1[1] == c2[1]):
            note_shift = 0
        elif (c1[1] < c2[1]):
            note_shift = c2[1] - c1[1]
        else:
            note_shift = 12 - c1[1] + c2[1]
        key_shift = 4 * (c1[0] - 1) + c2[0]
        # convert note_shift (0 through 11) and key_shift (1 to 16)
        # to one of 192 categories for a chord shift
        chord_shift = 12 * (key_shift - 1) + note_shift
        chord_changes[chord_shift] += 1
    json_object_new['chord_changes'] = [c / json_object['duration'] for c in chord_changes]
    print 'calculated chord changes at {0} seconds'.format(time.time() - time_start)

    for i in range(0, int(math.floor(len(segments_timbre_old)) / smoothing_factor)):
        segments = segments_timbre_old[(smoothing_factor * i):(smoothing_factor * i + smoothing_factor)]
        # calculate mean value of each timbre component over a block of 5 time segments
        segments_mean = map(mean, zip(*segments))
        segments_timbre_old_smoothed.append(segments_mean)
    print 'found most likely timbre categories at {0} seconds'.format(time.time() - time_start)
    timbre_cats = [msd_utils.find_most_likely_timbre_category(seg) for seg in segments_timbre_old_smoothed]
    timbre_cat_counts = [timbre_cats.count(i) for i in xrange(0, 46)]  # one count per timbre cluster
    json_object_new['timbre_cat_counts'] = [t / json_object['duration'] for t in timbre_cat_counts]
    all_song_data.append(json_object_new)
    count += 1

print 'preprocessing finished, writing results to file at time {0}'.format(time.time() - time_start)
with open(output_file, 'w') as f:
    f.write(str(all_song_data))

print 'file merging complete at time {0}'.format(time.time() - time_start)

A.3 Code to Compute Timbre Categories

from __future__ import division
import os
import sys
import re
import time
import json
import glob
import hdf5_getters  # not on adroit
import sklearn.mixture
import msd_utils  # not on adroit
import math
import numpy as np
import collections
from string import ascii_uppercase
import ast
import matplotlib.pyplot as plt
import operator
from collections import defaultdict
import random

timbre_all = []
# number of electronic songs per year in the collected data
year_counts = {1956: 2, 1965: 4, 1968: 3, 1969: 5, 1970: 23, 1971: 25,
               1972: 26, 1973: 37, 1974: 35, 1975: 29, 1976: 28, 1977: 64,
               1978: 77, 1979: 111, 1980: 131, 1981: 171, 1982: 199,
               1983: 272, 1984: 190, 1985: 189, 1986: 200, 1987: 224,
               1988: 205, 1989: 272, 1990: 358, 1991: 348, 1992: 538,
               1993: 610, 1994: 658, 1995: 764, 1996: 809, 1997: 930,
               1998: 872, 1999: 983, 2000: 1031, 2001: 1230, 2002: 1323,
               2003: 1563, 2004: 1508, 2005: 1995, 2006: 1892, 2007: 2175,
               2008: 1950, 2009: 1782, 2010: 742}

time_start = time.time()
year_count = defaultdict(int)
orig_dir = '/scratch/network/mssilver/mssilver/'
orig_dir = ''
json_pattern = re.compile(r"\{'title'.*?\}", re.DOTALL)
N = 20  # number of songs to sample from each year
k = 20  # number of frames to select from each song
for l1 in ascii_uppercase:
    for l2 in ascii_uppercase:
        edm_textfile = orig_dir + 'msd_data/raw_' + l1 + l2 + '.txt'
        json_contents = open(edm_textfile, 'r').read()
        for json_object_str in re.findall(json_pattern, json_contents):
            json_object = ast.literal_eval(json_object_str)
            year = int(json_object['year'])
            prob = 1.0 if 1.0 * N / year_counts[year] > 1.0 else 1.0 * N / year_counts[year]
            if random.random() < prob:
                print 'getting timbre frames for song in directory {0}, {1} seconds after start of program'.format(edm_textfile, time.time() - time_start)
                duration = float(json_object['duration'])
                timbre = [[t / duration for t in l] for l in json_object['timbre']]
                try:
                    indices = random.sample(xrange(0, len(timbre)), k)
                except:
                    indices = xrange(0, len(timbre))
                timbre_frames = [timbre[i] for i in indices]
                appended_timbre = [timbre_all.append(l) for l in timbre_frames]
        print 'finished file {0}, {1} seconds after start of program'.format(edm_textfile, time.time() - time_start)

with(open('timbre_frames_all.txt', 'w')) as f:
    f.write(str(timbre_all))

A.4 Helper Methods for Calculations

import os
import re
import json
import glob
import hdf5_getters
import time
import numpy as np

''' some static data used in conjunction with the helper methods '''

# each 12-element vector corresponds to the 12 pitches, starting with C
# natural and going up to B natural
CHORD_TEMPLATE_MAJOR = [[1,0,0,0,1,0,0,1,0,0,0,0], [0,1,0,0,0,1,0,0,1,0,0,0],
                        [0,0,1,0,0,0,1,0,0,1,0,0], [0,0,0,1,0,0,0,1,0,0,1,0],
                        [0,0,0,0,1,0,0,0,1,0,0,1], [1,0,0,0,0,1,0,0,0,1,0,0],
                        [0,1,0,0,0,0,1,0,0,0,1,0], [0,0,1,0,0,0,0,1,0,0,0,1],
                        [1,0,0,1,0,0,0,0,1,0,0,0], [0,1,0,0,1,0,0,0,0,1,0,0],
                        [0,0,1,0,0,1,0,0,0,0,1,0], [0,0,0,1,0,0,1,0,0,0,0,1]]
CHORD_TEMPLATE_MINOR = [[1,0,0,1,0,0,0,1,0,0,0,0], [0,1,0,0,1,0,0,0,1,0,0,0],
                        [0,0,1,0,0,1,0,0,0,1,0,0], [0,0,0,1,0,0,1,0,0,0,1,0],
                        [0,0,0,0,1,0,0,1,0,0,0,1], [1,0,0,0,0,1,0,0,1,0,0,0],
                        [0,1,0,0,0,0,1,0,0,1,0,0], [0,0,1,0,0,0,0,1,0,0,1,0],
                        [0,0,0,1,0,0,0,0,1,0,0,1], [1,0,0,0,1,0,0,0,0,1,0,0],
                        [0,1,0,0,0,1,0,0,0,0,1,0], [0,0,1,0,0,0,1,0,0,0,0,1]]
CHORD_TEMPLATE_DOM7 = [[1,0,0,0,1,0,0,1,0,0,1,0], [0,1,0,0,0,1,0,0,1,0,0,1],
                       [1,0,1,0,0,0,1,0,0,1,0,0], [0,1,0,1,0,0,0,1,0,0,1,0],
                       [0,0,1,0,1,0,0,0,1,0,0,1], [1,0,0,1,0,1,0,0,0,1,0,0],
                       [0,1,0,0,1,0,1,0,0,0,1,0], [0,0,1,0,0,1,0,1,0,0,0,1],
                       [1,0,0,1,0,0,1,0,1,0,0,0], [0,1,0,0,1,0,0,1,0,1,0,0],
                       [0,0,1,0,0,1,0,0,1,0,1,0], [0,0,0,1,0,0,1,0,0,1,0,1]]
CHORD_TEMPLATE_MIN7 = [[1,0,0,1,0,0,0,1,0,0,1,0], [0,1,0,0,1,0,0,0,1,0,0,1],
                       [1,0,1,0,0,1,0,0,0,1,0,0], [0,1,0,1,0,0,1,0,0,0,1,0],
                       [0,0,1,0,1,0,0,1,0,0,0,1], [1,0,0,1,0,1,0,0,1,0,0,0],
                       [0,1,0,0,1,0,1,0,0,1,0,0], [0,0,1,0,0,1,0,1,0,0,1,0],
                       [0,0,0,1,0,0,1,0,1,0,0,1], [1,0,0,0,1,0,0,1,0,1,0,0],
                       [0,1,0,0,0,1,0,0,1,0,1,0], [0,0,1,0,0,0,1,0,0,1,0,1]]

CHORD_TEMPLATE_MAJOR_means = [np.mean(chord) for chord in CHORD_TEMPLATE_MAJOR]
CHORD_TEMPLATE_MINOR_means = [np.mean(chord) for chord in CHORD_TEMPLATE_MINOR]
CHORD_TEMPLATE_DOM7_means = [np.mean(chord) for chord in CHORD_TEMPLATE_DOM7]
CHORD_TEMPLATE_MIN7_means = [np.mean(chord) for chord in CHORD_TEMPLATE_MIN7]

CHORD_TEMPLATE_MAJOR_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_MAJOR]
CHORD_TEMPLATE_MINOR_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_MINOR]
CHORD_TEMPLATE_DOM7_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_DOM7]
CHORD_TEMPLATE_MIN7_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_MIN7]

# mean timbre vectors of the 46 clusters found by the GMM (Section 2.3.3)
TIMBRE_CLUSTERS = [
    [1.38679881e-01, 3.95702571e-02, 2.65410235e-02, 7.38301998e-03, -1.75014636e-02, -5.51147732e-02,
     8.71851698e-03, -1.17595855e-02, 1.07227900e-02, 8.75951680e-03, 5.40391877e-03, 6.17638908e-03],
    [3.14344510e+00, 1.17405599e-01, 4.08053561e+00, -1.77934450e+00, 2.93367968e+00, -1.35597928e+00,
     -1.55129489e+00, 7.75743158e-01, 6.42796685e-01, 1.40794256e-01, 3.37716831e-01, -3.27103815e-01],
    [3.56548165e-01, 2.73288705e+00, 1.94355982e+00, 1.06892477e+00, 9.89739475e-01, -8.97330631e-02,
     8.73234495e-01, -2.00747009e-03, 3.44488367e-01, 9.93117800e-02, -2.43471766e-01, -1.90521726e-01],
    [4.22442037e-01, 4.14115783e-01, 1.43926557e-01, -1.16143322e-01, -5.95186216e-02, -2.36927188e-01,
     -6.83151409e-02, 9.86816882e-02, 2.43219098e-02, 6.93558977e-02, 6.80121418e-03, 3.97485360e-02],
    [1.94727799e-01, -1.39027782e+00, -2.39875671e-01, -2.84583677e-01, 1.92334219e-01, -2.83421048e-01,
     2.15787541e-01, 1.14840341e-01, -2.15631833e-01, -4.09496877e-02, -6.90838017e-03, -7.24394810e-03],
    [1.96565167e-01, 4.98702717e-02, -3.43697282e-01, 2.54170701e-01, 1.12441266e-02, 1.54740401e-01,
     -4.70447408e-02, 8.10868802e-02, 3.03736697e-03, 1.43974944e-03, -2.75044913e-02, 1.48634678e-02],
    [2.21364497e-01, -2.96205105e-01, 1.57754028e-01, -5.57641279e-02, -9.25625566e-02, -6.15316168e-02,
     -1.38139882e-01, -5.54936599e-02, 1.66886836e-01, 6.46238260e-02, 1.24093863e-02, -2.09274345e-02],
    [2.12823455e-01, -9.32652720e-02, -4.39611467e-01, -2.02814479e-01, 4.98638770e-02, -1.26572488e-01,
     -1.11181799e-01, 3.25075635e-02, 2.01416694e-02, -5.69216463e-02, 2.61922912e-02, 8.30817468e-02],
    [1.62304042e-01, -7.34813956e-03, -2.02552550e-01, 1.80106705e-01, -5.72110826e-02, -9.17148244e-02,
     -6.20429191e-03, -6.08892354e-02, 1.02883628e-02, 3.84878478e-02, -8.72920419e-03, 2.37291230e-02],
    [1.69023095e-01, 6.81311168e-02, -3.71039856e-02, -2.13139780e-02, -4.18752028e-03, 1.36407740e-01,
     2.58515825e-02, -4.10328777e-04, 2.93149920e-02, -1.97874734e-02, 2.01177066e-02, 4.29260690e-03],
    [4.16829358e-01, -1.28384095e+00, 8.86081556e-01, 9.13717416e-02, -3.19420208e-01, -1.82003637e-01,
     -3.19865507e-02, -1.71517045e-02, 3.47472066e-02, -3.53047665e-02, 5.58354602e-02, -5.06222122e-02],
    [3.83948137e-01, 1.06020034e-01, 4.01191058e-01, 1.49470482e-01, -9.58422411e-02, -4.94473336e-02,
     2.27589858e-02, -5.67352733e-02, 3.84666644e-02, -2.15828055e-02, -1.67817151e-02, 1.15426241e-01],
    [9.07946444e-01, 3.26120397e+00, 2.98472002e+00, -1.42615404e-01, 1.29886103e+00, -4.53380431e-01,
     1.54008478e-01, -3.55297093e-02, -2.95809181e-01, 1.57037690e-01, -7.29692046e-02, 1.15180285e-01],
    [1.60870896e+00, -2.32038235e+00, -7.96211044e-01, 1.55058968e+00, -2.19377663e+00, 5.01030526e-01,
     -1.71767279e+00, -1.36642470e+00, -2.42837527e-01, -4.14275615e-01, -7.33148530e-01, -4.56676578e-01],
    [6.42870687e-01, 1.34486839e+00, 2.16026845e-01, -2.13180345e-01, 3.10866747e-01, -3.97754955e-01,
     -3.54439151e-01, -5.95938041e-04, 4.95054274e-03, 4.67013422e-02, -1.80823854e-02, 1.25808320e-01],
    [1.16780496e+00, 2.28141229e+00, -3.29418720e+00, -1.54239912e+00, 2.12372153e-01, 2.51116768e+00,
     1.84273560e+00, -4.06183916e-01, 1.19175125e+00, -9.24407446e-01, 6.85444429e-01, -6.38729005e-01],
    [2.39097414e-01, -1.13382447e-02, 3.06327342e-01, 4.68182987e-03, -1.03107607e-01, -3.17661969e-02,
     3.46533705e-02, 1.46440386e-02, 6.88291154e-02, 1.72580481e-02, -6.23970238e-03, -6.52822380e-03],
    [1.74850329e-01, -1.86077411e-01, 2.69285838e-01, 5.22452803e-02, -3.71708289e-02, -6.42874319e-02,
     -5.01920042e-03, -1.14565540e-02, -2.61300268e-03, -6.94872458e-03, 1.20157063e-02, 2.01341977e-02],
    [1.93220674e-01, 1.62738332e-01, 1.72794061e-02, 7.89933755e-02, 1.58494767e-01, 9.04541006e-04,
     -3.33177052e-02, -1.42411500e-01, -1.90471155e-02, -2.41622739e-02, -2.57382438e-02, 2.84895062e-02],
    [3.31179197e+00, -1.56765268e-01, 4.42446188e+00, 2.05496297e+00, 5.07031622e+00, -3.52663849e-02,
     -5.68337901e+00, -1.17825301e+00, 5.41756637e-01, -3.15541339e-02, -1.58404846e+00, 7.37887234e-01],
    [2.36033237e-01, -5.01380019e-01, -7.01568834e-02, -2.14474169e-01, 5.58739133e-01, -3.45340886e-01,
     2.36469930e-01, -2.51770230e-02, -4.41670143e-01, -1.73364633e-01, 9.92353986e-03, 1.01775476e-01],
    [3.13672832e+00, 1.55128891e+00, 4.60139512e+00, 9.82477544e-01, -3.87108002e-01, -1.34239667e+00,
     -3.00065797e+00, -4.41556909e-01, -7.77546208e-01, -6.59017029e-01, -1.42596356e-01, -9.78935498e-01],
    [8.50714148e-01, 2.28658856e-01, -3.65260753e+00, 2.70626948e+00, -1.90441544e-01, 5.66625676e+00,
     1.77531510e+00, 2.39978921e+00, 1.10965660e+00, 1.58484130e+00, -1.51579214e-02, 8.64324026e-01],
    [1.14302559e+00, 1.18602811e+00, -3.88130412e+00, 8.69833825e-01, -8.23003310e-01, -4.23867795e-01,
     8.56022598e-01, -1.08015106e+00, 1.74840192e-01, -1.35493558e-02, -1.17012561e+00, 1.68572940e-01],
    [3.54117814e+00, 6.12714769e-01, 7.67585243e+00, 2.50391333e+00, 1.81374399e+00, -1.46363231e+00,
     -1.74027236e+00, -5.72924078e-01, -1.20787368e+00, -4.13954661e-01, -4.62561948e-01, 6.78297871e-01],
    [8.31843044e-01, 4.41635485e-01, 7.00724425e-02, -4.72159900e-02, 3.08326493e-01, -4.47009822e-01,
     3.27806057e-01, 6.52370380e-01, 3.28490360e-01, 1.28628172e-01, -7.78065861e-02, 6.91343399e-02],
    [4.90082031e-01, -9.53180204e-01, 1.76970476e-01, 1.57256960e-01, -5.26196238e-02, -3.19264458e-01,
     3.91808304e-01, 2.19368239e-01, -2.06483291e-01, -6.25044005e-02, -1.05547224e-01, 3.18934196e-01],
    [1.49899454e+00, -4.30708817e-01, 2.43770498e+00, 7.03149621e-01, -2.28827845e+00, 2.70195855e+00,
     -4.71484280e+00, -1.18700075e+00, -1.77431396e+00, -2.23190236e+00, 8.20855264e-01, -2.35859902e-01],
    [1.20322544e-01, -3.66300816e-01, -1.25699953e-01, -1.21914056e-01, 6.93277338e-02, -1.31034684e-01,
     -1.54955924e-03, 2.48094288e-02, -3.09576314e-02, -1.66369415e-03, 1.48904987e-04, -1.42151992e-02],
    [6.52394765e-01, -6.81024464e-01, 6.36868117e-01, 3.04950208e-01, 2.62178992e-01, -3.20457080e-01,
     -1.98576098e-01, -3.02173163e-01, 2.04399765e-01, 4.44513847e-02, -9.50111498e-03, -1.14198739e-02],
    [2.06762180e-01, -2.08101829e-01, 2.61977630e-01, -1.71672300e-01, 5.61794250e-02, 2.13660185e-01,
     3.90259585e-02, 4.78176392e-02, 1.72812607e-02, 3.44052067e-02, 6.26899067e-03, 2.48544728e-02],
    [7.39717363e-01, 4.37786285e+00, 2.54995502e+00, 1.13151212e+00, -3.58509503e-01, 2.20806129e-01,
     -2.20500355e-01, -7.22409824e-02, -2.70534083e-01, 1.07942098e-03, 2.70174668e-01, 1.87279353e-01],
    [1.25593809e+00, 6.71054880e-02, 8.70352571e-01, -4.32607959e+00, 2.30652217e+00, 5.47476105e+00,
     -6.11052479e-01, 1.07955720e+00, -2.16225471e+00, -7.95770149e-01, -7.31804973e-01, 9.68935954e-01],
    [1.17233757e-01, -1.23897829e-01, -4.88625265e-01, 1.42036530e-01, -7.23286756e-02, -6.99808763e-02,
     -1.17525019e-02, 5.70221674e-02, -7.67796123e-03, 4.17505873e-02, -2.33375716e-02, 1.94121001e-02],
    [1.67511025e+00, -2.75436700e+00, 1.45345593e+00, 1.32408871e+00, -1.66172505e+00, 1.00560074e+00,
     -8.82308160e-01, -5.95708043e-01, -7.27283590e-01, -1.03975499e+00, -1.86653334e-02, 1.39449745e+00],
    [3.20587677e+00, -2.84451104e+00, 8.54849957e+00, -4.44001235e-01, 1.04202144e+00, 7.35333682e-01,
     -2.48763292e+00, 7.38931361e-01, -1.74185596e+00, -1.07581842e+00, 2.05759299e-01, -8.20483513e-01],
    [3.31279737e+00, -5.08655734e-01, 6.61530870e+00, 1.16518280e+00, 4.74499155e+00, -2.31536191e+00,
     -1.34016130e+00, -7.15381712e-01, 2.78890594e+00, 2.04189275e+00, -3.80003033e-01, 1.16034914e+00],
    [1.79522019e+00, -8.13534697e-02, 4.37167420e-01, 2.26517020e+00, 8.85377295e-01, 1.07481514e+00,
     -7.25322296e-01, -2.19309506e+00, -7.59468916e-01, -1.37191387e+00, 2.60097913e-01, 9.34596450e-01],
    [3.50400906e-01, 8.17891485e-01, -8.63487084e-01, -7.31760701e-01, 9.70320805e-02, -3.60023996e-01,
     -2.91753495e-01, -8.03073817e-02, 6.65930095e-02, 1.60093340e-01, -1.29158086e-01, -5.18806100e-02],
    [2.25922929e-01, 2.78461593e-01, 5.39661393e-02, -2.37662670e-02, -2.70343295e-02, -1.23485570e-01,
     2.31027499e-03, 5.87465112e-05, 1.86127188e-02, 2.83074747e-02, -1.87198676e-04, 1.24761782e-02],
    [4.53615634e-01, 3.18976020e+00, -8.35029351e-01, 7.84124578e+00, -4.43906795e-01, -1.78945492e+00,
     -1.14521031e+00, 1.00044304e+00, -4.04084981e-01, -4.86030348e-01, 1.05412721e-01, 5.63666445e-02],
    [3.93714086e-01, -3.07226477e-01, -4.87366619e-01, -4.57481697e-01, -2.91133171e-04, -2.39881719e-01,
     -2.15591352e-01, -1.21332941e-01, 1.42245002e-01, 5.02984582e-02, -8.05878851e-03, 1.95534173e-01],
    [1.86913010e-01, -1.61000977e-01, 5.95612425e-01, 1.87804293e-01, 2.22064227e-01, -1.09008289e-01,
     7.83845058e-02, 5.15228647e-02, -8.18113578e-02, -2.37860551e-02, 3.41013800e-03, 3.64680417e-02],
    [3.32919314e+00, -2.14341251e+00, 7.20913997e+00, 1.76143734e+00, 1.64091808e+00, -2.66887649e+00,
     -9.26748006e-01, -2.78599285e-01, -7.39434005e-01, -3.87363085e-01, 8.00557250e-01, 1.15628886e+00],
    [4.76496444e-01, -1.19334793e-01, 3.09037235e-01, -3.45545294e-01, 1.30114716e-01, 5.06895559e-01,
     2.12176840e-01, -4.14296750e-03, 4.52439064e-02, -1.62163990e-02, 6.93683152e-02, -5.77607592e-03],
    [3.00019324e-01, 5.43432074e-02, -7.72732930e-01, 1.47263806e+00, -2.79012581e-02, -2.47864869e-01,
     -2.10011388e-01, 2.78202425e-01, 6.16957205e-02, -1.66924986e-01, -1.80102286e-01, -3.78872162e-03]]

TIMBRE_MEANS = [np.mean(t) for t in TIMBRE_CLUSTERS]
TIMBRE_STDEVS = [np.std(t) for t in TIMBRE_CLUSTERS]

'''helper methods to process raw msd data'''

def normalize_pitches(h5):
    key = int(hdf5_getters.get_key(h5))
    segments_pitches = hdf5_getters.get_segments_pitches(h5)
    segments_pitches_new = [transpose_by_key(pitch_seg, key)
                            for pitch_seg in segments_pitches]
    return segments_pitches_new

def transpose_by_key(pitch_seg, key):
    pitch_seg_new = []
    for i in range(0, 12):
        idx = (i + key) % 12
        pitch_seg_new.append(pitch_seg[idx])
    return pitch_seg_new

# the four chord families share one correlation loop:
# type 1 = major, 2 = minor, 3 = dominant 7, 4 = minor 7
CHORD_FAMILIES = [
    (1, CHORD_TEMPLATE_MAJOR, CHORD_TEMPLATE_MAJOR_means, CHORD_TEMPLATE_MAJOR_stdevs),
    (2, CHORD_TEMPLATE_MINOR, CHORD_TEMPLATE_MINOR_means, CHORD_TEMPLATE_MINOR_stdevs),
    (3, CHORD_TEMPLATE_DOM7, CHORD_TEMPLATE_DOM7_means, CHORD_TEMPLATE_DOM7_stdevs),
    (4, CHORD_TEMPLATE_MIN7, CHORD_TEMPLATE_MIN7_means, CHORD_TEMPLATE_MIN7_stdevs)]

def find_most_likely_chord(pitch_vector):
    '''given a time segment with distributions of the 12 pitches, find the
    most likely chord played; returns a (chord type, root index) pair'''
    rho_max = 0.0
    most_likely_chord = (1, 1)
    for chord_type, templates, means, stdevs in CHORD_FAMILIES:
        for idx, (chord, mean, stdev) in enumerate(zip(templates, means, stdevs)):
            rho = 0.0
            for i in range(0, 12):
                rho += (chord[i] - mean) * (pitch_vector[i] - np.mean(pitch_vector)) \
                       / ((stdev + 0.01) * (np.std(pitch_vector) + 0.01))
            if (abs(rho) > abs(rho_max)):
                rho_max = rho
                most_likely_chord = (chord_type, idx)
    return most_likely_chord

def find_most_likely_timbre_category(timbre_vector):
    '''find the timbre cluster whose mean vector correlates most strongly
    with the observed timbre frame'''
    most_likely_timbre_cat = 0
    rho_max = 0.0
    for idx, (seg, mean, stdev) in enumerate(zip(TIMBRE_CLUSTERS, TIMBRE_MEANS, TIMBRE_STDEVS)):
        rho = 0.0
        for i in range(0, 12):
            rho += (seg[i] - mean) * (timbre_vector[i] - np.mean(timbre_vector)) \
                   / ((stdev + 0.01) * (np.std(timbre_vector) + 0.01))
        if (abs(rho) > abs(rho_max)):
            rho_max = rho
            most_likely_timbre_cat = idx
    return most_likely_timbre_cat

Bibliography

[1] Marck Bailey. Mark J. Butler publishes scholarly work on dance music. http://www.music.northwestern.edu/about/news/2012/mark-j-butler-publishes-scholarly-work-on-dance-music.html, Mar 2012.

[2] Kenneth Taylor. Ishkur's guide to EDM. http://techno.org/electronic-music-guide/.

[3] Deal further strengthens Spotify's music discovery expertise. http://the.echonest.com/pressreleases/spotify-acquires-echo-nest/, Mar 2014.

[4] Josh Constine. Inside the Spotify-Echo Nest skunkworks. http://techcrunch.com/2014/10/19/the-sonic-mad-scientists/, Oct 2014.

[5] The future of music genres is here. http://blog.echonest.com/post/73516217273/the-future-of-music-genres-is-here, Jan 2014.

[6] About the Music Genome Project. http://www.pandora.com/about/mgp.

[7] Joan Serrà, Álvaro Corral, Marián Boguñá, Martín Haro, and Josep Ll. Arcos. Measuring the evolution of contemporary western popular music. Sci. Rep., 2, Jul 2012.

[8] Matthias Mauch, Robert M. MacCallum, Mark Levy, and Armand M. Leroi. The evolution of popular music: USA 1960-2010. Royal Society Open Science, 2(5), 2015.

[9] Thierry Bertin-Mahieux, Daniel P. W. Ellis, Brian Whitman, and Paul Lamere. The million song dataset. In Proceedings of the 12th International Conference on Music Information Retrieval (ISMIR 2011), 2011.

[10] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay. Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12:2825-2830, 2011.

[11] Lars Buitinck, Gilles Louppe, Mathieu Blondel, Fabian Pedregosa, Andreas Mueller, Olivier Grisel, Vlad Niculae, Peter Prettenhofer, Alexandre Gramfort, Jaques Grobler, Robert Layton, Jake VanderPlas, Arnaud Joly, Brian Holt, and Gaël Varoquaux. API design for machine learning software: experiences from the scikit-learn project. In ECML PKDD Workshop: Languages for Data Mining and Machine Learning, pages 108-122, 2013.

[12] Edwin Chen. Infinite mixture models with nonparametric bayes and the dirichlet process. http://blog.echen.me/2012/03/20/infinite-mixture-models-with-nonparametric-bayes-and-the-dirichlet-process/, Mar 2012.

[13] Graham Massey. Roland TR-808: The drum machine that changed music. http://www.bbc.com/news/entertainment-arts-26682781, Mar 2014.

[14] Jean-Michel Jarre and Agencja Artystyczna MTJ. Jean Michel Jarre. Disques Dreyfus, 1999.

[15] François Pachet, Jean-Julien Aucouturier, and Mark Sandler. The way it sounds: Timbre models for analysis and retrieval of music signals. IEEE Transactions on Multimedia, 7(6):1028-1035, Dec 2005.

• Abstract
• Acknowledgements
• Contents
• List of Tables
• List of Figures
• 1 Introduction
  • 1.1 Background Information
  • 1.2 Literature Review
  • 1.3 The Dataset
• 2 Mathematical Modeling
  • 2.1 Determining Novelty of Songs
  • 2.2 Feature Selection
  • 2.3 Collecting Data and Preprocessing Selected Features
    • 2.3.1 Collecting the Data
    • 2.3.2 Pitch Preprocessing
    • 2.3.3 Timbre Preprocessing
• 3 Results
  • 3.1 Methodology
  • 3.2 Findings
    • 3.2.1 α = 0.05
    • 3.2.2 α = 0.1
    • 3.2.3 α = 0.2
  • 3.3 Analysis
• 4 Conclusion
  • 4.1 Design Flaws in Experiment
  • 4.2 Future Work
  • 4.3 Closing Remarks
• A Code
  • A.1 Pulling Data from the Million Song Dataset
  • A.2 Calculating Most Likely Chords and Timbre Categories
  • A.3 Code to Compute Timbre Categories
  • A.4 Helper Methods for Calculations
• Bibliography

List of Figures

1.1 A user's taste profile generated by Spotify

1.2 Data processing pipeline for Mauch's study, illustrated with a segment of Queen's Bohemian Rhapsody, 1975

2.1 scikit-learn example of GMM vs. DPGMM and tuning of α

2.2 Number of Electronic Music Songs in Million Song Dataset from Each Year

3.1 Song year distributions for α = 0.05

3.2 Timbre and pitch distributions for α = 0.05

3.3 Song year distributions for α = 0.1

3.4 Timbre and pitch distributions for α = 0.1

3.5 Song year distributions for α = 0.2

3.6 Timbre and pitch distributions for α = 0.2

Chapter 1

Introduction

1.1 Background Information

Electronic Music (EM) is an increasingly popular genre of music with an immense presence and influence on modern culture. Because the genre is new as a whole and is arguably more loosely structured than other genres (technology has enabled the creation of a wide range of sounds and easy blending of existing and new sounds alike), formal analysis, especially mathematical analysis, of the genre is fairly limited and has only begun growing in the past few years. As a fan of EM, I am interested in exploring how the genre has evolved over time. More specifically, my goal with this project was to design some structure or model that could help me identify which EM artists have contributed the most stylistically to the genre. Oftentimes famous EM artists do not create novel-sounding music but rather popularize an existing style, and the motivation of this study is to understand who has stylistically contributed the most to the EM scene versus those who have merely popularized aspects of it.

As the study progressed, the manner in which I constructed my model lent itself to a second goal of the thesis: imagining new ways in which EM genres can be conceived.

While there exists an extensive amount of research analyzing music trends from a non-mathematical (cultural, societal, artistic) perspective, the analysis of EM from a mathematical perspective, and especially with respect to any computationally measurable trends in the genre, is close to nonexistent. EM has been analyzed to a lesser extent than other common genres of music in the academic world, most likely due to existing for a shorter amount of time and being less rooted in prominent social and cultural events. In fact, the first published reference work on EM did not exist until 2012, when Professor Mark J. Butler from Northwestern University edited and published Electronica, Dance and Club Music, a collection of essays exploring EM genres and culture [1]. Furthermore, there are very few comprehensive visual guides that allow a user to relate every genre to each other and easily observe how different genres converge and diverge. While conducting research, the best guide I found was not a scholarly source but an online guide created by an EM enthusiast: Ishkur's Guide to Electronic Music [2]. This guide, which includes over 100 specific genres grouped by more general genres and represents chronological evolutions by connecting each genre in a flowchart, is the most exhaustive analysis of the EM scene I could find. However, the guide's analysis is very qualitative. While each subgenre contains an explanation of typical rhythm and sounds and includes well-known songs indicative of the style, the guide was created by someone who used historical and personal knowledge of EM. My model, which creates music genres by chronologically ordering songs and then assigning them to clusters, is a different approach towards imagining the entire landscape of EM. The results may confirm Ishkur's Guide's findings, in which case his guide is given additional merit with mathematical evidence, or they may differ, suggesting that there may be better ways to group EM genres. One advantage that guides such as Ishkur's and historically-based scholarly works have over my approach is that those models are history-sensitive and therefore may group songs in a way that historically makes sense. On the other hand, my model is history-agnostic and may not capture the historical context of songs when clustering. However, I believe that there is still significant merit to my research. Instead of classifying genres of music by the early genres that led to them, my approach gives the most credit to the artists and songs that were the most innovative for their time, and perhaps reveals different musical styles that are more similar to each other than history would otherwise imply. This way of thinking about music genres, while unconventional, is another way of imagining EM.

The practice of quantitatively analyzing music has exploded in the last decade, thanks to technological and algorithmic advances that allow data scientists to constructively sift through troves of music and listener information. In the literature review, I will focus on two particular organizations that have contributed greatly to the large-scale mathematical analysis of music: Pandora, a website that plays songs similar to a song/artist/album inputted by a user, and Echo Nest, a music analytics firm that was acquired by Spotify in 2014 and drives Spotify's Discover Weekly feature [3]. After evaluating the relevance of these sources to my thesis work, I will then look over the relevant academic research and evaluate what this research can contribute.

1.2 Literature Review

The analysis of quantitative music generally falls into two categories: research conducted by academics and academic organizations for scholarly purposes, and research conducted by companies and primarily targeted at consumers. First looking at the consumer-based research, Spotify and Pandora are two of the most prominent such groups and the two I decided to focus on. Spotify is a music streaming service where users can listen to albums and songs from a wide variety of artists or listen to weekly playlists generated based on the music the user and the user's friends have listened to. The weekly playlist, called the Discover Weekly Playlist, is a relatively new feature in Spotify and is driven by music analysis algorithms created by Echo Nest. Using the Echo Nest code interface, Spotify creates a "taste profile" for each user, which assesses attributes such as how often a user branches out to new styles of music, how closely the user's streamed music follows popular Billboard music charts, and so on. Spotify also looks at the artists and songs the user streamed and creates clusters of different genres that the user likes (see figure 1.1). The taste profile and music clusters can then be used to generate playlists geared to a specific user. The genres in the cluster come from a list of nearly 800 names, which are derived by scraping the Internet for trending terms in music, as well as training various algorithms on a regular basis by "listening" to new songs [4][5].

Figure 1.1: A user's taste profile generated by Spotify

Although Spotify and Echo Nest's algorithms are very useful for mapping the landscape of established and emerging genres of music, the methodology is limited to pre-defined genres of music. This may serve as a good starting point to compare my final results to, but my study aims to be as context-free as possible by attaching no preconceived notions of music styles or genres, instead looking at features that could be measured in every song.

While Spotify's approach to mapping music is very high-tech and based on existing genres, Pandora takes a very low-tech and context-free approach to music clustering. Pandora created the Music Genome Project, a multi-year undertaking where skilled music theorists listened to a large number of songs and analyzed up to 450 characteristics in each song [6]. Pandora's approach is appealing to the aim of my study, since it does not take any preconceived notions of what a genre of music is, instead comparing songs on common characteristics such as pitch, rhythm, and instrument patterns. Unfortunately, I do not have a cadre of skilled music theorists at my disposal, nor do I have 10 years to perform such calculations like the dedicated workers at Pandora (tips the indestructible fedora). Additionally, Pandora's Music Genome Project is intellectual property, so at best I can only rely on the abstract concepts of the Music Genome Project to drive my study.

In the academic realm, there are no existing studies analyzing quantifiable changes in EM specifically, but there exist a few studies that perform such analysis on popular Western music in general. One such study is Measuring the Evolution of Contemporary Western Popular Music, which analyzes music from 1955-2010, spanning all common genres. Using the Million Song Dataset, a free public database of songs each containing metadata (see section 1.3), the study focuses on the attributes pitch, timbre, and loudness. Pitch is defined as the standard musical notes, or frequency of the sound waves. Timbre is formally defined as the Mel frequency cepstral coefficients (MFCC) of a transformed sound signal. More informally, it refers to the sound color, texture, or tone quality, and is associated with instrument types, recording resources, and production techniques. In other words, two sounds that have the same pitch but different tones (for example, a bell and a voice) are differentiated by their timbres. There are 12 MFCCs that define the timbre of a given sound. Finally, loudness refers to intrinsically how loud the music sounds, not loudness that a listener can manipulate while listening to the music. Loudness is the first MFCC of the timbre of a sound [7]. The study concluded that over time, music has been becoming louder and less diverse:

"The restriction of pitch sequences (with metrics showing less variety in pitch progressions), the homogenization of the timbral palette (with frequent timbres becoming more frequent), and growing average loudness levels (threatening a dynamic richness that has been conserved until today). This suggests that our perception of the new would be essentially rooted on identifying simpler pitch sequences, fashionable timbral mixtures, and louder volumes. Hence an old tune with slightly simpler chord progressions, new instrument sonorities that were in agreement with current tendencies, and recorded with modern techniques that allowed for increased loudness levels could be easily perceived as novel, fashionable, and groundbreaking."

This study serves as a good starting point for mathematically analyzing music in a few ways. First, it utilizes the Million Song Dataset, which addresses the issue of legally obtaining music metadata. As mentioned in section 1.3, the only legal way to obtain playable music for this study would have been to purchase all songs I would include, which is infeasible. While the Million Song Dataset does not contain the audio files in playable format, it does contain audio features and metadata that allow for in-depth analysis. In addition, working with the dataset takes out the work of extracting features from raw audio files, saving an extensive amount of time and energy. Second, the study establishes specifics for what constitutes a trend in music. Pitch, timbre, and loudness are core features of music, and examining the distributions of each among songs over time reveals a lot of information about how the music industry and consumers' tastes have evolved. While these are not all of the features contained in a song, they serve as a good starting point. Third, the study defines mathematical ways to capture music attributes and measure their change over time. For example, pitches are transposed into the same tonal context, with binary discretized pitch descriptions based on a threshold, so that each song can be represented with vectors of pitches that are normalized and compared to other songs.

While this study lays some solid groundwork for capturing and analyzing numeric qualities of music, it falls short of addressing my goals in a couple of ways. First, it does not perform any analysis with respect to music genre. While the analysis performed in this paper could easily be applied to a list of songs in a specific genre, certain genres might have unique sounds and rhythms relative to other genres that would be worth studying in greater detail. Second, the study only measures general trends in music over time. The models used to describe changes are simple regressions that don't look at more nuanced changes. For example, what styles of music developed over certain periods of time? How rapid were those changes? Which styles of music developed from which other styles?

A more promising study, led by music researcher Matthias Mauch [8], analyzes contemporary popular Western music from the 1960s to 2010s by comparing numerical data on the pitches and timbre of a corpus of 17,000 songs that appeared on the Billboard Hot 100. Like the previously mentioned paper, Measuring the Evolution of Contemporary Western Popular Music, Mauch's study also creates abstractions of pitch and timbre in order to provide a consistent and meaningful semantic interpretation of musical data (see figure 1.2). However, Mauch's study takes this idea a step further by using genre tags from Last.fm, a music website, and constructing a hierarchy of music genres using hierarchical clustering. Additionally, the study takes a crack at determining whether a particular band, the Beatles, was musically groundbreaking for its time or merely playing off sounds that other bands had already used.

Figure 1.2: Data processing pipeline for Mauch's study, illustrated with a segment of Queen's Bohemian Rhapsody, 1975

While both Measuring the Evolution of Contemporary Western Popular Music and Mauch's study created abstractions of pitch and timbre, Mauch's study is more appealing with respect to my goal because its end results align more closely with mine. Additionally, the data processing pipeline offers several layers of abstraction, and depending on my progress, I would be able to achieve at least one of the levels of abstraction. As shown in figure 1.2, each segment of a raw audio file is first broken down into its 12 timbre MFCCs and pitch components. Next, the study constructs "lexicons", or a dictionary of pitch and timbre terms that all songs can be compared to. For pitch, the original data is in an N-by-12 matrix, where N is the number of time segments in the song and 12 the number of each of the notes found in an octave of pitches. Each time segment contains the relative strengths of each of the 12 pitches. However, music sounds are not merely a collection of pitches but, more precisely, chords. Furthermore, the similarity of two songs is not determined by the absolute pitches of their chords but rather by the progression of chords in the song, all relative to each other. For example, if all the notes in a song are transposed by one step, the song will sound different in terms of absolute pitch, but the song will still be recognized as the original, because all of the relative movements from each chord to the next are the same. This phenomenon is captured in the pitch data by finding the most likely chord played at each time segment, then counting the change to the next chord at each time step and generating a table of chord change frequencies for each song. Constructing the timbre lexicon is more complicated, since there is no easy analogue like chords for pitches to compare songs. Mauch's study utilizes a Gaussian Mixture Model (GMM) by iterating over k=1 to k=N clusters, where N is a large number, running the GMM on each prior assumption of k clusters, and computing the Bayes Information Criterion (BIC) for each model. The lowest of the N BIC values is found, and that value of k is selected. That model contains k different timbre clusters, and each cluster contains the mean timbre value for each of the 12 timbre components.

For my research, I decided that the pitch and timbre lexicons would be the most realistic level of abstraction I could obtain. Mauch's study adds an additional layer to pitch and timbre by identifying the most common patterns of chord changes and most common timbre rhythms and creating more general tags from these combined terms, such as "stepwise changes indicating modal harmony" for a pitch topic and "oh, rounded, mellow" for a timbral topic. There were two problems with using this final layer of abstraction for my study. First, attaching semantic interpretations to the pitch and timbral lexicons is a difficult task. For timbre, I would need to listen to sound samples containing all of the different timbral categories I identified and attach user interpretations to them. For the chords, not only would I have to perform the same analysis as on timbre, but also pay careful attention to identify which chords correspond to common sound progressions in popular music, a task that I am not qualified for and did not have the resources for this thesis to seek out. Second, this final layer of abstraction was not necessary for the end goal of my paper. In fact, consolidating my pitch and timbre lexicons into simpler phrases would run the risk of pigeonholing my analysis and preventing me from discovering more nuanced patterns in my final results. Therefore, I decided to focus on pitch and timbral lexicon construction as the furthest levels of abstraction when processing songs for my thesis. Mathematical details on how I constructed the pitch and timbral lexicons can be found in the Mathematical Modeling section of this paper.

1.3 The Dataset

In order to successfully execute my thesis, I needed access to an extensive database of music. Until recently, acquiring a substantial corpus of music data was a difficult and costly task. It is illegal to download music audio files from video and music-sharing sites such as YouTube, Spotify, and Pandora. Some platforms such as iTunes offer 90-second previews of songs, but using only segments of songs, usually segments that showcase the chorus of the song, is not a reliable way to capture the entire essence of a song. Even if I were to legally download entire audio files for free, I would run into additional issues. Obtaining a high-quality corpus of song data would be challenging: scripts that crawl music sharing platforms may not capture all of the music I am looking for. And once I had the audio files, I would have to perform audio processing techniques to extract the relevant information from the songs.

Fortunately, there is an easy solution to the music data acquisition problem. The Million Song Dataset (MSD) is a collection of metadata for one million music tracks dating up to 2011. Various organizations such as The Echo Nest, Musicbrainz, 7digital, and Last.fm have contributed different pieces of metadata. Each song is represented as a Hierarchical Data Format file (HDF5), which can be loaded as a JSON object. The fields encompass topical features such as the song title, artist, and release date, as well as lower-level features such as the loudness, starting beat times, pitches, and timbre of several segments of the song [9]. While the MSD is the largest free and open source music metadata dataset I could find, there is no guarantee that it adequately covers the entire spectrum of EM artists and songs. This quality limitation is important to consider throughout the study. A quick look through the songs, including the subset of data I worked with for this report, showed that there were several well-known artists and songs from the EM scene. Therefore, while the MSD may not contain all desired songs for this project, it contains an adequate number of relevant songs to produce some meaningful results. Additionally, laying the groundwork for modeling the similarities between songs and identifying groundbreaking ones is the same regardless of the songs included, and the following methodologies can be implemented on any similarly-formatted dataset, including one with songs that might currently be missing from the MSD.

Chapter 2

Mathematical Modeling

2.1 Determining Novelty of Songs

Finding a logical and implementable mathematical model was, and continues to be, an important aspect of my research. My problem, how to mathematically determine which songs were unique for their time, requires an algorithm in which each song is introduced in chronological order, either joining an existing category or starting a new category based on its musical similarity to songs already introduced. Clustering algorithms like k-means or Gaussian Mixture Models (GMM), which have a predetermined number of clusters and optimize the partitioning of a dataset into those clusters, assume a fixed number of clusters. While this approach would work if we knew exactly how many genres of EM existed, if we guess wrong, our end results may contain clusters that are wrongly grouped together or separated. It is much better to apply a clustering algorithm that does not make any assumptions about this number.

One particularly promising process that addresses the issue of the number of clusters is a family of algorithms known as Dirichlet Processes (DPs). DPs are useful for this particular application because (1) they assign clusters to a dataset with only an upper bound on the number of clusters, and (2) by sorting the songs in chronological order before running the algorithm and keeping track of which songs are categorized under each cluster, we can observe the earliest songs in each cluster and consequently infer which songs were responsible for creating new clusters. The DP is controlled by a parameter α, the concentration parameter. The expected number of clusters formed is directly proportional to the value of α, so the higher the value of α, the more likely new clusters will be formed [10]. Regardless of the value of α, as the number of data points introduced increases, the probability of a new group being formed decreases. That is, a "rich get richer" policy is in place, and existing clusters tend to grow in size. Tweaking the value of the tunable parameter α is an important part of the study, since it determines the flexibility given to forming a new cluster. If the value of α is too small, then the criteria for forming clusters will be too strict, and data that should be in different clusters will be assigned to the same cluster. On the other hand, if α is too large, the algorithm will be too sensitive and assign similar songs to different clusters.
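To make the "rich get richer" behavior concrete, here is a minimal simulation sketch of the cluster-assignment rule underlying the DP (the Chinese Restaurant Process view): an incoming point joins an existing cluster with probability proportional to that cluster's current size, and opens a new cluster with probability proportional to α. This snippet is purely illustrative and not part of the thesis pipeline.

import random

def crp_assign(cluster_sizes, alpha):
    # existing cluster k is chosen with probability n_k / (n + alpha);
    # a brand-new cluster is chosen with probability alpha / (n + alpha)
    n = sum(cluster_sizes)
    r = random.uniform(0, n + alpha)
    for k, size in enumerate(cluster_sizes):
        r -= size
        if r <= 0:
            return k
    return len(cluster_sizes)  # signal: start a new cluster

random.seed(0)
sizes = []
for _ in range(1000):
    k = crp_assign(sizes, alpha=1.0)
    if k == len(sizes):
        sizes.append(1)
    else:
        sizes[k] += 1
print(len(sizes), sorted(sizes, reverse=True)[:5])  # few clusters, dominated by large ones

Raising alpha in this simulation produces more, smaller clusters, which mirrors the tuning tradeoff described above.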

The implementation of the DP was achieved using scikit-learn's library and API for the Dirichlet Process Gaussian Mixture Model (DPGMM). The DPGMM is the formal name of the Dirichlet Process model used to cluster the data. More specifically, scikit-learn's implementation of the DPGMM uses the stick-breaking method, one of several equally valid methods to assign songs to clusters [11]. While the mathematical details for this algorithm can be found at the following citation [12], the most important aspects of the DPGMM are the arguments that the user can specify and tune. The first of these tunable parameters is the value α, which is the same parameter as the α discussed in the previous paragraph. As seen in Figure 2.1, on the right side, properly tuning α is key to obtaining meaningful clusters. The center image has α set to 0.01, which is too small and results in all of the data being placed in one cluster. On the other hand, the bottom-right image has the same data set and α set to 100, which does a better job of clustering. On a related note, the figure also demonstrates the effectiveness of the DPGMM over the GMM. On the left side, the dataset clearly contains 2 clusters, but the GMM in the top-left image assumes 5 clusters as a prior and consequently clusters the data incorrectly, while the DPGMM manages to limit the data to 2 clusters.

Figure 2.1: scikit-learn example of GMM vs. DPGMM and tuning of α

The second argument that the user inputs for the DPGMM is the data that will be clustered. The scikit-learn implementation takes the data in the format of a nested list (N lists, each of length m), where N is the number of data points and m the number of features. While the format of the data structure is relatively straightforward, choosing which numbers should be in the data was a challenge I faced. Selecting the relevant features of each song to be used in the algorithm will be expounded upon in the next section, "Feature Selection".

The last argument that a user inputs for the scikit-learn DPGMM implementation is an argument indicating the upper bound for the number of clusters. The Dirichlet Process then determines the best number of clusters for the data between 1 and the upper bound. Since the DPGMM is flexible enough to find the best value, I set an arbitrary upper bound of 50 clusters and focused more on the tuning of α to modify the number of clusters formed.
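For reference, a minimal sketch of how this clustering step might be invoked, assuming the scikit-learn 0.17-era API that was current when this thesis was written (the DPGMM class was later deprecated in favor of BayesianGaussianMixture); the feature matrix here is random placeholder data rather than the real song features.

import numpy as np
from sklearn.mixture import DPGMM  # scikit-learn <= 0.17 era class

X = np.random.rand(1000, 238)  # placeholder: one row per song, 238 features

dpgmm = DPGMM(n_components=50, alpha=0.1, n_iter=100)  # 50 = upper bound on clusters
dpgmm.fit(X)
labels = dpgmm.predict(X)
print(np.unique(labels))  # cluster indices actually used; numbering need not be sequential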

2.2 Feature Selection

One of the most difficult aspects of the Dirichlet method is choosing the features to be used for clustering. In other words, when we organize the songs into clusters, we need to ensure that each cluster is distinct in a way that is statistically and intuitively logical. In the Million Song Dataset [9], each song is represented as a JSON object containing several fields. These fields are candidate features to be used in the Dirichlet algorithm. Below is an example song, "Never Gonna Give You Up" by Rick Astley, and the corresponding features:

artist_mbid: db92a151-1ac2-438b-bc43-b82e149ddd50 (the musicbrainz.org ID for this artist is db9...)
artist_mbtags: shape = (4,) (this artist received 4 tags on musicbrainz.org)
artist_mbtags_count: shape = (4,) (raw tag count of the 4 tags this artist received on musicbrainz.org)
artist_name: Rick Astley (artist name)
artist_playmeid: 1338 (the ID of that artist on the service playme.com)
artist_terms: shape = (12,) (this artist has 12 terms (tags) from The Echo Nest)
artist_terms_freq: shape = (12,) (frequency of the 12 terms from The Echo Nest (number between 0 and 1))
artist_terms_weight: shape = (12,) (weight of the 12 terms from The Echo Nest (number between 0 and 1))
audio_md5: bf53f8113508a466cd2d3fda18b06368 (hash code of the audio used for the analysis by The Echo Nest)
bars_confidence: shape = (99,) (confidence value (between 0 and 1) associated with each bar by The Echo Nest)
bars_start: shape = (99,) (start time of each bar according to The Echo Nest; this song has 99 bars)
beats_confidence: shape = (397,) (confidence value (between 0 and 1) associated with each beat by The Echo Nest)
beats_start: shape = (397,) (start time of each beat according to The Echo Nest; this song has 397 beats)
danceability: 0.0 (danceability measure of this song according to The Echo Nest (between 0 and 1; 0 => not analyzed))
duration: 211.69587 (duration of the track in seconds)
end_of_fade_in: 0.139 (time of the end of the fade in, at the beginning of the song, according to The Echo Nest)
energy: 0.0 (energy measure (not in the signal processing sense) according to The Echo Nest (between 0 and 1; 0 => not analyzed))
key: 1 (estimation of the key the song is in by The Echo Nest)
key_confidence: 0.324 (confidence of the key estimation)
loudness: -7.75 (general loudness of the track)
mode: 1 (estimation of the mode the song is in by The Echo Nest)
mode_confidence: 0.434 (confidence of the mode estimation)
release: Big Tunes - Back 2 The 80s (album name from which the track was taken; some songs/tracks can come from many albums; we give only one)
release_7digitalid: 786795 (the ID of the release (album) on the service 7digital.com)
sections_confidence: shape = (10,) (confidence value (between 0 and 1) associated with each section by The Echo Nest)
sections_start: shape = (10,) (start time of each section according to The Echo Nest; this song has 10 sections)
segments_confidence: shape = (935,) (confidence value (between 0 and 1) associated with each segment by The Echo Nest)
segments_loudness_max: shape = (935,) (max loudness during each segment)
segments_loudness_max_time: shape = (935,) (time of the max loudness during each segment)
segments_loudness_start: shape = (935,) (loudness at the beginning of each segment)
segments_pitches: shape = (935, 12) (chroma features for each segment (normalized so max is 1))
segments_start: shape = (935,) (start time of each segment (musical event, or onset) according to The Echo Nest; this song has 935 segments)
segments_timbre: shape = (935, 12) (MFCC-like features for each segment)
similar_artists: shape = (100,) (a list of 100 artists (their Echo Nest IDs) similar to Rick Astley, according to The Echo Nest)
song_hotttnesss: 0.864248830588 (according to The Echo Nest, when downloaded (in December 2010), this song had a 'hotttnesss' of 0.8 (on a scale of 0 and 1))
song_id: SOCWJDB12A58A776AF (The Echo Nest song ID; note that a song can be associated with many tracks (with very slight audio differences))
start_of_fade_out: 198.536 (start time of the fade out, in seconds, at the end of the song, according to The Echo Nest)
tatums_confidence: shape = (794,) (confidence value (between 0 and 1) associated with each tatum by The Echo Nest)
tatums_start: shape = (794,) (start time of each tatum according to The Echo Nest; this song has 794 tatums)
tempo: 113.359 (tempo in BPM according to The Echo Nest)
time_signature: 4 (time signature of the song according to The Echo Nest, i.e. usual number of beats per bar)
time_signature_confidence: 0.634 (confidence of the time signature estimation)
title: Never Gonna Give You Up (song title)
track_7digitalid: 8707738 (the ID of this song on the service 7digital.com)
track_id: TRAXLZU12903D05F94 (The Echo Nest ID of this particular track, on which the analysis was done)
year: 1987 (year when this song was released, according to musicbrainz.org)
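For concreteness, a minimal sketch of pulling a few of these fields out of a single track file, assuming the hdf5_getters module distributed with the MSD; the file name here is illustrative.

import hdf5_getters

h5 = hdf5_getters.open_h5_file_read('TRAXLZU12903D05F94.h5')  # illustrative file name
try:
    print(hdf5_getters.get_title(h5))                # song title
    print(hdf5_getters.get_tempo(h5))                # tempo estimate in BPM
    pitches = hdf5_getters.get_segments_pitches(h5)  # (num_segments, 12) chroma matrix
    timbre = hdf5_getters.get_segments_timbre(h5)    # (num_segments, 12) MFCC-like matrix
    print(pitches.shape, timbre.shape)
finally:
    h5.close()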

When choosing features, my main goal was to use features that would most likely yield meaningful results, yet also be simple and make sense to the average person. The definition of "meaningful" results is arbitrary, as every music listener will have his or her own opinions as to what constitutes different types of music, but some common features most people tend to differentiate songs by are pitch, rhythm, and the types of instruments used. The following specific fields provided in each song object fall under these three terms:

Pitch

• segments_pitches: a matrix of values indicating the strength of each pitch (or note) at each discernible time interval

Rhythm

• beats_start: a vector of values indicating the start time of each beat

• time_signature: the time signature of the song

• tempo: the speed of the song in Beats Per Minute (BPM)

Instruments

• segments_timbre: a matrix of values indicating the distribution of MFCC-like features (different types of tones) for each segment

The segments_pitches feature is a clear candidate for a differentiating factor for songs, since it reveals patterns of notes that occur. Additionally, other research papers that quantitatively examine songs, like Mauch's, look at pitch and employ a procedure that allows all songs to be compared with the same metric. Likewise, timbre is intuitively a reliable differentiating feature, since it reveals the amount of different tones, that is, sounds that sound different despite having the same pitch. Therefore, segments_timbre is another feature that is considered in each song. Finally, we look at the candidate features for rhythm. At first glance, all of these features appear to be useful, as they indicate the rhythm of a song in one way or another. However, none of these features are as useful as the pitch and timbre features. While tempo is one factor in differentiating genres of EDM, and music in general, tempo alone is not a driving force of musical innovation. Certain genres of EDM, like drum 'n' bass and happycore, stand out for having very fast tempos, but the tempo is supplemented with a sound unique to the genre. Conceiving new arrangements of pitches, combining instruments in new ways, and inventing new types of sounds are novel; speeding up or slowing down existing sounds is not. Including tempo as a feature could actually add noise to the model, since many genres overlap in their tempos. And finally, tempo is measured indirectly when the pitch and timbre features are normalized for each song: everything is measured in units of "per second", so faster songs will have higher quantities of pitch and timbre features each second. Time signature can be dismissed from the candidate features for the same reason as tempo: many genres share the same time signature, and including it in the feature set would only add more noise. beats_start looks like a more promising feature, since, like segments_pitches and segments_timbre, it consists of a vector of values. However, difficulties arise when we begin to think how exactly we can utilize this information. Since each song varies in length, we need a way to compare songs of different durations on the same level. One approach could be to perform basic statistics on the distance between each beat, for example calculating the mean and standard deviation of this distance. However, the normalized pitch and timbre information already captures this data. Another possibility is detecting certain patterns of beats, which could differentiate the syncopated dubstep or glitch music beats from the steady pulse of electro-house. But once again, every beat is accompanied by a sound with a specific timbre and pitch, so this feature would not add any significantly new information.

2.3 Collecting Data and Preprocessing Selected Features

2.3.1 Collecting the Data

Upon deciding the features I wanted to use in my research, I first needed to collect all of the electronic songs in the Million Song Dataset. The easiest reliable way to achieve this was to iterate through each song in the database and save the information for the songs where any of the artist genre tags in artist_mbtags matched an electronic music genre. While this measure was not fully accurate, because it looks at the genre of the artist, not the song, specific genre information for each song was not as easily accessible, so this indicator was nearly as good a substitute. To generate a list of the genres that electronic songs would fall under, I manually searched through a subset of the MSD to find all genres that seemed to be related to electronic music. In the case of genres that were sometimes, but not always, electronic in nature, such as disco or pop, I erred on the side of caution and did not include them in the list of electronic genres. In these cases, false positives, such as primarily rock songs that happen to have the disco label attached to the artist, could inadvertently be included in the dataset. The final list of genres is as follows:

target_genres = ['house', 'techno', 'drum and bass', 'drum n bass',
    "drum'n'bass", 'drumnbass', "drum 'n' bass", 'jungle', 'breakbeat',
    'trance', 'dubstep', 'trap', 'downtempo', 'industrial', 'synthpop',
    'idm', 'idm - intelligent dance music', '8-bit', 'ambient',
    'dance and electronica', 'electronic']
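The matching step itself can be sketched as follows; this is a minimal illustration assuming the hdf5_getters module distributed with the MSD, with the directory traversal and file paths being illustrative rather than taken from the thesis code.

import glob
import hdf5_getters

def is_electronic(h5_path):
    # open one MSD track file and test its artist-level musicbrainz tags
    h5 = hdf5_getters.open_h5_file_read(h5_path)
    try:
        tags = [t.decode('utf-8').lower() for t in hdf5_getters.get_artist_mbtags(h5)]
        return any(tag in target_genres for tag in tags)
    finally:
        h5.close()

# illustrative traversal over a local copy of the dataset
electronic_tracks = [p for p in glob.glob('MillionSongSubset/data/*/*/*/*.h5')
                     if is_electronic(p)]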

2.3.2 Pitch Preprocessing

A study conducted by music researcher Matthias Mauch [8] analyzes pitch in a musically informed manner. The study first takes the raw sound data and converts it into a distribution of each pitch, where 0 is no detection of the pitch and 1 the strongest amount. Then it computes the most likely chord by comparing the 4 most common types of chords in popular music (major, minor, dominant 7, and minor 7) to the observed chord. The most common chords are represented as "template chords" and contain 0's and 1's, where the 1's represent the notes played in the chord. For example, using the note C as the first index, the C major chord is represented as

\[ CT_{CM} = (1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0) \]

For a given chroma frame c observed in the song, the Spearman's rho coefficient is computed over every template chord:

\[ \rho_{CT,c} = \sum_{i=1}^{12} \frac{(CT_i - \overline{CT})(c_i - \bar{c})}{\sigma_{CT}\,\sigma_c} \]

where \(\overline{CT}\) is the mean of the values in the template chord, \(\sigma_{CT}\) is the standard deviation of the values in the chord, and the operations on c are analogous. Note that the summation is over each individual pitch in the 12 pitch classes. The chord template with the highest value of ρ is selected as the chord for the time frame. After this is performed for each time frame, the values are smoothed, and then the change between adjacent chords is observed. The reasoning behind this step is that, by measuring the relative distance between chords rather than the chords themselves, all songs can be compared in the same manner even though they may have different key signatures. Finally, the study takes the types of chord changes and classifies them under 8 possible categories called "H-topics". These topics are more abstracted versions of the chord changes that make more sense to a human, such as "changes involving dominant 7th chords".

In my preliminary implementation of this method on an electronic dance music corpus, I made a few modifications to Mauch's study. First, I smoothed out time frames before computing the most probable chords, rather than smoothing the most probable chords. I did this to save time and to reduce volatility in the chord measurements. Using Rick Astley's "Never Gonna Give You Up" as a reference, which contains 935 time frames and lasts 212 seconds, 5 time frames is slightly under 1 second and, for preliminary testing, appeared to be a good interval for each time block. Second, as mentioned in the literature section, I did not abstract the chord changes into H-topics. This decision also stemmed from time constraints, since deriving semantic chord meaning from EDM songs would require careful research into the types of harmonies and sounds common in that genre of music. Below I have included a high-level visualization of the pitch metadata found in a sample song, "Firestarter" by The Prodigy, and how I converted the metadata into a chord change vector that I could then feed into the Dirichlet Process algorithm.

[Figure: high-level visualization of the pitch-processing pipeline, using "Firestarter" by The Prodigy:
1. Start with raw pitch data, an N x 12 matrix, where N is the number of time frames in the song and 12 the number of pitch classes.
2. Average the distribution of pitches over every 5 time frames.
3. Calculate the most likely chord for each block of 5 time frames using Spearman's rho (e.g., F major, (0,1,0,0,0,0,1,0,0,0,1,0)).
4. For each pair of adjacent chords (e.g., F major to G major: step size = 2, chord shift code = 6), calculate the change between them and increment its count in a table of chord change frequencies (192 possible chord changes): chord_changes[6] += 1.
5. The result is a final 192-element vector, where chord_changes[i] is the number of times the chord change with code i occurred in the song.]
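The table-building step at the end of this pipeline can be sketched as follows. Each chord is a (chord type, root) pair, as returned by find_most_likely_chord in Appendix A.4, with four chord types and twelve roots; the specific numbering of the 192 codes below (4 types x 4 types x 12 relative root steps) is one possible bijection chosen for illustration, not necessarily the exact code assignment used in the thesis.

def chord_change_code(prev_chord, next_chord):
    # chords are (type, root) with type in {1: major, 2: minor, 3: dom7, 4: min7}
    (t1, r1), (t2, r2) = prev_chord, next_chord
    step = (r2 - r1) % 12                          # relative root movement, 0..11
    return ((t1 - 1) * 4 + (t2 - 1)) * 12 + step   # one of 4*4*12 = 192 codes

def chord_change_vector(chords, duration):
    # histogram of chord-change codes, normalized to changes per second
    counts = [0] * 192
    for prev, nxt in zip(chords, chords[1:]):
        counts[chord_change_code(prev, nxt)] += 1
    return [c / float(duration) for c in counts]

# example: F major (root 5) followed by G major (root 7) in a 212-second song
vec = chord_change_vector([(1, 5), (1, 7)], 212)
print(sum(vec))  # total chord changes per second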

A final step I took to normalize the chord change data was to divide the numbers by the length of the song, so that each song's number of chord changes was measured per second.

2.3.3 Timbre Preprocessing

For timbre, I also used Mauch's model to find a meaningful way to compare timbre uniformly across all songs [8]. After collecting all song metadata, I took a random sample of 20 songs from each year, starting at 1970. The reason I forced the sampling to 20 randomly sampled songs from each year, and did not take a random sample of songs from all years at once, was to prevent bias towards any type of sounds. As seen in figure 2.2, there are significantly more songs from 2000-2011 than before 2000. The mean year is x̄ = 2001.052, the median year is 2003, and the standard deviation of the years is σ = 7.060. A "random sample" over all songs would almost certainly include a disproportionate amount of more recent songs. In order not to miss out on sounds that may be more prevalent in older songs, I required a set number of songs from each year. Next, from each randomly selected song, I selected 20 random timbre frames, in order to prevent any biases in data collection within each song. In total, there were 42 * 20 * 20 = 16,800 timbre frames collected. Next, I clustered the timbre frames using a Gaussian Mixture Model (GMM), varying the number of clusters from 10 to 100 and selecting the number of clusters with the lowest Bayes Information Criterion (BIC), a statistical measure commonly used to select the best fitting model. The BIC was minimized at 46 timbre clusters. I then re-ran the GMM with 46 clusters and saved the mean values of each of the 12 timbre segments for each cluster formed.

In the same way that every song had the same 192 chord changes, whose frequencies could be compared between songs, each song now had the same 46 timbre clusters but different frequencies in each song. When reading in the metadata from each song, I calculated the most likely timbre cluster each timbre frame belonged to and kept a frequency count of all of the possible timbre clusters observed in a song. Finally, as with the pitch data, I divided all observed counts by the duration of the song in order to normalize each song's timbre counts.

Figure 2.2: Number of Electronic Music Songs in Million Song Dataset from Each Year
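As an illustration of this model-selection loop, here is a minimal sketch assuming the scikit-learn GMM API of the time (sklearn.mixture.GMM, later renamed GaussianMixture); timbre_frames stands in for the real 16,800 x 12 matrix of sampled timbre frames.

import numpy as np
from sklearn.mixture import GMM  # scikit-learn 0.17-era class; GaussianMixture in later releases

timbre_frames = np.random.rand(16800, 12)  # placeholder for the sampled timbre frames

best_model, best_bic = None, np.inf
for k in range(10, 101):
    gmm = GMM(n_components=k)
    gmm.fit(timbre_frames)
    bic = gmm.bic(timbre_frames)
    if bic < best_bic:
        best_model, best_bic = gmm, bic

# one 12-element mean vector per selected timbre cluster (46 in the thesis)
TIMBRE_CLUSTERS = best_model.means_
print(len(TIMBRE_CLUSTERS))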


Chapter 3

Results

3.1 Methodology

After the pitch and timbre data was processed, I ran the Dirichlet Process on the data. For each song, I concatenated the 192-element chord change frequency list and the 46-element timbre category frequency list, giving each song a total of 238 features. However, there is a problem with this setup: the pitch data will inherently dominate the clustering process, since it contains almost 3 times as many features as timbre. While there is no built-in function in scikit-learn's DPGMM process to give different weights to each feature, I considered another possibility to remedy this discrepancy: duplicating the timbre vector a certain number of times and concatenating that to the feature set of each song. While this strategy runs the risk of corrupting the feature set and turning it into something that does not accurately represent each song, it is important to keep in mind that, even without duplicating the timbre vector, the feature set consists of two separate feature sets concatenated to each other. Therefore, timbre duplication appears to be a reasonable strategy to weigh pitch and timbre more evenly.
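Putting the pieces together, here is a minimal sketch of how one song's feature vector might be assembled under this weighting scheme; the duplication factor of 4 and the function name are illustrative assumptions (the text does not state how many copies were used), and the scale factor of 10 is the one discussed below.

import numpy as np

def build_feature_vector(chord_changes, timbre_counts, duration, scale=10.0, timbre_copies=4):
    # normalize both histograms to events per second (sections 2.3.2-2.3.3)
    pitch = np.asarray(chord_changes, dtype=float) / duration    # 192 features
    timbre = np.asarray(timbre_counts, dtype=float) / duration   # 46 features
    # duplicate the timbre block so its feature count is comparable to the pitch block's,
    # then scale everything so that common ranges of alpha behave sensibly
    return scale * np.concatenate([pitch] + [timbre] * timbre_copies)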


After this modification, I tweaked a few more parameters before obtaining my final results. Dividing the pitch and timbre frequencies by the duration of the song normalized every song to frequency per second, but it also had the undesired effect of making the data too small. Timbre and pitch frequencies per second were almost always less than 1.0 and many times hovered as low as 0.002 for nonzero values. Because all of the values were very close to each other, using common values of α in the range of 0.1 to 1000-2000 was insufficient to push the songs into different clusters. As a result, every song fell into the same cluster. Increasing the value of α by several orders of magnitude, to well over 10 million, fixed the problem, but this solution presented two problems. First, tuning α to experiment with different ways to cluster the music would be problematic, since I would have to work with an enormous range of possible values for α. Second, pushing α to such high values is not appropriate for the Dirichlet Process. Extremely high values of α indicate a Dirichlet Process that will try to disperse the data into different clusters, but a value of α that high is in principle always assigning each new song to a new cluster. On the other hand, varying α between 0.1 and 1000, for example, presents a much wider range of flexibility when assigning clusters. While clustering may be possible by varying the values of α an extreme amount with the data as it currently is, we would be using the Dirichlet Process in a way it should mathematically not be used. Therefore, multiplying all of the data by a constant value, so that we can work in the appropriate range of α, is the ideal approach. After some experimentation, I found that k=10 was an appropriate scaling factor.

After initial runs of the Dirichlet Process, I found out that there was a slight issue with some of the earlier songs. Since I had only artist genre tags, not specific song tags, for each song, I chose songs based on whether any of the tags associated with the artist fell under any electronic music genre, including the generic term 'electronic'. There were some bands, mostly older ones from the 1960s and 1970s like Electric Light Orchestra, which had some electronic music but mostly featured rock, funk, disco, or another genre. Given that these artists featured mostly non-electronic songs, I decided to exclude them from my study and generate a blacklist indicating these music artists. While it was infeasible to look through every single song and determine whether it was electronic or not, I was able to look over the earliest songs in each cluster. These songs were the most important to verify as electronic, because early non-electronic songs could end up forming new clusters and inadvertently create clusters with non-electronic sounds that I was not looking for.

The goal of this thesis is to identify different groups in which EM songs are clustered and to identify the most unique artists and genres. While the second task is very simple, because it requires looking at the earliest songs in each cluster, the first is difficult to gauge the effectiveness of. While I can look at the average chord change and timbre category frequencies in each category, as well as other metadata, putting more semantic interpretations to what the music actually sounds like and determining whether the music is clustered properly is a very subjective process. For this reason, I ran the Dirichlet Process on the feature set with values of α = (0.05, 0.1, 0.2) and compared the clustering in each case, examining similarities and differences in the clusters formed in each scenario in the Discussion section. For each value of α, I set the upper limit of components, or clusters, allowed to 50. The values of α I used resulted in 9, 14, and 19 clusters formed.

3.2 Findings

3.2.1 α = 0.05

When I set α to 0.05, the Dirichlet Process split the songs into 9 clusters. Below are the distributions of years of the songs in each cluster (note that the Dirichlet Process does not number the clusters exactly sequentially, so cluster numbers 5, 7, and 10 are skipped).

Figure 3.1: Song year distributions for α = 0.05

For each value of α, I also calculated the average frequency of each chord change category and timbre category for each cluster and plotted the results. The green lines correspond to timbre and the blue lines to pitch.

Figure 3.2: Timbre and pitch distributions for α = 0.05

A table of each cluster formed, the number of songs in that cluster, and descriptions of the pitch, timbre, and rhythmic qualities characteristic of songs in that cluster is shown below.

Cluster   Song Count   Characteristic Sounds
0         6481         Minimalist, industrial, space sounds, dissonant chords
1         5482         Soft, New Age, ethereal
2         2405         Defined sounds; electronic and non-electronic instruments played in standard rock rhythms
3         360          Very dense and complex synths, slightly darker tone
4         4550         Heavily distorted rock and synthesizer
6         2854         Faster paced 80s synth rock, acid house
8         798          Aggressive beats, dense house music
9         1464         Ambient house, trancelike, strong beats, mysterious tone
11        1597         Melancholy tones; New Wave rock in 80s, then starting in 90s downtempo, trip-hop, nu-metal

Table 3.1: Song cluster descriptions for α = 0.05

3.2.2 α = 0.1

A total of 14 clusters were formed. (16 were formed, but 2 clusters contained only one song each; I listened to both of these songs, and they did not sound unique, so I discarded them from the clusters.) Again, the song year distributions, timbre and pitch distributions, and cluster descriptions are shown below.

Figure 3.3: Song year distributions for α = 0.1

Figure 3.4: Timbre and pitch distributions for α = 0.1

Cluster   Song Count   Characteristic Sounds
0         1339         Instrumental and disco with 80s synth
1         2109         Simultaneous quarter-note and sixteenth-note rhythms
2         4048         Upbeat, chill; simultaneous quarter-note and eighth-note rhythms
3         1353         Strong repetitive beats, ambient
4         2446         Strong simultaneous beat and synths; synths defined but echo
5         2672         Calm, New Age
6         542          Hi-hat cymbals, dissonant chord progressions
7         2725         Aggressive punk and alternative rock
9         1647         Latin; rhythmic emphasis on first and third beats
11        835          Standard medium-fast rock instruments/chords
16        1152         Orchestral, especially violins
18        40           "Martian alien" sounds, no vocals
20        1590         Alternating strong kick and strong high-pitched clap
28        528          Roland TR-like beats; kick and clap stand out but fuzzy

Table 3.2: Song cluster descriptions for α = 0.1

3.2.3 α = 0.2

With α set to 0.2, there were a total of 22 clusters formed. 3 of the clusters consisted
of 1 song each, none of which were particularly unique-sounding, so I discarded them,
for a total of 19 significant clusters. Again, the song year distributions, timbre and pitch
distributions, and cluster descriptions are shown below.


Figure 3.5: Song year distributions for α = 0.2


Figure 3.6: Timbre and pitch distributions for α = 0.2


Cluster  Song Count  Characteristic Sounds
0        4075        Nostalgic and sad-sounding synths and string instruments
1        2068        Intense, sad, cavernous (mix of industrial metal and ambient)
2        1546        Jazz/funk tones
3        1691        Orchestral with heavy 80s synths, atmospheric
4        343         Arpeggios
5        304         Electro, ambient
6        2405        Alien synths, eerie
7        1264        Punchy kicks and claps, 80s/90s tilt
8        1561        Medium tempo, 4/4 time signature, synths with intense guitar
9        1796        Disco rhythms and instruments
10       2158        Standard rock with few (if any) synths added on
12       791         Cavernous, minimalist, ambient (non-electronic instruments)
14       765         Downtempo, classic guitar riffs, fewer synths
16       865         Classic acid house sounds and beats
17       682         Heavy Roland TR sounds
22       14          Fast, ambient, classic orchestral
23       578         Acid house with funk tones
30       31          Very repetitive rhythms, one or two tones
34       88          Very dense sound (strong vocals and synths)

Table 3.3: Song cluster descriptions for α = 0.2


3.3 Analysis

Each of the different values of α revealed different insights about the Dirichlet Process
applied to the Million Song Dataset, mainly which artists and songs were unique
and how traditional groupings of EM genres could be thought of differently. Not
surprisingly, the distributions of the years of songs in most of the clusters were skewed
to the left, because the distribution of all of the EM songs in the Million Song Dataset
is left-skewed (see Figure 2.2). However, some of the distributions vary significantly
for individual clusters, and these differences provide important insights into which
types of music styles were popular at certain points in time and how unique the
earliest artists and songs in the clusters were. For example, for α = 0.1, Cluster 28's
musical style (with sounds characteristic of the Roland TR-808 and TR-909, two
programmable drum machines and synthesizers that became extremely popular in
1980s and 90s dance tracks [13]) coincides with when the instruments were first
manufactured in 1980. Not surprisingly, this cluster contained mostly songs from
the 80s and 90s and declined slightly in the 2000s. However, there were a few songs
in that cluster that came out before 1980. While these songs did not clearly use the
Roland TR machines, they may have contained similar sounds that predated the
machines and were truly novel.

First, looking at α = 0.05, we see that all of the clusters contain a significant
number of songs, although clusters 3 and 8 are notably smaller. Cluster 3 contains
a heavier left tail, indicating a larger number of songs from the 70s, 80s, and 90s.
Inside the cluster, the genres of music varied significantly from a traditional music
lens. That is, the cluster contained some songs with nearly all traditional rock
instruments, others with purely synths, and others somewhere in between, all of which
would normally be classified as different EM genres. However, under the Dirichlet
Process these songs were lumped together, with the common theme of dense, melodic
sounds (as opposed to minimalistic, repetitive, or dissonant ones). The most
prominent artists from the earlier songs are Ashra and Jon Hassell, who composed
several melodic songs combining traditional instruments with synthesizers for a
modern feel. The other small cluster, number 8, contains a more normal year
distribution relative to the entire MSD distribution and also consists of denser beats.
Another artist, Cabaret Voltaire, leads this cluster. Cluster 9 sticks out significantly
because it contains virtually no songs before 1990 but then increases rapidly in popularity.
This cluster contains songs with hypnotically repetitive rhythms, strong and ethereal
synths, and an equally strong drum-like beat. Given the emergence of trance in
the 1990s, and the fact that house music in the 1980s contained more minimalistic
synths than house music in the 1990s, this distribution of years makes sense. Looking
at the earliest artists in this cluster, one that accurately predates the later music
in the cluster is Jean-Michel Jarre, a French composer who pioneered ambient and
electronic music [14]. One of his songs, Les Chants Magnétiques IV, contains very
sharp and modulated synths along with a repetitive hi-hat cymbal rhythm and more
drawn-out and ethereal synths. While the song sounds ambient at its normal speed,
playing the song at 1.5 times the normal speed resulted in a thumping, fast-paced 16th-note
rhythm that, combined with the ethereal synths that contain certain chord
progressions, sounded very similar to trance music. In fact, I found that, stylistically,
trance music was comparable to house and ambient music increased in speed. Trance
music was a term not used extensively until the early 1990s, but ambient and house
music were already mainstream by the 1980s, so it would make sense that trance
evolved in this manner. However, this insight could serve as an argument that trance
is not an innovative genre in and of itself but is rather a clever combination of two
older genres. Lastly, we look at the timbre category and chord change distributions
for each cluster. In theory, these clusters should have significantly different peaks
of chord changes and timbre categories, reflecting different pitch arrangements and
instruments in each cluster. The type 0 chord change corresponds to major →
major with no note change, type 60 to minor → minor with no note change, type
120 to dominant 7th major → dominant 7th major with no note change, and type 180
to dominant 7th minor → dominant 7th minor with no note change. It makes sense that
type 0, 60, 120, and 180 chord changes are frequently observed, because it implies
that chords in the song occurring next to each other are remaining in the same key
for the majority of the song.
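To make the category numbering concrete, the following snippet inverts the 192-category encoding from Appendix A (the decoder itself is mine, written for illustration):

    FAMILIES = ['major', 'minor', 'dominant 7th major', 'dominant 7th minor']

    def decode_chord_change(category):
        # category = 12*(key_shift - 1) + note_shift, key_shift = 4*(f1 - 1) + f2
        ks, note_shift = divmod(category, 12)  # ks is the zero-based key shift
        f1, f2 = divmod(ks, 4)
        return FAMILIES[f1], FAMILIES[f2], note_shift

    for c in (0, 60, 120, 180):
        print(c, decode_chord_change(c))
    # 0 -> (major, major, 0), 60 -> (minor, minor, 0),
    # 120 and 180 -> the two dominant 7th families, each mapping to itself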

The timbre categories, on the other hand, are more difficult to interpret intuitively.
Mauch's study addresses this issue by sampling
songs and sounds that are the closest to each timbre category and then playing the
sounds and attaching user-based interpretations based on several listeners [8]. While
this strategy worked in Mauch's study, given the time and resources at my disposal,
it was not practical in my study. I ended up comparing my subjective
summaries of each cluster against the charts to see whether certain peaks
in the timbre categories corresponded to specific tones. Strangely, for α = 0.05 the
timbre and chord change data is very similar for each cluster. This problem does not
occur when α = 0.1 or 0.2, where the graphs vary significantly and correspond
to some of the observed differences in the music. In summary, below are the most
influential artists I found in the clusters formed and the types of music they created
that were novel for their time:

• Jean-Michel Jarre: ambient and house music, complicated synthesizer arrangements

• Cabaret Voltaire: orchestral electronic music

• Paul Horn: new age

• Brian Eno: ambient music

• Manuel Göttsching (Ashra): synth-heavy ambient music

• Killing Joke: industrial metal

• John Foxx: minimalist and dark electronic music

• Fad Gadget: house and industrial music

While these conclusions were formed mainly from the data used in the MSD and the
resulting clusters, I checked outside sources and biographies of these artists to see
whether they were groundbreaking contributors to electronic music. Some research
revealed that these artists were indeed groundbreaking for their time, so my findings
are consistent with existing literature. The difference, however, between existing
accounts and mine is that, from a quantitatively computed perspective, I found some
new connections (like observing that one of Jarre's works, when sped up, sounded
very similar to trance music).

For larger values of α, it is worth not only looking at interesting phenomena in
the clusters formed for that specific value, but also comparing the clusters formed
to those of other values of α. Since we are increasing the value of α, more clusters will
be formed, and the distinctions between each cluster will be more nuanced. With
α = 0.1, the Dirichlet Process formed 16 clusters; 2 of these clusters consisted of
only one song each, and upon listening, neither of these songs sounded particularly
unique, so I threw those two clusters out and analyzed the remaining 14. Comparing
these clusters to the ones formed with α = 0.05, I found that some of the clusters
mapped over nicely, while others were more difficult to interpret. For example, cluster
3_0.1 (cluster 3 when α = 0.1) contained a similar number of songs and a similar
distribution of release years to cluster 9_0.05. Both contain virtually
no songs before the 1990s and then steadily rise in popularity through the 2000s.
Both clusters also contain similar types of music: house beats and ethereal synths
reminiscent of ambient or trance music. However, when I looked at the earliest
artists in cluster 3_0.1, they were different from the earliest artists in cluster 9_0.05.

One particular artist, Bill Nelson, stood out for having a particularly novel song,
"Birds of Tin," for the year it was released (1980). This song features a sharp and
twangy synth beat that, when sped up, sounded like minimalist acid house music.
While the α = 0.05 grouping differentiated mostly on general moods and classes of
instruments (like rock vs. non-electronic vs. electronic), the α = 0.1 grouping picked
up more nuanced instrumentation and mood differences. For example, cluster 16_0.1
contained songs that featured orchestral string instruments, especially violin. The
songs themselves varied significantly according to traditional genres, from Brian Eno
arrangements with classical orchestra to a remix of a song by Linkin Park, a nu-metal
band, which contained violin interludes. This clustering raises an interesting point:
music that sounds very different based on traditional genres could be grouped
together on certain instruments or sounds. Another cluster, 28_0.1, features 90s sounds
characteristic of the Roland TR-909 drum machine (which would explain why the
cluster's songs increase drastically starting in the 1990s and steadily decline through
the 2000s). Yet another cluster, 6_0.1, contains a particularly heavy left tail, indicating
a style more popular in the 1980s, and its characteristic sound, hi-hat cymbals,
is also a specialized instrument. This specialization does not match up particularly
strongly with the clusters when α = 0.05. That is, a single cluster with α = 0.05
does not easily map to one or more clusters in the α = 0.1 run, although many of
the clusters appear to share characteristics based on the qualitative descriptions in
the tables. The timbre/chord change charts for each cluster appeared to at least
somewhat corroborate the general characteristics I attached to each cluster. For
example, the last timbre category is significantly pronounced for clusters 5 and 18,
and especially so for 18. Cluster 18 was vocal-free, ethereal space-synth sounds,
so it would make sense that cluster 5, which was mainly calm New Age, also
contained vocal-free, ethereal, and space-y sounds. It was also interesting to note
that certain clusters, like 28, contained one timbre category that completely dominated
all the others. Many songs in this cluster were marked by strong and repetitive
beats reminiscent of the Roland TR synth drum machine, which matches the graph.
Likewise, clusters 3, 7, 9, and 20, which appear to contain the same peak timbre
category, were noted for containing strong and repetitive beats. For this value of α, I
added the following artists and their contributions to the general list of novel artists:

• Bill Nelson: minimalist house music

• Vangelis: orchestral compositions with electronic notes

• Rick Wakeman: rock compositions with spacey-sounding synths

• Kraftwerk: synth-based pop music

Finally, we look at α = 0.2. With this parameter value, the Dirichlet Process resulted
in 22 clusters formed. 3 of these clusters contained only one song each, and upon
listening to each of these songs, I determined they were not particularly unique and
discarded them, for a total of 19 remaining clusters. Unlike the previous two values of
α, where the clusters were relatively easy to subjectively differentiate, this one was
quite difficult. Slightly more than half of the clusters, 10 out of 19, contained under
1,000 songs, and 3 contained under 100. Some of the clusters were easily mapped
to clusters in the other two α values, like cluster 17_0.2, which contains Roland TR
drum machine sounds and is comparable to cluster 28_0.1. However, many of the other
classifications seemed more dubious. Not only did the songs within each cluster often
seem to vary significantly, but the differences between many clusters appeared nearly
indistinguishable. The chord change and timbre charts also reflect the difficulty
of distinguishing different clusters: the y-axes of these charts span quite small values,
implying that many of the timbre values averaged out because the songs in each cluster were quite
different. Essentially, the observations are quite noisy and do not have
features that stand out as saliently as those of cluster 28_0.1, for example. The only exceptions
to these numbers were clusters 30 and 34, but there were so few songs in each of
these clusters that they represent only a small amount of the dataset. Therefore, I
concluded that the Dirichlet Process with α = 0.2 performed an insufficient job of
adequately clustering the songs. Overall, the clusters formed when α = 0.1 were
the most meaningful in terms of picking up nuanced moods and instruments without
splitting hairs and resulting in clusters a minimally trained ear could not differentiate.
From this analysis, the most appropriate genre classifications of the electronic music
from the MSD are the clusters described in the table where α = 0.1, and the most
novel artists, along with their contributions, are summarized in the findings where
α = 0.05 and α = 0.1.


Chapter 4

Conclusion

In this chapter, I first address weaknesses in my experiment and strategies to address
those weaknesses; I then offer potential paths for researchers to build upon my
experiment and offer closing words regarding this thesis.

4.1 Design Flaws in Experiment

While I made every effort possible to ensure the integrity of this experiment, there
were various limiting factors, some beyond my control and others within my control but
unrealistic to address given the time and resources I had. The largest issue was the dataset I
was working with. While the MSD contained roughly 23,000 electronic music songs
according to my classifications, these songs did not come close to all of the electronic
music that was available. From looking through the tracks, I did see many important
artists, meaning that there was some credibility to the dataset. However, there were
several other artists I was surprised to see missing, and the artists included contained
only a limited number of popular songs. Some traditionally defined genres, like
dubstep, were missing entirely from the dataset, and the most recent songs came
from the year 2010, which meant that the past 5 years of rapid expansion in EM
were not accounted for. Building a sufficient corpus of EM data is very difficult,
arguably more so than for other genres, because songs may be remixed by multiple
artists, further blurring the line between original content and modifications. For this
reason, I considered my thesis to be a proof of concept: although the data I used
may not be ideal, I was able to show that the Dirichlet Process could be used with
some amount of success to cluster songs based on their metadata.

With respect to how I implemented the Dirichlet Process and constructed the
features, my methodology could have been more extensive with additional time and
resources. Interpreting the sounds in each song and establishing common threads is a
difficult task, and unlike Pandora, which used trained music theory experts to analyze
each song, I relied on my own ears and anecdotal knowledge of EM. Given the lack of
formal literature quantitatively analyzing EM and the resources I had, this was my
best realistic option, but it was also not ideal. The second notable weakness, which was
more controllable, was determining what exactly constitutes an EM song. My criteria
involved iterating through every song and selecting those whose artist contained a
tag that fell inside a list of predetermined EM genres. However, this strategy is not
always effective, since some artists contain only a small selection of EM songs and
have produced much more music involving rock or other non-EM genres. To prevent
these songs from appearing in the dataset, I would need to load another dataset
from a group called Last.fm, which contains user-generated tags at the song level.
Another, more addressable, weakness in my experiment was graphically analyzing the
timbre categories. While the average chord changes were easy to interpret on the
graphs for each cluster and had easy semantic interpretations, the timbre categories
were never formally defined. That is, while I knew the Bayes Information Criterion
was lowest when there were 46 categories, I did not associate each timbre category
with a sound. Mauch's study addressed this issue by randomly selecting songs with
sounds that fell in each timbre category and asking users to listen to the sounds and
classify what they heard. Implementing this system would be an additional way of
ensuring that the clusters formed for each song were nontrivial: I could not only
eyeball the measurements on each graph for timbre, like I did in this thesis, but also
use them to confirm the sounds I observed for each cluster. Finally, while my feature
selection involved careful preprocessing, based on other studies, that normalized
measurements between all songs, there are additional ways I could have improved the
feature set. For example, one study looks at more advanced ways to isolate specific
timbre segments in a song, identify repeating patterns, and compare songs to each
other in terms of the similarity of their timbres [15]. More advanced methods like
these would allow me to more quantitatively analyze how successfully the Dirichlet
Process clusters songs into distinct categories.
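As a starting point for the timbre-category question, here is a hedged sketch of how the number of categories can be selected by the Bayes Information Criterion with scikit-learn; the input file name and search grid are stand-ins rather than the exact values used in this thesis.

    import numpy as np
    from sklearn.mixture import GaussianMixture

    # 'timbre_frames_all.npy' is a hypothetical dump of the sampled timbre frames
    frames = np.load('timbre_frames_all.npy')

    bics = {}
    for k in range(10, 61, 2):
        gmm = GaussianMixture(n_components=k, random_state=0).fit(frames)
        bics[k] = gmm.bic(frames)
    print(min(bics, key=bics.get))  # this thesis found the minimum at 46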

4.2 Future Work

Future work in this area, quantitatively analyzing EM metadata to determine what
constitutes different genres and novel artists, would involve tighter definitions, procedures,
evaluations of whether the clustering was effective, and musical scrutiny. All of the
weaknesses mentioned in the previous section, barring perhaps the songs available in
the Million Song Dataset, can be addressed with extensions and modifications to the
code base I created. Addressing the greater issue of building an effective corpus of
music data for the MSD, and constantly updating it, might be accomplished by soliciting
such data from an organization like Spotify, but such an endeavor is very ambitious
and beyond the scope of any individual or small-group research project without
extensive funding and influence. Once these problems are resolved, and the songs
accessed from the dataset and the methods for comparing songs to each other are
improved, the next steps would be to further analyze the results. How do the
most unique artists for their time compare to the most popular artists? Is there
considerable overlap? How long does it take for a style to grow in popularity, if it even
does? And lastly, how can these findings be used to compose new genres of music and
envision who and what will become popular in the future? All of these questions may
require supplementary information sources, with respect to the popularity of songs
and artists for example, and many of these additional pieces of information can be
found on the website of the MSD.

4.3 Closing Remarks

While this thesis is an ambitious endeavor and can be improved in many respects, it
does show that the methods implemented yield nontrivial results and could serve as
a foundation for future quantitative analysis of electronic music. As data analytics
grows even more, and groups such as Spotify amass greater amounts of information
and deeper insights on that information, this relatively new field of study will
hopefully grow. EM is a dynamic, energizing, and incredibly expressive type of music,
and understanding it from a quantitative perspective pays respect to what has, up
until now, been mostly analyzed from a curious outsider's perspective: qualitatively
described, but not examined as thoroughly from a mathematical angle.


Appendix A

Code

A.1 Pulling Data from the Million Song Dataset

from __future__ import division
import os
import sys
import re
import time
import glob
import numpy as np
import hdf5_getters  # not on adroit

# prevents output from showing ellipses when printed
np.set_printoptions(threshold=np.nan)

'''This code pulls the metadata for every electronic music song out of the
Million Song Dataset and writes it, sorted by year, to a text file.'''

basedir = '/scratch/network/mssilver/mssilver/msd_data_full/data/' + str(sys.argv[1])
ext = '.h5'

target_genres = ['house', 'techno', 'drum and bass', 'drum n bass', "drum'n'bass",
                 'drumnbass', "drum 'n' bass", 'jungle', 'breakbeat', 'trance',
                 'dubstep', 'trap', 'downtempo', 'industrial', 'synthpop', 'idm',
                 'idm - intelligent dance music', '8-bit', 'ambient',
                 'dance and electronica', 'electronic']

# relevant metadata for all EM songs found in the MSD
all_song_data = {}
pitch_segs_data = []
count = 0
start_time = time.time()

for root, dirs, files in os.walk(basedir):
    files = glob.glob(os.path.join(root, '*' + ext))
    for f in files:
        h5 = hdf5_getters.open_h5_file_read(f)
        # if year unknown, throw out sample
        if hdf5_getters.get_year(h5) == 0:
            h5.close()
            continue
        if any(tag in str(hdf5_getters.get_artist_mbtags(h5)) for tag in target_genres):
            print 'found electronic music song at {0} seconds'.format(time.time() - start_time)
            count += 1
            print 'song count: {0}'.format(count + 1)
            h5_subdict = dict()
            h5_subdict['title'] = hdf5_getters.get_title(h5).item()
            h5_subdict['artist_name'] = hdf5_getters.get_artist_name(h5).item()
            h5_subdict['year'] = hdf5_getters.get_year(h5).item()
            h5_subdict['duration'] = hdf5_getters.get_duration(h5).item()
            h5_subdict['timbre'] = hdf5_getters.get_segments_timbre(h5).tolist()
            h5_subdict['pitches'] = hdf5_getters.get_segments_pitches(h5).tolist()
            track_id = hdf5_getters.get_track_id(h5).item()
            all_song_data[track_id] = h5_subdict
            print 'Song {0} finished processing. Total time elapsed: {1} seconds'.format(count, str(time.time() - start_time))
        h5.close()

all_song_data_sorted = dict(sorted(all_song_data.items(), key=lambda k: k[1]['year']))
sortedpitchdata = '/scratch/network/mssilver/mssilver/msd_data/raw_' + re.sub('/', '', sys.argv[1]) + '.txt'
with open(sortedpitchdata, 'w') as text_file:
    text_file.write(str(all_song_data_sorted))

A.2 Calculating Most Likely Chords and Timbre Categories

from __future__ import division
import os
import sys
import re
import time
import json
import glob
import hdf5_getters  # not on adroit
import sklearn.mixture
import msd_utils  # not on adroit
import math
import numpy as np
import collections
import ast

# prevents output from showing ellipses when printed
np.set_printoptions(threshold=np.nan)

# column-wise mean of a list of lists
def mean(a):
    return sum(a) / len(a)

'''This code computes the frequency of chord changes in each electronic song
and runs the dirichlet process on it.'''

basedir = '/scratch/network/mssilver/mssilver/'
input_file = basedir + 'msd_data/raw_' + str(sys.argv[1]) + '.txt'
output_file = basedir + 'msd_data/preprocessed_' + str(sys.argv[1]) + '.txt'

json_contents = open(input_file, 'r').read()

all_song_data = []
time_start = time.time()
count = 0
# pattern reconstructed from a garbled listing; assumes 'title' is the first key
for json_object_str in re.finditer(r"\{'title'.*?\}", json_contents):
    json_object_str = str(json_object_str.group(0))
    json_object = ast.literal_eval(json_object_str)
    json_object_new = {}

    json_object_new['title'] = json_object['title']
    json_object_new['artist_name'] = json_object['artist_name']
    json_object_new['year'] = json_object['year']
    json_object_new['duration'] = json_object['duration']

    segments_pitches_old = json_object['pitches']
    segments_timbre_old = json_object['timbre']
    segments_pitches_old_smoothed = []
    segments_timbre_old_smoothed = []
    chord_changes = [0 for i in range(0, 192)]
    smoothing_factor = 5
    for i in range(0, int(math.floor(len(segments_pitches_old) / smoothing_factor))):
        segments = segments_pitches_old[(smoothing_factor * i):(smoothing_factor * i + smoothing_factor)]
        # calculate mean frequency of each note over a block of 5 time segments
        segments_mean = map(mean, zip(*segments))
        segments_pitches_old_smoothed.append(segments_mean)
    most_likely_chords = [msd_utils.find_most_likely_chord(seg) for seg in segments_pitches_old_smoothed]
    print 'found most likely chords at {0} seconds'.format(time.time() - time_start)
    # calculate chord changes
    for i in range(0, len(most_likely_chords) - 1):
        c1 = most_likely_chords[i]
        c2 = most_likely_chords[i + 1]
        if (c1[1] == c2[1]):
            note_shift = 0
        elif (c1[1] < c2[1]):
            note_shift = c2[1] - c1[1]
        else:
            note_shift = 12 - c1[1] + c2[1]
        key_shift = 4 * (c1[0] - 1) + c2[0]
        # convert note_shift (0 through 11) and key_shift (1 to 16)
        # to one of 192 categories for a chord shift
        chord_shift = 12 * (key_shift - 1) + note_shift
        chord_changes[chord_shift] += 1
    json_object_new['chord_changes'] = [c / json_object['duration'] for c in chord_changes]
    print 'calculated chord changes at {0} seconds'.format(time.time() - time_start)

    for i in range(0, int(math.floor(len(segments_timbre_old) / smoothing_factor))):
        segments = segments_timbre_old[(smoothing_factor * i):(smoothing_factor * i + smoothing_factor)]
        # calculate mean timbre values over a block of 5 time segments
        segments_mean = map(mean, zip(*segments))
        segments_timbre_old_smoothed.append(segments_mean)
    print 'found most likely timbre categories at {0} seconds'.format(time.time() - time_start)
    timbre_cats = [msd_utils.find_most_likely_timbre_category(seg) for seg in segments_timbre_old_smoothed]
    timbre_cat_counts = [timbre_cats.count(i) for i in xrange(0, 30)]
    json_object_new['timbre_cat_counts'] = [t / json_object['duration'] for t in timbre_cat_counts]
    all_song_data.append(json_object_new)
    count += 1

print 'preprocessing finished, writing results to file at time {0}'.format(time.time() - time_start)
with open(output_file, 'w') as f:
    f.write(str(all_song_data))

print 'file merging complete at time {0}'.format(time.time() - time_start)

A.3 Code to Compute Timbre Categories

from __future__ import division
import os
import sys
import re
import time
import json
import glob
import hdf5_getters  # not on adroit
import sklearn.mixture
import msd_utils  # not on adroit
import math
import numpy as np
import collections
from string import ascii_uppercase
import ast
import matplotlib.pyplot as plt
import operator
from collections import defaultdict
import random

timbre_all = []
N = 20  # number of samples to get from each year
year_counts = {1956: 2, 1965: 4, 1968: 3, 1969: 5, 1970: 23, 1971: 25,
               1972: 26, 1973: 37, 1974: 35, 1975: 29, 1976: 28, 1977: 64,
               1978: 77, 1979: 111, 1980: 131, 1981: 171, 1982: 199, 1983: 272,
               1984: 190, 1985: 189, 1986: 200, 1987: 224, 1988: 205, 1989: 272,
               1990: 358, 1991: 348, 1992: 538, 1993: 610, 1994: 658, 1995: 764,
               1996: 809, 1997: 930, 1998: 872, 1999: 983, 2000: 1031, 2001: 1230,
               2002: 1323, 2003: 1563, 2004: 1508, 2005: 1995, 2006: 1892,
               2007: 2175, 2008: 1950, 2009: 1782, 2010: 742}

time_start = time.time()
year_count = defaultdict(int)
orig_dir = '/scratch/network/mssilver/mssilver/'
orig_dir = ''
json_pattern = re.compile(r"\{'title'.*?\}", re.DOTALL)  # pattern reconstructed
N = 20  # number of songs to sample from each year
k = 20  # number of frames to select from each song
for l1 in ascii_uppercase:
    for l2 in ascii_uppercase:
        edm_textfile = orig_dir + 'msd_data/raw_' + l1 + l2 + '.txt'
        json_contents = open(edm_textfile, 'r').read()
        for json_object_str in re.findall(json_pattern, json_contents):
            json_object = ast.literal_eval(json_object_str)
            year = int(json_object['year'])
            prob = 1.0 if 1.0 * N / year_counts[year] > 1.0 else 1.0 * N / year_counts[year]
            if random.random() < prob:
                print 'getting timbre frames for song in directory {0}, {1} seconds after start of program'.format(edm_textfile, time.time() - time_start)
                duration = float(json_object['duration'])
                timbre = [[t / duration for t in l] for l in json_object['timbre']]
                try:
                    indices = random.sample(xrange(0, len(timbre)), k)
                except:
                    indices = xrange(0, len(timbre))
                timbre_frames = [timbre[i] for i in indices]
                appended_timbre = [timbre_all.append(l) for l in timbre_frames]
        print 'finished file {0}, {1} seconds after start of program'.format(edm_textfile, time.time() - time_start)

with open('timbre_frames_all.txt', 'w') as f:
    f.write(str(timbre_all))

A.4 Helper Methods for Calculations

import os
import re
import json
import glob
import time
import numpy as np
import hdf5_getters

''' some static data used in conjunction with the helper methods '''

# each 12-element vector corresponds to the 12 pitches, starting with C
# natural and going up to B natural (a 1 marks a note present in the chord)
def _bits(s):
    return [int(c) for c in s]

CHORD_TEMPLATE_MAJOR = [_bits(s) for s in [
    '100010010000', '010001001000', '001000100100', '000100010010',
    '000010001001', '100001000100', '010000100010', '001000010001',
    '100100001000', '010010000100', '001001000010', '000100100001']]
CHORD_TEMPLATE_MINOR = [_bits(s) for s in [
    '100100010000', '010010001000', '001001000100', '000100100010',
    '000010010001', '100001001000', '010000100100', '001000010010',
    '000100001001', '100010000100', '010001000010', '001000100001']]
CHORD_TEMPLATE_DOM7 = [_bits(s) for s in [
    '100010010010', '010001001001', '101000100100', '010100010010',
    '001010001001', '100101000100', '010010100010', '001001010001',
    '100100101000', '010010010100', '001001001010', '000100100101']]
CHORD_TEMPLATE_MIN7 = [_bits(s) for s in [
    '100100010010', '010010001001', '101001000100', '010100100010',
    '001010010001', '100101001000', '010010100100', '001001010010',
    '000100101001', '100010010100', '010001001010', '001000100101']]

CHORD_TEMPLATE_MAJOR_means = [np.mean(chord) for chord in CHORD_TEMPLATE_MAJOR]
CHORD_TEMPLATE_MINOR_means = [np.mean(chord) for chord in CHORD_TEMPLATE_MINOR]
CHORD_TEMPLATE_DOM7_means = [np.mean(chord) for chord in CHORD_TEMPLATE_DOM7]
CHORD_TEMPLATE_MIN7_means = [np.mean(chord) for chord in CHORD_TEMPLATE_MIN7]

CHORD_TEMPLATE_MAJOR_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_MAJOR]
CHORD_TEMPLATE_MINOR_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_MINOR]
CHORD_TEMPLATE_DOM7_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_DOM7]
CHORD_TEMPLATE_MIN7_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_MIN7]

# centers of the 46 timbre categories fit by a Gaussian Mixture Model
TIMBRE_CLUSTERS = [
    [1.38679881e-01, 3.95702571e-02, 2.65410235e-02, 7.38301998e-03, -1.75014636e-02, -5.51147732e-02,
     8.71851698e-03, -1.17595855e-02, 1.07227900e-02, 8.75951680e-03, 5.40391877e-03, 6.17638908e-03],
    [3.14344510e+00, 1.17405599e-01, 4.08053561e+00, -1.77934450e+00, 2.93367968e+00, -1.35597928e+00,
     -1.55129489e+00, 7.75743158e-01, 6.42796685e-01, 1.40794256e-01, 3.37716831e-01, -3.27103815e-01],
    [3.56548165e-01, 2.73288705e+00, 1.94355982e+00, 1.06892477e+00, 9.89739475e-01, -8.97330631e-02,
     8.73234495e-01, -2.00747009e-03, 3.44488367e-01, 9.93117800e-02, -2.43471766e-01, -1.90521726e-01],
    [4.22442037e-01, 4.14115783e-01, 1.43926557e-01, -1.16143322e-01, -5.95186216e-02, -2.36927188e-01,
     -6.83151409e-02, 9.86816882e-02, 2.43219098e-02, 6.93558977e-02, 6.80121418e-03, 3.97485360e-02],
    [1.94727799e-01, -1.39027782e+00, -2.39875671e-01, -2.84583677e-01, 1.92334219e-01, -2.83421048e-01,
     2.15787541e-01, 1.14840341e-01, -2.15631833e-01, -4.09496877e-02, -6.90838017e-03, -7.24394810e-03],
    [1.96565167e-01, 4.98702717e-02, -3.43697282e-01, 2.54170701e-01, 1.12441266e-02, 1.54740401e-01,
     -4.70447408e-02, 8.10868802e-02, 3.03736697e-03, 1.43974944e-03, -2.75044913e-02, 1.48634678e-02],
    [2.21364497e-01, -2.96205105e-01, 1.57754028e-01, -5.57641279e-02, -9.25625566e-02, -6.15316168e-02,
     -1.38139882e-01, -5.54936599e-02, 1.66886836e-01, 6.46238260e-02, 1.24093863e-02, -2.09274345e-02],
    [2.12823455e-01, -9.32652720e-02, -4.39611467e-01, -2.02814479e-01, 4.98638770e-02, -1.26572488e-01,
     -1.11181799e-01, 3.25075635e-02, 2.01416694e-02, -5.69216463e-02, 2.61922912e-02, 8.30817468e-02],
    [1.62304042e-01, -7.34813956e-03, -2.02552550e-01, 1.80106705e-01, -5.72110826e-02, -9.17148244e-02,
     -6.20429191e-03, -6.08892354e-02, 1.02883628e-02, 3.84878478e-02, -8.72920419e-03, 2.37291230e-02],
    [1.69023095e-01, 6.81311168e-02, -3.71039856e-02, -2.13139780e-02, -4.18752028e-03, 1.36407740e-01,
     2.58515825e-02, -4.10328777e-04, 2.93149920e-02, -1.97874734e-02, 2.01177066e-02, 4.29260690e-03],
    [4.16829358e-01, -1.28384095e+00, 8.86081556e-01, 9.13717416e-02, -3.19420208e-01, -1.82003637e-01,
     -3.19865507e-02, -1.71517045e-02, 3.47472066e-02, -3.53047665e-02, 5.58354602e-02, -5.06222122e-02],
    [3.83948137e-01, 1.06020034e-01, 4.01191058e-01, 1.49470482e-01, -9.58422411e-02, -4.94473336e-02,
     2.27589858e-02, -5.67352733e-02, 3.84666644e-02, -2.15828055e-02, -1.67817151e-02, 1.15426241e-01],
    [9.07946444e-01, 3.26120397e+00, 2.98472002e+00, -1.42615404e-01, 1.29886103e+00, -4.53380431e-01,
     1.54008478e-01, -3.55297093e-02, -2.95809181e-01, 1.57037690e-01, -7.29692046e-02, 1.15180285e-01],
    [1.60870896e+00, -2.32038235e+00, -7.96211044e-01, 1.55058968e+00, -2.19377663e+00, 5.01030526e-01,
     -1.71767279e+00, -1.36642470e+00, -2.42837527e-01, -4.14275615e-01, -7.33148530e-01, -4.56676578e-01],
    [6.42870687e-01, 1.34486839e+00, 2.16026845e-01, -2.13180345e-01, 3.10866747e-01, -3.97754955e-01,
     -3.54439151e-01, -5.95938041e-04, 4.95054274e-03, 4.67013422e-02, -1.80823854e-02, 1.25808320e-01],
    [1.16780496e+00, 2.28141229e+00, -3.29418720e+00, -1.54239912e+00, 2.12372153e-01, 2.51116768e+00,
     1.84273560e+00, -4.06183916e-01, 1.19175125e+00, -9.24407446e-01, 6.85444429e-01, -6.38729005e-01],
    [2.39097414e-01, -1.13382447e-02, 3.06327342e-01, 4.68182987e-03, -1.03107607e-01, -3.17661969e-02,
     3.46533705e-02, 1.46440386e-02, 6.88291154e-02, 1.72580481e-02, -6.23970238e-03, -6.52822380e-03],
    [1.74850329e-01, -1.86077411e-01, 2.69285838e-01, 5.22452803e-02, -3.71708289e-02, -6.42874319e-02,
     -5.01920042e-03, -1.14565540e-02, -2.61300268e-03, -6.94872458e-03, 1.20157063e-02, 2.01341977e-02],
    [1.93220674e-01, 1.62738332e-01, 1.72794061e-02, 7.89933755e-02, 1.58494767e-01, 9.04541006e-04,
     -3.33177052e-02, -1.42411500e-01, -1.90471155e-02, -2.41622739e-02, -2.57382438e-02, 2.84895062e-02],
    [3.31179197e+00, -1.56765268e-01, 4.42446188e+00, 2.05496297e+00, 5.07031622e+00, -3.52663849e-02,
     -5.68337901e+00, -1.17825301e+00, 5.41756637e-01, -3.15541339e-02, -1.58404846e+00, 7.37887234e-01],
    [2.36033237e-01, -5.01380019e-01, -7.01568834e-02, -2.14474169e-01, 5.58739133e-01, -3.45340886e-01,
     2.36469930e-01, -2.51770230e-02, -4.41670143e-01, -1.73364633e-01, 9.92353986e-03, 1.01775476e-01],
    [3.13672832e+00, 1.55128891e+00, 4.60139512e+00, 9.82477544e-01, -3.87108002e-01, -1.34239667e+00,
     -3.00065797e+00, -4.41556909e-01, -7.77546208e-01, -6.59017029e-01, -1.42596356e-01, -9.78935498e-01],
    [8.50714148e-01, 2.28658856e-01, -3.65260753e+00, 2.70626948e+00, -1.90441544e-01, 5.66625676e+00,
     1.77531510e+00, 2.39978921e+00, 1.10965660e+00, 1.58484130e+00, -1.51579214e-02, 8.64324026e-01],
    [1.14302559e+00, 1.18602811e+00, -3.88130412e+00, 8.69833825e-01, -8.23003310e-01, -4.23867795e-01,
     8.56022598e-01, -1.08015106e+00, 1.74840192e-01, -1.35493558e-02, -1.17012561e+00, 1.68572940e-01],
    [3.54117814e+00, 6.12714769e-01, 7.67585243e+00, 2.50391333e+00, 1.81374399e+00, -1.46363231e+00,
     -1.74027236e+00, -5.72924078e-01, -1.20787368e+00, -4.13954661e-01, -4.62561948e-01, 6.78297871e-01],
    [8.31843044e-01, 4.41635485e-01, 7.00724425e-02, -4.72159900e-02, 3.08326493e-01, -4.47009822e-01,
     3.27806057e-01, 6.52370380e-01, 3.28490360e-01, 1.28628172e-01, -7.78065861e-02, 6.91343399e-02],
    [4.90082031e-01, -9.53180204e-01, 1.76970476e-01, 1.57256960e-01, -5.26196238e-02, -3.19264458e-01,
     3.91808304e-01, 2.19368239e-01, -2.06483291e-01, -6.25044005e-02, -1.05547224e-01, 3.18934196e-01],
    [1.49899454e+00, -4.30708817e-01, 2.43770498e+00, 7.03149621e-01, -2.28827845e+00, 2.70195855e+00,
     -4.71484280e+00, -1.18700075e+00, -1.77431396e+00, -2.23190236e+00, 8.20855264e-01, -2.35859902e-01],
    [1.20322544e-01, -3.66300816e-01, -1.25699953e-01, -1.21914056e-01, 6.93277338e-02, -1.31034684e-01,
     -1.54955924e-03, 2.48094288e-02, -3.09576314e-02, -1.66369415e-03, 1.48904987e-04, -1.42151992e-02],
    [6.52394765e-01, -6.81024464e-01, 6.36868117e-01, 3.04950208e-01, 2.62178992e-01, -3.20457080e-01,
     -1.98576098e-01, -3.02173163e-01, 2.04399765e-01, 4.44513847e-02, -9.50111498e-03, -1.14198739e-02],
    [2.06762180e-01, -2.08101829e-01, 2.61977630e-01, -1.71672300e-01, 5.61794250e-02, 2.13660185e-01,
     3.90259585e-02, 4.78176392e-02, 1.72812607e-02, 3.44052067e-02, 6.26899067e-03, 2.48544728e-02],
    [7.39717363e-01, 4.37786285e+00, 2.54995502e+00, 1.13151212e+00, -3.58509503e-01, 2.20806129e-01,
     -2.20500355e-01, -7.22409824e-02, -2.70534083e-01, 1.07942098e-03, 2.70174668e-01, 1.87279353e-01],
    [1.25593809e+00, 6.71054880e-02, 8.70352571e-01, -4.32607959e+00, 2.30652217e+00, 5.47476105e+00,
     -6.11052479e-01, 1.07955720e+00, -2.16225471e+00, -7.95770149e-01, -7.31804973e-01, 9.68935954e-01],
    [1.17233757e-01, -1.23897829e-01, -4.88625265e-01, 1.42036530e-01, -7.23286756e-02, -6.99808763e-02,
     -1.17525019e-02, 5.70221674e-02, -7.67796123e-03, 4.17505873e-02, -2.33375716e-02, 1.94121001e-02],
    [1.67511025e+00, -2.75436700e+00, 1.45345593e+00, 1.32408871e+00, -1.66172505e+00, 1.00560074e+00,
     -8.82308160e-01, -5.95708043e-01, -7.27283590e-01, -1.03975499e+00, -1.86653334e-02, 1.39449745e+00],
    [3.20587677e+00, -2.84451104e+00, 8.54849957e+00, -4.44001235e-01, 1.04202144e+00, 7.35333682e-01,
     -2.48763292e+00, 7.38931361e-01, -1.74185596e+00, -1.07581842e+00, 2.05759299e-01, -8.20483513e-01],
    [3.31279737e+00, -5.08655734e-01, 6.61530870e+00, 1.16518280e+00, 4.74499155e+00, -2.31536191e+00,
     -1.34016130e+00, -7.15381712e-01, 2.78890594e+00, 2.04189275e+00, -3.80003033e-01, 1.16034914e+00],
    [1.79522019e+00, -8.13534697e-02, 4.37167420e-01, 2.26517020e+00, 8.85377295e-01, 1.07481514e+00,
     -7.25322296e-01, -2.19309506e+00, -7.59468916e-01, -1.37191387e+00, 2.60097913e-01, 9.34596450e-01],
    [3.50400906e-01, 8.17891485e-01, -8.63487084e-01, -7.31760701e-01, 9.70320805e-02, -3.60023996e-01,
     -2.91753495e-01, -8.03073817e-02, 6.65930095e-02, 1.60093340e-01, -1.29158086e-01, -5.18806100e-02],
    [2.25922929e-01, 2.78461593e-01, 5.39661393e-02, -2.37662670e-02, -2.70343295e-02, -1.23485570e-01,
     2.31027499e-03, 5.87465112e-05, 1.86127188e-02, 2.83074747e-02, -1.87198676e-04, 1.24761782e-02],
    [4.53615634e-01, 3.18976020e+00, -8.35029351e-01, 7.84124578e+00, -4.43906795e-01, -1.78945492e+00,
     -1.14521031e+00, 1.00044304e+00, -4.04084981e-01, -4.86030348e-01, 1.05412721e-01, 5.63666445e-02],
    [3.93714086e-01, -3.07226477e-01, -4.87366619e-01, -4.57481697e-01, -2.91133171e-04, -2.39881719e-01,
     -2.15591352e-01, -1.21332941e-01, 1.42245002e-01, 5.02984582e-02, -8.05878851e-03, 1.95534173e-01],
    [1.86913010e-01, -1.61000977e-01, 5.95612425e-01, 1.87804293e-01, 2.22064227e-01, -1.09008289e-01,
     7.83845058e-02, 5.15228647e-02, -8.18113578e-02, -2.37860551e-02, 3.41013800e-03, 3.64680417e-02],
    [3.32919314e+00, -2.14341251e+00, 7.20913997e+00, 1.76143734e+00, 1.64091808e+00, -2.66887649e+00,
     -9.26748006e-01, -2.78599285e-01, -7.39434005e-01, -3.87363085e-01, 8.00557250e-01, 1.15628886e+00],
    [4.76496444e-01, -1.19334793e-01, 3.09037235e-01, -3.45545294e-01, 1.30114716e-01, 5.06895559e-01,
     2.12176840e-01, -4.14296750e-03, 4.52439064e-02, -1.62163990e-02, 6.93683152e-02, -5.77607592e-03],
    [3.00019324e-01, 5.43432074e-02, -7.72732930e-01, 1.47263806e+00, -2.79012581e-02, -2.47864869e-01,
     -2.10011388e-01, 2.78202425e-01, 6.16957205e-02, -1.66924986e-01, -1.80102286e-01, -3.78872162e-03]]

TIMBRE_MEANS = [np.mean(t) for t in TIMBRE_CLUSTERS]
TIMBRE_STDEVS = [np.std(t) for t in TIMBRE_CLUSTERS]

'''helper methods to process raw msd data'''

def normalize_pitches(h5):
    key = int(hdf5_getters.get_key(h5))
    segments_pitches = hdf5_getters.get_segments_pitches(h5)
    segments_pitches_new = [transpose_by_key(pitch_seg, key) for pitch_seg in segments_pitches]
    return segments_pitches_new

def transpose_by_key(pitch_seg, key):
    pitch_seg_new = []
    for i in range(0, 12):
        idx = (i + key) % 12
        pitch_seg_new.append(pitch_seg[idx])
    return pitch_seg_new

''' given a time segment with distributions of the 12 pitches, find the most
likely chord played '''
def find_most_likely_chord(pitch_vector):
    rho_max = 0.0
    # chords are indexed as (family, root); families: 1 = major, 2 = minor,
    # 3 = dominant 7th major, 4 = dominant 7th minor
    most_likely_chord = (1, 1)
    families = [
        (1, CHORD_TEMPLATE_MAJOR, CHORD_TEMPLATE_MAJOR_means, CHORD_TEMPLATE_MAJOR_stdevs),
        (2, CHORD_TEMPLATE_MINOR, CHORD_TEMPLATE_MINOR_means, CHORD_TEMPLATE_MINOR_stdevs),
        (3, CHORD_TEMPLATE_DOM7, CHORD_TEMPLATE_DOM7_means, CHORD_TEMPLATE_DOM7_stdevs),
        (4, CHORD_TEMPLATE_MIN7, CHORD_TEMPLATE_MIN7_means, CHORD_TEMPLATE_MIN7_stdevs)]
    for family, templates, means, stdevs in families:
        for idx, (chord, mean, stdev) in enumerate(zip(templates, means, stdevs)):
            # correlation between the template and the observed pitch vector,
            # regularized by 0.01 to avoid division by zero
            rho = 0.0
            for i in range(0, 12):
                rho += (chord[i] - mean) * (pitch_vector[i] - np.mean(pitch_vector)) / ((stdev + 0.01) * (np.std(pitch_vector) + 0.01))
            if (abs(rho) > abs(rho_max)):
                rho_max = rho
                most_likely_chord = (family, idx)
    return most_likely_chord

def find_most_likely_timbre_category(timbre_vector):
    most_likely_timbre_cat = 0
    rho_max = 0.0
    for idx, (seg, mean, stdev) in enumerate(zip(TIMBRE_CLUSTERS, TIMBRE_MEANS, TIMBRE_STDEVS)):
        rho = 0.0
        for i in range(0, 12):
            rho += (seg[i] - mean) * (timbre_vector[i] - np.mean(seg)) / ((stdev + 0.01) * (np.std(timbre_vector) + 0.01))
        if (abs(rho) > abs(rho_max)):
            rho_max = rho
            most_likely_timbre_cat = idx
    return most_likely_timbre_cat

Bibliography

[1] Marck Bailey. Mark J. Butler publishes scholarly work on dance music. http://www.music.northwestern.edu/about/news/2012/mark-j-butler-publishes-scholarly-work-on-dance-music.html, March 2012.

[2] Kenneth Taylor. Ishkur's guide to EDM. http://techno.org/electronic-music-guide.

[3] Deal further strengthens Spotify's music discovery expertise. http://the.echonest.com/pressreleases/spotify-acquires-echo-nest, March 2014.

[4] Josh Constine. Inside the Spotify - Echo Nest skunkworks. http://techcrunch.com/2014/10/19/the-sonic-mad-scientists, October 2014.

[5] The future of music genres is here. http://blog.echonest.com/post/73516217273/the-future-of-music-genres-is-here, January 2014.

[6] About the Music Genome Project. http://www.pandora.com/about/mgp.

[7] Joan Serrà, Álvaro Corral, Marián Boguñá, Martín Haro, and Josep Ll. Arcos. Measuring the evolution of contemporary western popular music. Scientific Reports, 2, July 2012.

[8] Matthias Mauch, Robert M. MacCallum, Mark Levy, and Armand M. Leroi. The evolution of popular music: USA 1960–2010. Royal Society Open Science, 2(5), 2015.

[9] Thierry Bertin-Mahieux, Daniel P. W. Ellis, Brian Whitman, and Paul Lamere. The million song dataset. In Proceedings of the 12th International Conference on Music Information Retrieval (ISMIR 2011), 2011.

[10] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay. Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12:2825–2830, 2011.

[11] Lars Buitinck, Gilles Louppe, Mathieu Blondel, Fabian Pedregosa, Andreas Mueller, Olivier Grisel, Vlad Niculae, Peter Prettenhofer, Alexandre Gramfort, Jaques Grobler, Robert Layton, Jake VanderPlas, Arnaud Joly, Brian Holt, and Gaël Varoquaux. API design for machine learning software: experiences from the scikit-learn project. In ECML PKDD Workshop: Languages for Data Mining and Machine Learning, pages 108–122, 2013.

[12] Edwin Chen. Infinite mixture models with nonparametric bayes and the dirichlet process. http://blog.echen.me/2012/03/20/infinite-mixture-models-with-nonparametric-bayes-and-the-dirichlet-process/, March 2012.

[13] Graham Massey. Roland TR-808: The drum machine that changed music. http://www.bbc.com/news/entertainment-arts-26682781, March 2014.

[14] Jean-Michel Jarre and Agencja Artystyczna MTJ. Jean Michel Jarre. Disques Dreyfus, 1999.

[15] Francois Pachet, Jean-Julien Aucouturier, and Mark Sandler. "The way it sounds": Timbre models for analysis and retrieval of music signals. IEEE Transactions on Multimedia, 7(6):1028–1035, December 2005.

• Abstract
• Acknowledgements
• Contents
• List of Tables
• List of Figures
• 1 Introduction
  • 1.1 Background Information
  • 1.2 Literature Review
  • 1.3 The Dataset
• 2 Mathematical Modeling
  • 2.1 Determining Novelty of Songs
  • 2.2 Feature Selection
  • 2.3 Collecting Data and Preprocessing Selected Features
    • 2.3.1 Collecting the Data
    • 2.3.2 Pitch Preprocessing
    • 2.3.3 Timbre Preprocessing
• 3 Results
  • 3.1 Methodology
  • 3.2 Findings
    • 3.2.1 α = 0.05
    • 3.2.2 α = 0.1
    • 3.2.3 α = 0.2
  • 3.3 Analysis
• 4 Conclusion
  • 4.1 Design Flaws in Experiment
  • 4.2 Future Work
  • 4.3 Closing Remarks
• A Code
  • A.1 Pulling Data from the Million Song Dataset
  • A.2 Calculating Most Likely Chords and Timbre Categories
  • A.3 Code to Compute Timbre Categories
  • A.4 Helper Methods for Calculations
• Bibliography
Page 10: Silver,Matthew final thesis

Chapter 1

Introduction

1.1 Background Information

Electronic Music (EM) is an increasingly popular genre of music with an immense
presence and influence on modern culture. Because the genre is new as a whole and
is arguably more loosely structured than other genres (technology has enabled the
creation of a wide range of sounds and the easy blending of existing and new sounds alike),
formal analysis, especially mathematical analysis, of the genre is fairly limited and
has only begun growing in the past few years. As a fan of EM, I am interested in
exploring how the genre has evolved over time. More specifically, my goal with this
project was to design some structure or model that could help me identify which EM
artists have contributed the most stylistically to the genre. Oftentimes, famous EM
artists do not create novel-sounding music but rather popularize an existing style,
and the motivation of this study is to understand who has stylistically contributed
the most to the EM scene versus those who have merely popularized aspects of it.

As the study progressed, the manner in which I constructed my model lent itself to
a second goal of the thesis: imagining new ways in which EM genres can be conceived.

While there exists an extensive amount of research analyzing music trends from
a non-mathematical (cultural, societal, artistic) perspective, the analysis of EM
from a mathematical perspective, and especially with respect to any computationally
measurable trends in the genre, is close to nonexistent. EM has been analyzed to a
lesser extent than other common genres of music in the academic world, most likely
due to existing for a shorter amount of time and being less rooted in prominent
social and cultural events. In fact, the first published reference work on EM did not
exist until 2012, when Professor Mark J. Butler of Northwestern University edited
and published Electronica, Dance and Club Music, a collection of essays exploring
EM genres and culture [1]. Furthermore, there are very few comprehensive visual
guides that allow a user to relate every genre to each other and easily observe how
different genres converge and diverge. While conducting research, the best guide I
found was not a scholarly source but an online guide created by an EM enthusiast:
Ishkur's Guide to Electronic Music [2]. This guide, which includes over 100 specific
genres grouped by more general genres and represents chronological evolutions by
connecting each genre in a flowchart, is the most exhaustive analysis of the EM scene
I could find. However, the guide's analysis is very qualitative. While each subgenre
contains an explanation of typical rhythms and sounds and includes well-known
songs indicative of the style, the guide was created by someone who used historical and
personal knowledge of EM. My model, which creates music genres by chronologically
ordering songs and then assigning them to clusters, is a different approach towards
imagining the entire landscape of EM. The results may confirm Ishkur's Guide's
findings, in which case his guide is given additional merit with mathematical evidence,
or they may differ, suggesting that there may be better ways to group EM
genres. One advantage that guides such as Ishkur's and historically-based scholarly
works have over my approach is that those models are history-sensitive and therefore
may group songs in a way that historically makes sense. On the other hand, my
model is history-agnostic and may not realize the historical context of songs when
clustering. However, I believe that there is still significant merit to my research.
Instead of classifying genres of music by the early genres that led to them, my approach
gives the most credit to the artists and songs that were the most innovative for their
time, and it perhaps reveals different musical styles that are more similar to each other
than history would otherwise imply. This way of thinking of music genres, while
unconventional, is another way of imagining EM.

The practice of quantitatively analyzing music has exploded in the last decade,
thanks to technological and algorithmic advances that allow data scientists to
constructively sift through troves of music and listener information. In the literature
review, I will focus on two particular organizations that have contributed greatly to
the large-scale mathematical analysis of music: Pandora, a website that plays songs
similar to a song/artist/album inputted by a user, and Echo Nest, a music analytics
firm that was acquired by Spotify in 2014 and drives Spotify's Discover Weekly
feature [3]. After evaluating the relevance of these sources to my thesis work, I will
then look over the relevant academic research and evaluate what this research can
contribute.

1.2 Literature Review

The analysis of quantitative music generally falls into two categories: research
conducted by academics and academic organizations for scholarly purposes, and research
conducted by companies and primarily targeted at consumers. First, looking at the
consumer-based research, Spotify and Pandora are two of the most prominent such
groups and the two I decided to focus on. Spotify is a music streaming service where
users can listen to albums and songs from a wide variety of artists, or listen to weekly
playlists generated based on the music the user and the user's friends have listened to.
The weekly playlist, called the Discover Weekly Playlist, is a relatively new feature in
Spotify and is driven by music analysis algorithms created from Echo Nest. Using
the Echo Nest code interface, Spotify creates a "taste profile" for each user, which
assesses attributes such as how often a user branches out to new styles of music, how
closely the user's streamed music follows popular Billboard music charts, and so on.
Spotify also looks at the artists and songs the user streamed and creates clusters
of different genres that the user likes (see Figure 1.1). The genres in the cluster
come from a list of nearly 800 names, which are derived by scraping
the Internet for trending terms in music as well as by training various algorithms on a
regular basis by "listening" to new songs [4][5].

Figure 1.1: A user's taste profile generated by Spotify


Although Spotify and Echo Nest's algorithms are very useful for mapping the landscape
of established and emerging genres of music, the methodology is limited to
pre-defined genres of music. This may serve as a good starting point to compare my
final results to, but my study aims to be as context-free as possible by attaching no
preconceived notions of music styles or genres, instead looking at features that could
be measured in every song.

While Spotify's approach to mapping music is very high-tech and based on
existing genres, Pandora takes a very low-tech and context-free approach to music
clustering. Pandora created the Music Genome Project, a multi-year undertaking
where skilled music theorists listened to a large number of songs and analyzed up to
450 characteristics in each song [6]. Pandora's approach is appealing to the aim of
my study, since it does not take any preconceived notions of what a genre of music
is, instead comparing songs on common characteristics such as pitch, rhythm, and
instrument patterns. Unfortunately, I do not have a cadre of skilled music theorists
at my disposal, nor do I have 10 years to perform such calculations like the dedicated
workers at Pandora (tips the indestructible fedora). Additionally, Pandora's Music
Genome Project is intellectual property, so at best I can only rely on the abstract
concepts of the Music Genome Project to drive my study.

In the academic realm, there are no existing studies analyzing quantifiable changes in
EM specifically, but there exist a few studies that perform such analysis on popular
Western music in general. One such study is Measuring the Evolution of Contemporary
Western Popular Music, which analyzes music from 1955-2010, spanning all
common genres. Using the Million Song Dataset, a free public database of songs,
each containing metadata (see Section 1.3), the study focuses on the attributes pitch,
timbre, and loudness. Pitch is defined as the standard musical notes, or frequency of
the sound waves. Timbre is formally defined by the Mel frequency cepstral coefficients
(MFCCs) of a transformed sound signal. More informally, it refers to the sound color,
texture, or tone quality, and is associated with instrument types, recording resources,
and production techniques. In other words, two sounds that have the same pitch
but different tones (for example, a bell and a voice) are differentiated by their timbres.
There are 12 MFCCs that define the timbre of a given sound. Finally, loudness
refers to intrinsically how loud the music sounds, not loudness that a listener can
manipulate while listening to the music. Loudness is the first MFCC of the timbre
of a sound [7]. The study concluded that over time, music has been becoming louder
and less diverse:

The restriction of pitch sequences (with metrics showing less variety in pitch progressions), the homogenization of the timbral palette (with frequent timbres becoming more frequent), and growing average loudness levels (threatening a dynamic richness that has been conserved until today). This suggests that our perception of the new would be essentially rooted on identifying simpler pitch sequences, fashionable timbral mixtures, and louder volumes. Hence, an old tune with slightly simpler chord progressions, new instrument sonorities that were in agreement with current tendencies, and recorded with modern techniques that allowed for increased loudness levels, could be easily perceived as novel, fashionable, and groundbreaking.
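To make the pitch/timbre distinction concrete, the following sketch extracts both kinds of features with librosa, an audio analysis library that is not used in this thesis and is shown purely for illustration ('song.mp3' is a placeholder):

    import librosa

    y, sr = librosa.load('song.mp3')
    chroma = librosa.feature.chroma_stft(y=y, sr=sr)    # 12 pitch classes per frame
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=12)  # 12 timbre coefficients per frame
    loudness = mfcc[0]  # the first MFCC tracks overall loudness, as noted above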

This study serves as a good starting point for mathematically analyzing music in a few ways. First, it utilizes the Million Song Dataset, which addresses the issue of legally obtaining music metadata. As mentioned in section 1.3, the only legal way to obtain playable music for this study would have been to purchase all songs I would include, which is infeasible. While the Million Song Dataset does not contain the audio files in playable format, it does contain audio features and metadata that allow for in-depth analysis. In addition, working with the dataset takes out the work of extracting features from raw audio files, saving an extensive amount of time and energy. Second, the study establishes specifics for what constitutes a trend in music. Pitch, timbre, and loudness are core features of music, and examining the distributions of each among songs over time reveals a lot of information about how the music industry and consumers' tastes have evolved. While these are not all of the features contained in a song, they serve as a good starting point. Third, the study defines mathematical ways to capture music attributes and measure their change over time. For example, pitches are transposed into the same tonal context, with binary discretized pitch descriptions based on a threshold, so that each song can be represented with vectors of pitches that are normalized and compared to other songs.

While this study lays some solid groundwork for capturing and analyzing numeric qualities of music, it falls short of addressing my goals in a couple of ways. First, it does not perform any analysis with respect to music genre. While the analysis performed in this paper could easily be applied to a list of songs in a specific genre, certain genres might have unique sounds and rhythms relative to other genres that would be worth studying in greater detail. Second, the study only measures general trends in music over time. The models used to describe changes are simple regressions that don't look at more nuanced changes. For example, what styles of music developed over certain periods of time? How rapid were those changes? Which styles of music developed from which other styles?

A more promising study, led by music researcher Matthias Mauch [8], analyzes contemporary popular Western music from the 1960s to 2010s by comparing numerical data on the pitches and timbre of a corpus of 17,000 songs that appeared on the Billboard Hot 100. Like the previously mentioned paper Measuring the Evolution of Contemporary Western Popular Music, Mauch's study also creates abstractions of pitch and timbre in order to provide a consistent and meaningful semantic interpretation of musical data (see figure 1.2). However, Mauch's study takes this idea a step further by using genre tags from Last.fm, a music website, and constructing a hierarchy of music genres using hierarchical clustering. Additionally, the study takes a crack at determining whether a particular band, the Beatles, was musically groundbreaking for its time or merely playing off sounds that other bands had already used.

Figure 1.2: Data processing pipeline for Mauch's study, illustrated with a segment of Queen's Bohemian Rhapsody (1975)

While both Measuring the Evolution of Contemporary Western Popular Music and Mauch's study created abstractions of pitch and timbre, Mauch's study is more appealing with respect to my goal because its end results align more closely with mine. Additionally, the data processing pipeline offers several layers of abstraction, and depending on my progress, I would be able to achieve at least one of these levels. As shown in figure 1.2, each segment of a raw audio file is first broken down into its 12 timbre MFCCs and pitch components. Next, the study constructs "lexicons", or dictionaries of pitch and timbre terms against which all songs can be compared. For pitch, the original data is in an N-by-12 matrix, where N is the number of time segments in the song and 12 the number of notes found in an octave of pitches. Each time segment contains the relative strengths of each of the 12 pitches. However, music sounds are not merely a collection of pitches but, more precisely, chords. Furthermore, the similarity of two songs is not determined by the absolute pitches of their chords but rather by the progression of chords in the song, all relative to each other. For example, if all the notes in a song are transposed by one step, the song will sound different in terms of absolute pitch, but it will still be recognized as the original because all of the relative movements from each chord to the next are the same. This phenomenon is captured in the pitch data by finding the most likely chord played at each time segment, then counting the change to the next chord at each time step and generating a table of chord change frequencies for each song. Constructing the timbre lexicon is more complicated, since there is no easy analogue of chords by which to compare songs. Mauch's study utilizes a Gaussian Mixture Model (GMM), iterating over k=1 to k=N clusters, where N is a large number, running the GMM on each prior assumption of k clusters, and computing the Bayes Information Criterion (BIC) for each model. The lowest of the N BIC values is found, and that value of k is selected. That model contains k different timbre clusters, and each cluster contains the mean timbre value for each of the 12 timbre components.

For my research, I decided that the pitch and timbre lexicons would be the most realistic level of abstraction I could obtain. Mauch's study adds an additional layer to pitch and timbre by identifying the most common patterns of chord changes and the most common timbre rhythms, and creating more general tags from these combined terms, such as "stepwise changes indicating modal harmony" for a pitch topic and "oh, rounded, mellow" for a timbral topic. There were two problems with using this final layer of abstraction for my study. First, attaching semantic interpretations to the pitch and timbral lexicons is a difficult task. For timbre, I would need to listen to sound samples containing all of the different timbral categories I identified and attach user interpretations to them. For the chords, not only would I have to perform the same analysis as on timbre, but also pay careful attention to identify which chords correspond to common sound progressions in popular music, a task that I am not qualified for and did not have the resources for this thesis to seek out. Second, this final layer of abstraction was not necessary for the end goal of my paper. In fact, consolidating my pitch and timbre lexicons into simpler phrases would run the risk of pigeonholing my analysis and preventing me from discovering more nuanced patterns in my final results. Therefore, I decided to focus on pitch and timbral lexicon construction as the furthest levels of abstraction when processing songs for my thesis. Mathematical details on how I constructed the pitch and timbral lexicons can be found in the Mathematical Modeling chapter of this paper.

1.3 The Dataset

In order to successfully execute my thesis, I need access to an extensive database of music. Until recently, acquiring a substantial corpus of music data was a difficult and costly task. It is illegal to download music audio files from video and music-sharing sites such as YouTube, Spotify, and Pandora. Some platforms, such as iTunes, offer 90-second previews of songs, but using only segments of songs, usually segments that showcase the chorus, is not a reliable way to capture the entire essence of a song. Even if I were to legally download entire audio files for free, I would run into additional issues. Obtaining a high-quality corpus of song data would be challenging: writing scripts that crawl music-sharing platforms may not capture all of the music I am looking for. And once I had the audio files, I would have to perform audio processing techniques to extract the relevant information from the songs.

Fortunately, there is an easy solution to the music data acquisition problem. The Million Song Dataset (MSD) is a collection of metadata for one million music tracks dating up to 2011. Various organizations, such as The Echo Nest, Musicbrainz, 7digital, and Last.fm, have contributed different pieces of metadata. Each song is represented as a Hierarchical Data Format file (HDF5), which can be loaded as a JSON object. The fields encompass topical features such as the song title, artist, and release date, as well as lower-level features such as the loudness, starting beat time, pitches, and timbre of several segments of the song [9]. While the MSD is the largest free and open-source music metadata dataset I could find, there is no guarantee that it adequately covers the entire spectrum of EM artists and songs. This quality limitation is important to consider throughout the study. A quick look through the songs, including the subset of data I worked with for this report, showed that there were several well-known artists and songs from the EM scene. Therefore, while the MSD may not contain all desired songs for this project, it contains an adequate number of relevant songs to produce some meaningful results. Additionally, laying the groundwork for modeling the similarities between songs and identifying groundbreaking ones is the same regardless of the songs included, and the following methodologies can be implemented on any similarly-formatted dataset, including one with songs that might currently be missing from the MSD.


Chapter 2

Mathematical Modeling

2.1 Determining Novelty of Songs

Finding a logical and implementable mathematical model was, and continues to be, an important aspect of my research. My problem, how to mathematically determine which songs were unique for their time, requires an algorithm in which each song is introduced in chronological order, either joining an existing category or starting a new category based on its musical similarity to songs already introduced. Clustering algorithms like k-means or Gaussian Mixture Models (GMM), which optimize the partitioning of a dataset into a predetermined number of clusters, assume a fixed number of clusters. While this process would work if we knew exactly how many genres of EM existed, if we guess wrong, our end results may contain clusters that are wrongly grouped together or separated. It is much better to apply a clustering algorithm that does not make any assumptions about this number.

One particularly promising process that addresses the issue of the number of clusters is a family of algorithms known as Dirichlet Processes (DPs). DPs are useful for this particular application because (1) they assign clusters to a dataset with only an upper bound on the number of clusters, and (2) by sorting the songs in chronological order before running the algorithm and keeping track of which songs are categorized under each cluster, we can observe the earliest songs in each cluster and, consequently, infer which songs were responsible for creating new clusters. The DP is controlled by a concentration parameter α. The expected number of clusters formed is directly proportional to the value of α, so the higher the value of α, the more likely new clusters will be formed [10]. Regardless of the value of α, as the number of data points introduced increases, the probability of a new group being formed decreases. That is, a "rich get richer" policy is in place, and existing clusters tend to grow in size. Tweaking the value of the tunable parameter α is an important part of the study, since it determines the flexibility given to forming a new cluster. If the value of α is too small, then the criteria for forming clusters will be too strict, and data that should be in different clusters will be assigned to the same cluster. On the other hand, if α is too large, the algorithm will be too sensitive and will assign similar songs to different clusters.
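To make this "rich get richer" dynamic concrete, below is a toy simulation of the Chinese restaurant process, the sequential scheme underlying the DP. This is a minimal sketch in Python; the function name and parameters are illustrative, not part of any library.

import numpy as np

def simulate_crp(n_points, alpha, seed=0):
    """Toy Chinese-restaurant-process draw: each new point joins an
    existing cluster with probability proportional to that cluster's
    size, or opens a new cluster with probability proportional to
    alpha (the 'rich get richer' behavior of the DP)."""
    rng = np.random.default_rng(seed)
    counts = []
    for _ in range(n_points):
        weights = np.array(counts + [alpha], dtype=float)
        choice = rng.choice(len(weights), p=weights / weights.sum())
        if choice == len(counts):
            counts.append(1)      # a new cluster is formed
        else:
            counts[choice] += 1   # an existing cluster grows
    return counts

For 1,000 points, len(simulate_crp(1000, 0.1)) typically yields only a handful of clusters, while len(simulate_crp(1000, 100.0)) yields far more, matching the behavior of α described above.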

The implementation of the DP was achieved using scikit-learn's library and API for the Dirichlet Process Gaussian Mixture Model (DPGMM). The DPGMM is the formal name of the Dirichlet Process model used to cluster the data. More specifically, scikit-learn's implementation of the DPGMM uses the stick-breaking method, one of several equally valid methods to assign songs to clusters [11]. While the mathematical details for this algorithm can be found in [12], the most important aspects of the DPGMM are the arguments that the user can specify and tune. The first of these tunable parameters is the value α, which is the same parameter as the α discussed in the previous paragraph. As seen on the right side of Figure 2.1, properly tuning α is key to obtaining meaningful clusters. The center image has α set to 0.01, which is too small and results in all of the data being placed in one cluster. On the other hand, the bottom-right image has the same data set and α set to 100, which does a better job of clustering. On a related note, the figure also demonstrates the effectiveness of the DPGMM over the GMM. On the left side, the dataset clearly contains 2 clusters, but the GMM in the top-left image assumes 5 clusters as a prior and consequently clusters the data incorrectly, while the DPGMM manages to limit the data to 2 clusters.

The second argument that the user inputs for the DPGMM is the data that will be clustered. The scikit-learn implementation takes the data in the format of a nested list (N lists, each of length m), where N is the number of data points and m the number of features. While the format of the data structure is relatively straightforward, choosing which numbers should be in the data was a challenge I faced. Selecting the relevant features of each song to be used in the algorithm will be expounded upon in the next section, "Feature Selection".

The last argument that a user inputs for the scikit-learn DPGMM implementation is an argument indicating the upper bound for the number of clusters. The Dirichlet Process then determines the best number of clusters for the data between 1 and the upper bound. Since the DPGMM is flexible enough to find the best value, I set an arbitrary upper bound of 50 clusters and focused more on the tuning of α to modify the number of clusters formed.
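Putting these three arguments together, the model setup reduces to a few lines. The sketch below uses the sklearn.mixture.DPGMM class available at the time of this work (it was later deprecated and replaced by BayesianGaussianMixture with weight_concentration_prior_type='dirichlet_process'); the placeholder data stands in for the real song features.

import numpy as np
from sklearn.mixture import DPGMM   # scikit-learn <= 0.19

# X: the song feature matrix, one row per song (random placeholder here)
X = np.random.rand(500, 238)

# alpha controls cluster formation; n_components is only an upper bound
model = DPGMM(n_components=50, alpha=1.0, covariance_type='diag', n_iter=100)
model.fit(X)

labels = model.predict(X)
print('clusters actually used:', len(set(labels)))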

Figure 2.1: scikit-learn example of GMM vs. DPGMM and the tuning of α

2.2 Feature Selection

One of the most difficult aspects of the Dirichlet method is choosing the features to be used for clustering. In other words, when we organize the songs into clusters, we need to ensure that each cluster is distinct in a way that is statistically and intuitively logical. In the Million Song Dataset [9], each song is represented as a JSON object containing several fields. These fields are candidate features to be used in the Dirichlet algorithm. Below is an example song, "Never Gonna Give You Up" by Rick Astley, and its corresponding features:

artist_mbid: db92a151-1ac2-438b-bc43-b82e149ddd50 (the musicbrainz.org ID for this artist)
artist_mbtags: shape = (4) (this artist received 4 tags on musicbrainz.org)
artist_mbtags_count: shape = (4) (raw tag count of the 4 tags this artist received on musicbrainz.org)
artist_name: Rick Astley (artist name)
artist_playmeid: 1338 (the ID of that artist on the service playme.com)
artist_terms: shape = (12) (this artist has 12 terms (tags) from The Echo Nest)
artist_terms_freq: shape = (12) (frequency of the 12 terms from The Echo Nest (number between 0 and 1))
artist_terms_weight: shape = (12) (weight of the 12 terms from The Echo Nest (number between 0 and 1))
audio_md5: bf53f8113508a466cd2d3fda18b06368 (hash code of the audio used for the analysis by The Echo Nest)
bars_confidence: shape = (99) (confidence value (between 0 and 1) associated with each bar by The Echo Nest)
bars_start: shape = (99) (start time of each bar according to The Echo Nest; this song has 99 bars)
beats_confidence: shape = (397) (confidence value (between 0 and 1) associated with each beat by The Echo Nest)
beats_start: shape = (397) (start time of each beat according to The Echo Nest; this song has 397 beats)
danceability: 0.0 (danceability measure of this song according to The Echo Nest (between 0 and 1, 0 => not analyzed))
duration: 211.69587 (duration of the track in seconds)
end_of_fade_in: 0.139 (time of the end of the fade-in at the beginning of the song, according to The Echo Nest)
energy: 0.0 (energy measure (not in the signal processing sense) according to The Echo Nest (between 0 and 1, 0 = not analyzed))
key: 1 (estimation of the key the song is in by The Echo Nest)
key_confidence: 0.324 (confidence of the key estimation)
loudness: -7.75 (general loudness of the track)
mode: 1 (estimation of the mode the song is in by The Echo Nest)
mode_confidence: 0.434 (confidence of the mode estimation)
release: Big Tunes - Back 2 The 80s (album name from which the track was taken; some tracks can come from many albums; we give only one)
release_7digitalid: 786795 (the ID of the release (album) on the service 7digital.com)
sections_confidence: shape = (10) (confidence value (between 0 and 1) associated with each section by The Echo Nest)
sections_start: shape = (10) (start time of each section according to The Echo Nest; this song has 10 sections)
segments_confidence: shape = (935) (confidence value (between 0 and 1) associated with each segment by The Echo Nest)
segments_loudness_max: shape = (935) (max loudness during each segment)
segments_loudness_max_time: shape = (935) (time of the max loudness during each segment)
segments_loudness_start: shape = (935) (loudness at the beginning of each segment)
segments_pitches: shape = (935, 12) (chroma features for each segment (normalized so max is 1))
segments_start: shape = (935) (start time of each segment (musical event, or onset) according to The Echo Nest; this song has 935 segments)
segments_timbre: shape = (935, 12) (MFCC-like features for each segment)
similar_artists: shape = (100) (a list of 100 artists (their Echo Nest IDs) similar to Rick Astley, according to The Echo Nest)
song_hotttnesss: 0.864248830588 (according to The Echo Nest, when downloaded (in December 2010), this song had a 'hotttnesss' of 0.8 (on a scale of 0 to 1))
song_id: SOCWJDB12A58A776AF (The Echo Nest song ID; note that a song can be associated with many tracks (with very slight audio differences))
start_of_fade_out: 198.536 (start time of the fade-out, in seconds, at the end of the song, according to The Echo Nest)
tatums_confidence: shape = (794) (confidence value (between 0 and 1) associated with each tatum by The Echo Nest)
tatums_start: shape = (794) (start time of each tatum according to The Echo Nest; this song has 794 tatums)
tempo: 113.359 (tempo in BPM according to The Echo Nest)
time_signature: 4 (time signature of the song according to The Echo Nest, i.e., usual number of beats per bar)
time_signature_confidence: 0.634 (confidence of the time signature estimation)
title: Never Gonna Give You Up (song title)
track_7digitalid: 8707738 (the ID of this song on the service 7digital.com)
track_id: TRAXLZU12903D05F94 (The Echo Nest ID of this particular track, on which the analysis was done)
year: 1987 (year when this song was released, according to musicbrainz.org)

When choosing features, my main goal was to use features that would most likely yield meaningful results yet also be simple and make sense to the average person. The definition of "meaningful" results is arbitrary, as every music listener will have his or her own opinions as to what constitutes different types of music, but some common features most people tend to differentiate songs by are pitch, rhythm, and the types of instruments used. The following specific fields provided in each song object fall under these three terms:

Pitch

• segments_pitches: a matrix of values indicating the strength of each pitch (or note) at each discernible time interval

Rhythm

• beats_start: a vector of values indicating the start time of each beat
• time_signature: the time signature of the song
• tempo: the speed of the song in Beats Per Minute (BPM)

Instruments

• segments_timbre: a matrix of values indicating the distribution of MFCC-like features (different types of tones) for each segment

The segments_pitches feature is a clear candidate for a differentiating factor between songs, since it reveals the patterns of notes that occur. Additionally, other research papers that quantitatively examine songs, like Mauch's, look at pitch and employ a procedure that allows all songs to be compared with the same metric. Likewise, timbre is intuitively a reliable differentiating feature, since it distinguishes sounds that sound different despite having the same pitch. Therefore, segments_timbre is another feature that is considered in each song.

Finally, we look at the candidate features for rhythm. At first glance, all of these features appear to be useful, as they indicate the rhythm of a song in one way or another. However, none of these features are as useful as the pitch and timbre features. While tempo is one factor in differentiating genres of EDM, and music in general, tempo alone is not a driving force of musical innovation. Certain genres of EDM, like drum 'n' bass and happycore, stand out for having very fast tempos, but the tempo is supplemented with a sound unique to the genre. Conceiving new arrangements of pitches, combining instruments in new ways, and inventing new types of sounds are novel; speeding up or slowing down existing sounds is not. Including tempo as a feature could actually add noise to the model, since many genres overlap in their tempos. And finally, tempo is measured indirectly when the pitch and timbre features are normalized for each song: everything is measured in units of "per second", so faster songs will have higher quantities of pitch and timbre features each second. Time signature can be dismissed from the candidate features for the same reason as tempo: many genres share the same time signature, and including it in the feature set would only add more noise. beats_start looks like a more promising feature since, like segments_pitches and segments_timbre, it consists of a vector of values. However, difficulties arise when we begin to think about how exactly we can utilize this information. Since each song varies in length, we need a way to compare songs of different durations on the same level. One approach could be to perform basic statistics on the distance between each beat, for example calculating the mean and standard deviation of this distance. However, the normalized pitch and timbre information already captures this data. Another possibility is detecting certain patterns of beats, which could differentiate the syncopated dubstep or glitch music beats from the steady pulse of electro-house. But once again, every beat is accompanied by a sound with a specific timbre and pitch, so this feature would not add any significantly new information.

2.3 Collecting Data and Preprocessing Selected Features

2.3.1 Collecting the Data

Upon deciding the features I wanted to use in my research, I first needed to collect all of the electronic songs in the Million Song Dataset. The easiest reliable way to achieve this was to iterate through each song in the database and save the information for the songs where any of the artist genre tags in artist_mbtags matched an electronic music genre. While this measure was not fully accurate, because it looks at the genre of the artist, not the song, specific genre information for each song was not as easily accessible, so this indicator was nearly as good a substitute. To generate a list of the genres that electronic songs would fall under, I manually searched through a subset of the MSD to find all genres that seemed to be related to electronic music. In the case of genres that were sometimes, but not always, electronic in nature, such as disco or pop, I erred on the side of caution and did not include them in the list of electronic genres. In these cases, false positives, such as primarily rock songs that happen to have the disco label attached to the artist, could inadvertently be included in the dataset. The final list of genres is as follows:

target_genres = ['house', 'techno', 'drum and bass', 'drum n bass',
    "drum'n'bass", 'drumnbass', "drum 'n' bass", 'jungle', 'breakbeat',
    'trance', 'dubstep', 'trap', 'downtempo', 'industrial', 'synthpop',
    'idm', 'idm - intelligent dance music', '8-bit', 'ambient',
    'dance and electronica', 'electronic']
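As a rough sketch, this collection step can be written as a walk over the per-track HDF5 files, assuming the standard MSD directory layout and the hdf5_getters helper module distributed with the dataset; the function name and the root path below are illustrative, not the exact code I ran.

import glob
import hdf5_getters  # helper module distributed with the MSD

def collect_em_tracks(msd_root, target_genres):
    """Keep the path of every track whose artist carries at least
    one musicbrainz tag appearing in target_genres."""
    genres = set(target_genres)
    keep = []
    for path in glob.iglob(msd_root + '/*/*/*/*.h5'):
        h5 = hdf5_getters.open_h5_file_read(path)
        try:
            tags = [t.decode('utf-8').lower()
                    for t in hdf5_getters.get_artist_mbtags(h5)]
            if any(tag in genres for tag in tags):
                keep.append(path)
        finally:
            h5.close()
    return keep

em_tracks = collect_em_tracks('/data/MillionSong/data', target_genres)  # hypothetical path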

2.3.2 Pitch Preprocessing

A study conducted by music researcher Matthias Mauch [8] analyzes pitch in a musically informed manner. The study first takes the raw sound data and converts it into a distribution over each pitch, where 0 is no detection of the pitch and 1 the strongest amount. Then it computes the most likely chord by comparing the 4 most common types of chords in popular music (major, minor, dominant 7th, and minor 7th) to the observed chord. The most common chords are represented as "template chords" and contain 0's and 1's, where the 1's represent the notes played in the chord. For example, using the note C as the first index, the C major chord is represented as

CT_CM = (1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0)

For a given chroma frame c observed in the song, Spearman's rho coefficient is computed against every template chord:

ρ_{CT,c} = Σ_{i=1}^{12} (CT_i − C̄T)(c_i − c̄) / (σ_CT σ_c)


where C̄T is the mean of the values in the template chord, σ_CT is the standard deviation of the values in the chord, and the operations on c are analogous. Note that the summation is over each individual pitch in the 12 pitch classes. The chord template with the highest value of ρ is selected as the chord for the time frame. After this is performed for each time frame, the values are smoothed, and then the change between adjacent chords is observed. The reasoning behind this step is that by measuring the relative distance between chords, rather than the chords themselves, all songs can be compared in the same manner even though they may have different key signatures. Finally, the study takes the types of chord changes and classifies them under 8 possible categories called "H-topics". These topics are more abstracted versions of the chord changes that make more sense to a human, such as "changes involving dominant 7th chords".

In my preliminary implementation of this method on an electronic dance music corpus, I made a few modifications to Mauch's study. First, I smoothed out time frames before computing the most probable chords, rather than smoothing the most probable chords. I did this to save time and to reduce volatility in the chord measurements. Using Rick Astley's "Never Gonna Give You Up" as a reference, which contains 935 time frames and lasts 212 seconds, 5 time frames is slightly under 1 second, and for preliminary testing this appeared to be a good interval for each time block. Second, as mentioned in the literature section, I did not abstract the chord changes into H-topics. This decision also stemmed from time constraints, since deriving semantic chord meaning from EDM songs would require careful research into the types of harmonies and sounds common in that genre of music. Below is a high-level visualization of the pitch metadata found in a sample song, "Firestarter" by The Prodigy, and how I converted the metadata into a chord change vector that I could then feed into the Dirichlet Process algorithm.


[Figure: the pitch-processing pipeline, illustrated with the first 5 time frames of "Firestarter" by The Prodigy. (1) Start with the raw pitch data, an N×12 matrix, where N is the number of time frames in the song and 12 the number of pitch classes. (2) Average the distribution of pitches over every block of 5 time frames. (3) Calculate the most likely chord for each block using Spearman's rho (here, F major). (4) For each pair of adjacent chords (here, F major → G major: a major-to-major change of step size 2, chord shift code 6), increment the corresponding count in a table of chord change frequencies (192 possible chord changes). (5) The result is a 192-element vector in which chord_changes[i] is the number of times the chord change with code i occurred in the song.]


A final step I took to normalize the chord change data was to divide the counts by the length of the song, so that each song's number of chord changes was measured per second.
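The per-song pipeline can be condensed into the following sketch. All names are my own; the template intervals are the standard root-position voicings of the four chord types, and the change-code encoding shown is one possible assignment of the 4 × 4 × 12 = 192 combinations of from-type, to-type, and root step, not necessarily the exact code numbering used in the figure above.

import numpy as np

CHORD_TYPES = {'maj': [0, 4, 7], 'min': [0, 3, 7],
               'dom7': [0, 4, 7, 10], 'min7': [0, 3, 7, 10]}

def build_templates():
    """48 binary template chords: 4 chord types x 12 roots."""
    templates = []
    for intervals in CHORD_TYPES.values():
        for root in range(12):
            t = np.zeros(12)
            t[[(root + i) % 12 for i in intervals]] = 1.0
            templates.append(t)
    return np.array(templates)                    # shape (48, 12)

def best_chord(chroma, templates):
    """Index of the template maximizing the correlation ρ above."""
    t = templates - templates.mean(axis=1, keepdims=True)
    c = chroma - chroma.mean()
    rho = (t @ c) / (templates.std(axis=1) * chroma.std() + 1e-12)
    return int(np.argmax(rho))

def change_code(a, b):
    """Encode a pair of adjacent chords as one of 192 change codes."""
    type_a, root_a = divmod(a, 12)
    type_b, root_b = divmod(b, 12)
    return (type_a * 4 + type_b) * 12 + (root_b - root_a) % 12

def chord_change_vector(chroma_blocks, duration):
    """Per-second frequencies of the 192 chord changes in one song."""
    templates = build_templates()
    chords = [best_chord(c, templates) for c in chroma_blocks]
    counts = np.zeros(192)
    for a, b in zip(chords, chords[1:]):
        counts[change_code(a, b)] += 1
    return counts / duration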

2.3.3 Timbre Preprocessing

For timbre, I also used Mauch's model to find a meaningful way to compare timbre uniformly across all songs [8]. After collecting all song metadata, I took a random sample of 20 songs from each year starting at 1970. The reason I forced the sampling to 20 randomly sampled songs from each year, and did not take a random sample of songs from all years at once, was to prevent bias towards any type of sounds. As seen in figure 2.2, there are significantly more songs from 2000-2011 than before 2000. The mean year is x̄ = 2001.052, the median year is 2003, and the standard deviation of the years is σ = 7.060. A "random sample" over all songs would almost certainly include a disproportionate number of more recent songs. In order to not miss out on sounds that may be more prevalent in older songs, I required a set number of songs from each year. Next, from each randomly selected song, I selected 20 random timbre frames, in order to prevent any biases in data collection within each song. In total, there were 42 × 20 × 20 = 16,800 timbre frames collected. Next, I clustered the timbre frames using a Gaussian Mixture Model (GMM), varying the number of clusters from 10 to 100 and selecting the number of clusters with the lowest Bayes Information Criterion (BIC), a statistical measure commonly used to select the best-fitting model. The BIC was minimized at 46 timbre clusters. I then re-ran the GMM with 46 clusters and saved the mean values of each of the 12 timbre components for each cluster formed. In the same way that every song had the same 192 chord changes whose frequencies could be compared between songs, each song now had the same 46 timbre clusters but different frequencies in each song. When reading in the metadata from each song, I calculated the most likely timbre cluster each timbre frame belonged to and kept a frequency count of all of the possible timbre clusters observed in the song. Finally, as with the pitch data, I divided all observed counts by the duration of the song in order to normalize each song's timbre counts.

Figure 2.2: Number of Electronic Music Songs in the Million Song Dataset from Each Year
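The lexicon construction can be sketched as follows, written against the current scikit-learn GaussianMixture API (the thesis-era GMM class exposed an equivalent bic method); the function names are my own.

import numpy as np
from sklearn.mixture import GaussianMixture

def fit_timbre_lexicon(frames, k_min=10, k_max=100):
    """frames: (n_frames, 12) array of randomly sampled timbre vectors.
    Fit a GMM for each candidate cluster count and keep the model
    minimizing the Bayes Information Criterion."""
    best_gmm, best_bic = None, np.inf
    for k in range(k_min, k_max + 1):
        gmm = GaussianMixture(n_components=k, random_state=0).fit(frames)
        bic = gmm.bic(frames)
        if bic < best_bic:
            best_gmm, best_bic = gmm, bic
    return best_gmm   # best_gmm.means_ holds one 12-d mean per timbre cluster

def timbre_count_vector(lexicon, song_frames, duration):
    """Per-second frequency of each timbre cluster within one song."""
    labels = lexicon.predict(song_frames)
    counts = np.bincount(labels, minlength=lexicon.n_components)
    return counts / duration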


Chapter 3

Results

3.1 Methodology

After the pitch and timbre data was processed, I ran the Dirichlet Process on the data. For each song, I concatenated the 192-element chord change frequency list and the 46-element timbre category frequency list, giving each song a total of 238 features. However, there is a problem with this setup: the pitch data will inherently dominate the clustering process, since it contains almost 3 times as many features as timbre. While there is no built-in function in scikit-learn's DPGMM process to give different weights to each feature, I considered another possibility to remedy this discrepancy: duplicating the timbre vector a certain number of times and concatenating the copies to the feature set of each song. While this strategy runs the risk of corrupting the feature set and turning it into something that does not accurately represent each song, it is important to keep in mind that even without duplicating the timbre vector, the feature set consists of two separate feature sets concatenated together. Therefore, timbre duplication appears to be a reasonable strategy to weigh pitch and timbre more evenly.
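A minimal sketch of this balancing step follows; the number of timbre copies is an illustrative parameter (four copies roughly matches the 192/46 feature ratio), not necessarily the exact count I settled on.

import numpy as np

def song_features(chord_changes, timbre_counts, timbre_copies=4):
    """Concatenate the 192 per-second chord-change frequencies with
    several copies of the 46 per-second timbre-cluster frequencies,
    so pitch does not dominate the clustering by sheer feature count."""
    pitch = np.asarray(chord_changes, dtype=float)
    timbre = np.asarray(timbre_counts, dtype=float)
    return np.concatenate([pitch] + [timbre] * timbre_copies)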


After this modification, I tweaked a few more parameters before obtaining my final results. Dividing the pitch and timbre frequencies by the duration of the song normalized every song to frequency per second, but it also had the undesired effect of making the data too small. Timbre and pitch frequencies per second were almost always less than 10, and many times hovered as low as 0.002 for nonzero values. Because all of the values were very close to each other, using common values of α in the range of 0.1 to 1000-2000 was insufficient to push the songs into different clusters; as a result, every song fell into the same cluster. Increasing the value of α by several orders of magnitude, to well over 10 million, fixed this, but that solution presented two problems. First, tuning α to experiment with different ways to cluster the music would be problematic, since I would have to work with an enormous range of possible values for α. Second, pushing α to such high values is not appropriate for the Dirichlet Process. Extremely high values of α indicate a Dirichlet Process that will try to disperse the data into different clusters, but a value of α that high is in principle always assigning each new song to a new cluster. On the other hand, varying α between 0.1 and 1000, for example, presents a much wider range of flexibility when assigning clusters. While clustering may be possible by varying α an extreme amount with the data as it currently is, we would be using the Dirichlet Process in a way it should mathematically not be used. Therefore, multiplying all of the data by a constant value, so that we can work in the appropriate range of α, is the ideal approach. After some experimentation, I found that k = 10 was an appropriate scaling factor.

After initial runs of the Dirichlet Process, I found that there was a slight issue with some of the earlier songs. Since I had only artist genre tags, not song-specific tags, I chose songs based on whether any of the tags associated with the artist fell under any electronic music genre, including the generic term 'electronic'. There were some bands, mostly older ones from the 1960s and 1970s like Electric Light Orchestra, which had some electronic music but mostly featured rock, funk, disco, or another genre. Given that these artists featured mostly non-electronic songs, I decided to exclude them from my study and generate a blacklist of these music artists. While it was infeasible to look through every single song and determine whether it was electronic or not, I was able to look over the earliest songs in each cluster. These songs were the most important to verify as electronic, because early non-electronic songs could end up forming new clusters and inadvertently create clusters with non-electronic sounds that I was not looking for.

The goal of this thesis is to identify different groups in which EM songs are clustered and to identify the most unique artists and genres. While the second task is very simple, because it requires looking at the earliest songs in each cluster, the first is difficult to gauge the effectiveness of. While I can look at the average chord change and timbre category frequencies in each cluster, as well as other metadata, attaching more semantic interpretations to what the music actually sounds like and determining whether the music is clustered properly is a very subjective process. For this reason, I ran the Dirichlet Process on the feature set with values of α = (0.05, 0.1, 0.2) and compared the clustering for each value, examining similarities and differences in the clusters formed in each scenario in the Analysis section. For each value of α, I set the upper limit of components, or clusters allowed, to 50. The values of α I used resulted in 9, 14, and 19 clusters, respectively.
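The experimental loop itself is then short. In this sketch, features is assumed to be the stacked per-song feature matrix after the k = 10 rescaling, and the DPGMM call follows the thesis-era scikit-learn API.

import numpy as np
from sklearn.mixture import DPGMM   # scikit-learn <= 0.19

for alpha in (0.05, 0.1, 0.2):
    model = DPGMM(n_components=50, alpha=alpha, n_iter=100,
                  random_state=0).fit(features)
    labels = model.predict(features)
    sizes = np.bincount(labels, minlength=50)
    # report non-empty clusters; singleton clusters were later
    # inspected by ear and discarded when unremarkable
    print(alpha, [(k, s) for k, s in enumerate(sizes) if s > 0])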

3.2 Findings

3.2.1 α = 0.05

When I set α to 0.05, the Dirichlet Process split the songs into 9 clusters. Below are the distributions of years of the songs in each cluster (note that the Dirichlet Process does not number the clusters sequentially, so cluster numbers 5, 7, and 10 are skipped).

Figure 3.1: Song year distributions for α = 0.05

For each value of α, I also calculated the average frequency of each chord change category and timbre category for each cluster and plotted the results. The green lines correspond to timbre, and the blue lines to pitch.

Figure 3.2: Timbre and pitch distributions for α = 0.05

A table of each cluster formed, the number of songs in that cluster, and descriptions of the pitch, timbre, and rhythmic qualities characteristic of songs in that cluster is shown below.


Cluster   Song Count   Characteristic Sounds
0         6481         Minimalist, industrial space sounds, dissonant chords
1         5482         Soft, New Age, ethereal
2         2405         Defined sounds; electronic and non-electronic instruments played in standard rock rhythms
3         360          Very dense and complex synths, slightly darker tone
4         4550         Heavily distorted rock and synthesizer
6         2854         Faster-paced 80s synth rock, acid house
8         798          Aggressive beats, dense house music
9         1464         Ambient house, trancelike, strong beats, mysterious tone
11        1597         Melancholy tones; new wave rock in the 80s, then starting in the 90s, downtempo, trip-hop, nu-metal

Table 3.1: Song cluster descriptions for α = 0.05

3.2.2 α = 0.1

A total of 14 clusters were formed (16 were formed, but 2 clusters contained only one song each; I listened to both of these songs, and since neither sounded unique, I discarded them from the clusters). Again, the song distributions, timbre and pitch distributions, and cluster descriptions are shown below.

Figure 3.3: Song year distributions for α = 0.1

Figure 3.4: Timbre and pitch distributions for α = 0.1

Cluster   Song Count   Characteristic Sounds
0         1339         Instrumental and disco with 80s synth
1         2109         Simultaneous quarter-note and sixteenth-note rhythms
2         4048         Upbeat, chill; simultaneous quarter-note and eighth-note rhythms
3         1353         Strong repetitive beats, ambient
4         2446         Strong simultaneous beat and synths; synths defined but echoing
5         2672         Calm, New Age
6         542          Hi-hat cymbals, dissonant chord progressions
7         2725         Aggressive punk and alternative rock
9         1647         Latin, rhythmic emphasis on first and third beats
11        835          Standard medium-fast rock instruments/chords
16        1152         Orchestral, especially violins
18        40           "Martian alien" sounds, no vocals
20        1590         Alternating strong kick and strong high-pitched clap
28        528          Roland TR-like beats; kick and clap stand out but fuzzy

Table 3.2: Song cluster descriptions for α = 0.1

3.2.3 α = 0.2

With α set to 0.2, there were a total of 22 clusters formed. 3 of the clusters consisted of 1 song each, none of which were particularly unique-sounding, so I discarded them, for a total of 19 significant clusters. Again, the song distributions, timbre and pitch distributions, and cluster descriptions are shown below.

Figure 3.5: Song year distributions for α = 0.2

Figure 3.6: Timbre and pitch distributions for α = 0.2

Cluster   Song Count   Characteristic Sounds
0         4075         Nostalgic and sad-sounding synths and string instruments
1         2068         Intense, sad, cavernous (mix of industrial metal and ambient)
2         1546         Jazz/funk tones
3         1691         Orchestral with heavy 80s synths, atmospheric
4         343          Arpeggios
5         304          Electro, ambient
6         2405         Alien synths, eerie
7         1264         Punchy kicks and claps, 80s/90s tilt
8         1561         Medium tempo, 4/4 time signature, synths with intense guitar
9         1796         Disco rhythms and instruments
10        2158         Standard rock with few (if any) synths added on
12        791          Cavernous, minimalist ambient (non-electronic instruments)
14        765          Downtempo, classic guitar riffs, fewer synths
16        865          Classic acid house sounds and beats
17        682          Heavy Roland TR sounds
22        14           Fast, ambient, classic orchestral
23        578          Acid house with funk tones
30        31           Very repetitive rhythms, one or two tones
34        88           Very dense sound (strong vocals and synths)

Table 3.3: Song cluster descriptions for α = 0.2


3.3 Analysis

Each of the different values of α revealed different insights about the Dirichlet Process applied to the Million Song Dataset, mainly which artists and songs were unique and how traditional groupings of EM genres could be thought of differently. Not surprisingly, the distributions of the years of songs in most of the clusters were skewed to the left, because the distribution of all of the EM songs in the Million Song Dataset is left-skewed (see Figure 2.2). However, some of the distributions vary significantly for individual clusters, and these differences provide important insights into which types of music styles were popular at certain points in time and how unique the earliest artists and songs in the clusters were. For example, for α = 0.1, Cluster 28's musical style (with sounds characteristic of the Roland TR-808 and TR-909, two programmable drum machines and synthesizers that became extremely popular in 1980s and 90s dance tracks [13]) coincides with when the instruments were first manufactured, in 1980. Not surprisingly, this cluster contained mostly songs from the 80s and 90s and declined slightly in the 2000s. However, there were a few songs in that cluster that came out before 1980. While these songs did not clearly use the Roland TR machines, they may have contained similar sounds that predated the machines and were truly novel.

First, looking at α = 0.05, we see that all of the clusters contain a significant number of songs, although clusters 3 and 8 are notably smaller. Cluster 3 contains a heavier left tail, indicating a larger number of songs from the 70s, 80s, and 90s. Inside the cluster, the genres of music varied significantly from a traditional music lens. That is, the cluster contained some songs with nearly all traditional rock instruments, others with purely synths, and others somewhere in between, all of which would normally be classified as different EM genres. However, under the Dirichlet Process these songs were lumped together with the common theme of dense, melodic sounds (as opposed to minimalistic, repetitive, or dissonant sounds). The most prominent artists among the earlier songs are Ashra and Jon Hassell, who composed several melodic songs combining traditional instruments with synthesizers for a modern feel. The other small cluster, number 8, contains a more normal year distribution relative to the entire MSD distribution and also consists of denser beats. Another artist, Cabaret Voltaire, leads this cluster. Cluster 9 sticks out significantly because it contains virtually no songs before 1990 but then increases rapidly in popularity. This cluster contains songs with hypnotically repetitive rhythm, strong and ethereal synths, and an equally strong drum-like beat. Given the emergence of trance in the 1990s, and the fact that house music in the 1980s contained more minimalistic synths than house music in the 1990s, this distribution of years makes sense. Looking at the earliest artists in this cluster, one that accurately predates the later music in the cluster is Jean-Michel Jarre, a French composer who pioneered ambient and electronic music [14]. One of his songs, Les Chants Magnétiques IV, contains very sharp and modulated synths, along with a repetitive hi-hat cymbal rhythm and more drawn-out and ethereal synths. While the song sounds ambient at its normal speed, playing it at 1.5 times its normal speed resulted in a thumping, fast-paced 16th-note rhythm that, combined with the ethereal synths and their chord progressions, sounded very similar to trance music. In fact, I found that stylistically, trance music was comparable to house and ambient music increased in speed. Trance was a term not used extensively until the early 1990s, but ambient and house music were already mainstream by the 1980s, so it would make sense that trance evolved in this manner. However, this insight could serve as an argument that trance is not an innovative genre in and of itself but is rather a clever combination of two older genres. Lastly, we look at the timbre category and chord change distributions for each cluster. In theory, these clusters should have significantly different peaks of chord changes and timbre categories, reflecting different pitch arrangements and instruments in each cluster. The type 0 chord change corresponds to major → major with no note change, type 60 to minor → minor with no note change, type 120 to dominant 7th major → dominant 7th major with no note change, and type 180 to dominant 7th minor → dominant 7th minor with no note change. It makes sense that type 0, 60, 120, and 180 chord changes are frequently observed, because it implies that adjacent chords in the song remain in the same key for the majority of the song. The timbre categories, on the other hand, are more difficult to intuitively interpret. Mauch's study addresses this issue by sampling songs and sounds that are the closest to each timbre category, then playing the sounds and attaching user-based interpretations based on several listeners [8]. While this strategy worked in Mauch's study, given the time and resources at my disposal, it was not practical in mine. I ended up comparing my subjective summaries of each cluster against the charts to see whether certain peaks in the timbre categories corresponded to specific tones. Strangely, for α = 0.05, the timbre and chord change data is very similar for each cluster. This problem does not occur when α = 0.1 or 0.2, where the graphs vary significantly and correspond to some of the observed differences in the music. In summary, below are the most influential artists I found in the clusters formed, and the types of music they created that were novel for their time:

• Jean-Michel Jarre: ambient and house music, complicated synthesizer arrangements
• Cabaret Voltaire: orchestral electronic music
• Paul Horn: new age
• Brian Eno: ambient music
• Manuel Göttsching (Ashra): synth-heavy ambient music


• Killing Joke: industrial metal
• John Foxx: minimalist and dark electronic music
• Fad Gadget: house and industrial music

While these conclusions were formed mainly from the data used in the MSD and the resulting clusters, I checked outside sources and biographies of these artists to see whether they were groundbreaking contributors to electronic music. Some research revealed that these artists were indeed groundbreaking for their time, so my findings are consistent with existing literature. The difference, however, between existing accounts and mine is that, from a quantitatively computed perspective, I found some new connections (like observing that one of Jarre's works, when sped up, sounded very similar to trance music).

For larger values of α, it is not only worth looking at interesting phenomena in the clusters formed for that specific value, but also comparing the clusters formed to those of other values of α. Since we are increasing the value of α, more clusters will be formed, and the distinctions between each cluster will be more nuanced. With α = 0.1, the Dirichlet Process formed 16 clusters. 2 of these clusters consisted of only one song each and, upon listening, neither of these songs sounded particularly unique, so I threw those two clusters out and analyzed the remaining 14. Comparing these clusters to the ones formed with α = 0.05, I found that some of the clusters mapped over nicely, while others were more difficult to interpret. For example, cluster 3_{0.1} (cluster 3 when α = 0.1) contained a similar number of songs and a similar distribution of release years to cluster 9_{0.05}. Both contain virtually no songs before the 1990s and then steadily rise in popularity through the 2000s. Both clusters also contain similar types of music: house beats and ethereal synths reminiscent of ambient or trance music. However, when I looked at the earliest artists in cluster 3_{0.1}, they were different from the earliest artists in cluster 9_{0.05}. One particular artist, Bill Nelson, stood out for having a particularly novel song, "Birds of Tin", for the year it was released (1980). This song features a sharp and twangy synth beat that, when sped up, sounded like minimalist acid house music. While the α = 0.05 run differentiated mostly on general moods and classes of instruments (like rock vs. non-electronic vs. electronic), the α = 0.1 run picked up more nuanced instrumentation and mood differences. For example, cluster 16_{0.1} contained songs that featured orchestral string instruments, especially violin. The songs themselves varied significantly according to traditional genres, from Brian Eno arrangements with classical orchestra to a remix of a song by Linkin Park, a nu-metal band, which contained violin interludes. This clustering raises an interesting point: music that sounds very different based on traditional genres could be grouped together on certain instruments or sounds. Another cluster, 28_{0.1}, features 90s sounds characteristic of the Roland TR-909 drum machine (which would explain why the cluster's songs increase drastically starting in the 1990s and steadily decline through the 2000s). Yet another cluster, 6_{0.1}, contains a particularly heavy left tail, indicating a style more popular in the 1980s, and its characteristic sound, hi-hat cymbals, is also a specialized instrument. This specialization does not match up particularly strongly with the clusters when α = 0.05. That is, a single cluster with α = 0.05 does not easily map to one or more clusters in the α = 0.1 run, although many of the clusters appear to share characteristics based on the qualitative descriptions in the tables. The timbre/chord change charts for each cluster appeared to at least somewhat corroborate the general characteristics I attached to each cluster. For example, the last timbre category is significantly pronounced for clusters 5 and 18, and especially so for 18. Cluster 18 was vocal-free, ethereal, space-synth sounds, so it would make sense that cluster 5, which was mainly calm New Age, also contained vocal-free, ethereal, and space-y sounds. It was also interesting to note that certain clusters, like 28, contained one timbre category that completely dominated all the others. Many songs in this cluster were marked by strong and repetitive beats reminiscent of the Roland TR drum machines, which matches the graph. Likewise, clusters 3, 7, 9, and 20, which appear to contain the same peak timbre category, were noted for containing strong and repetitive beats. For this run, I added the following artists and their contributions to the general list of novel artists:

• Bill Nelson: minimalist house music
• Vangelis: orchestral compositions with electronic notes
• Rick Wakeman: rock compositions with spacy-sounding synths
• Kraftwerk: synth-based pop music

Finally, we look at α = 0.2. With this parameter value, the Dirichlet Process resulted in 22 clusters. 3 of these clusters contained only one song each and, upon listening to each of these songs, I determined they were not particularly unique and discarded them, for a total of 19 remaining clusters. Unlike the previous two values of α, where the clusters were relatively easy to subjectively differentiate, this one was quite difficult. Slightly more than half of the clusters, 10 out of 19, contained under 1000 songs, and 3 contained under 100. Some of the clusters were easily mapped to clusters in the other two α values, like cluster 17_{0.2}, which contains Roland TR drum machine sounds and is comparable to cluster 28_{0.1}. However, many of the other classifications seemed more dubious. Not only did the songs within each cluster often vary significantly, but the differences between many clusters appeared nearly indistinguishable. The chord change and timbre charts also support the difficulty in distinguishing different clusters. The y-axes for all of the charts are quite small, implying that many of the timbre values averaged out because the songs in each cluster were quite different. Essentially, the observations are quite noisy and do not have features that stand out as saliently as those of cluster 28_{0.1}, for example. The only exceptions to these numbers were clusters 30 and 34, but there were so few songs in each of these clusters that they represent only a small portion of the dataset. Therefore, I concluded that the Dirichlet Process with α = 0.2 did an insufficient job of adequately clustering the songs. Overall, the clusters formed when α = 0.1 were the most meaningful in terms of picking up nuanced moods and instruments without splitting hairs and producing clusters a minimally trained ear could not differentiate. From this analysis, the most appropriate genre classifications of the electronic music from the MSD are the clusters described in the table where α = 0.1, and the most novel artists, along with their contributions, are summarized in the findings where α = 0.05 and α = 0.1.


Chapter 4

Conclusion

In this chapter, I first address weaknesses in my experiment and strategies for resolving those weaknesses; I then offer potential paths for researchers to build upon my experiment and offer closing words regarding this thesis.

4.1 Design Flaws in Experiment

While I made every effort possible to ensure the integrity of this experiment, there were various factors working against it, some beyond my control and others within my control but unrealistic to address given the time and resources I had. The largest issue was the dataset I was working with. While the MSD contained roughly 23,000 electronic music songs according to my classifications, these songs did not come close to all of the electronic music available. From looking through the tracks, I did see many important artists, meaning that there was some credibility to the dataset. However, there were several other artists I was surprised to see missing, and the artists included had only a limited number of popular songs. Some traditionally defined genres, like dubstep, were missing entirely from the dataset, and the most recent songs came from the year 2010, which meant that the past 5 years of rapid expansion in EM were not accounted for. Building a sufficient corpus of EM data is very difficult, arguably more so than for other genres, because songs may be remixed by multiple artists, further blurring the line between original content and modifications. For this reason, I considered my thesis to be a proof of concept. Although the data I used may not be ideal, I was able to show that the Dirichlet Process can be used with some amount of success to cluster songs based on their metadata.

With respect to how I implemented the Dirichlet Process and constructed the features, my methodology could have been more extensive with additional time and resources. Interpreting the sounds in each song and establishing common threads is a difficult task, and unlike Pandora, which used trained music theory experts to analyze each song, I relied on my own ears and anecdotal knowledge of EM. Given the lack of formal literature quantitatively analyzing EM and the resources I had, this was my best realistic option, but it was also not ideal. The second notable weakness, which was more controllable, was determining what exactly constitutes an EM song. My criteria involved iterating through every song and selecting those whose artist contained a tag that fell inside a list of predetermined EM genres. However, this strategy is not always effective, since some artists contain only a small selection of EM songs and have produced much more music involving rock or other non-EM genres. To prevent these songs from appearing in the dataset, I would need to load another dataset from a group called Last.fm, which contains user-generated tags at the song level. Another, more addressable weakness in my experiment was graphically analyzing the timbre categories. While the average chord changes were easy to interpret on the graphs for each cluster and had easy semantic interpretations, the timbre categories were never formally defined. That is, while I knew the Bayes Information Criterion was lowest when there were 46 categories, I did not associate each timbre category with a sound. Mauch's study addressed this issue by randomly selecting songs with sounds that fell in each timbre category and asking users to listen to the sounds and classify what they heard. Implementing this system would be an additional way of ensuring that the clusters formed for each song were nontrivial: I could not only eyeball the measurements on each graph for timbre, like I did in this thesis, but also use them to confirm the sounds I observed for each cluster. Finally, while my feature selection contained careful preprocessing, based on other studies, that normalized measurements between all songs, there are additional ways I could have improved the feature set. For example, one study looks at more advanced ways to isolate specific timbre segments in a song, identify repeating patterns, and compare songs to each other in terms of the similarity of their timbres [15]. More advanced methods like these would allow me to more quantitatively analyze how successful the Dirichlet Process is at effectively clustering songs into distinct categories.
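As one concrete possibility (my own sketch, not the method of [15]), songs could be compared by the cosine similarity of their timbre-category histograms, such as the timbre_cat_counts vectors computed in Appendix A.2; the function name and normalization here are illustrative assumptions:

import numpy as np

def timbre_similarity(counts_a, counts_b):
    # Cosine similarity between two songs' timbre-category histograms.
    a = np.asarray(counts_a, dtype=float)
    b = np.asarray(counts_b, dtype=float)
    # Normalize to distributions so song length does not dominate.
    if a.sum() > 0: a = a / a.sum()
    if b.sum() > 0: b = b / b.sum()
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(np.dot(a, b) / denom) if denom > 0 else 0.0

# Example: two songs with similar timbre profiles score near 1.
print(timbre_similarity([10, 0, 5, 1], [8, 1, 6, 0]))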

4.2 Future Work

Future work in this area (quantitatively analyzing EM metadata to determine what constitutes different genres and novel artists) would involve tighter definitions, procedures, evaluations of whether clustering was effective, and musical scrutiny. All of the weaknesses mentioned in the previous section, barring perhaps the songs available in the Million Song Dataset, can be addressed with extensions and modifications to the code base I created. The greater issue of building an effective corpus of music data for the MSD and constantly updating it might be addressed by soliciting such data from an organization like Spotify, but such an endeavor is very ambitious and beyond the scope of any individual or small-group research project without extensive funding and influence. Once these problems are resolved, and the songs accessed from the dataset and methods for comparing songs to each other are settled, the next steps would be to further analyze the results. How do the most unique artists for their time compare to the most popular artists? Is there considerable overlap? How long does it take for a style to grow in popularity, if it even does? And lastly, how can these findings be used to compose new genres of music and envision who and what will become popular in the future? All of these questions may require supplementary information sources, with respect to the popularity of songs and artists, for example, and many of these additional pieces of information can be found on the website of the MSD.

4.3 Closing Remarks

While this thesis is an ambitious endeavor and can be improved in many respects, it does show that the methods implemented yield nontrivial results and could serve as a foundation for future quantitative analysis of electronic music. As data analytics grows even more, and groups such as Spotify amass greater amounts of information and deeper insights on that information, this relatively new field of study will hopefully grow. EM is a dynamic, energizing, and incredibly expressive type of music, and understanding it from a quantitative perspective pays respect to what has, up until now, mostly been analyzed from a curious outsider's perspective: qualitatively described, but not examined as thoroughly from a mathematical angle.

Appendix A

Code

A.1 Pulling Data from the Million Song Dataset

from __future__ import division
import os
import re
import sys
import time
import glob
import numpy as np
import hdf5_getters  # not on adroit
from collections import OrderedDict

# prevents output from showing ellipses when printed
np.set_printoptions(threshold=np.nan)

'''This code pulls the relevant metadata for every electronic song in the
MSD and writes it, sorted chronologically by year, to a text file.'''

basedir = '/scratch/network/mssilver/mssilver/msd_data_full/data/' + str(sys.argv[1])
ext = '.h5'

target_genres = ['house', 'techno', 'drum and bass', 'drum n bass',
                 "drum'n'bass", 'drumnbass', "drum 'n' bass", 'jungle',
                 'breakbeat', 'trance', 'dubstep', 'trap', 'downtempo',
                 'industrial', 'synthpop', 'idm',
                 'idm - intelligent dance music', '8-bit', 'ambient',
                 'dance and electronica', 'electronic']

# relevant metadata for all EM songs found in the MSD
all_song_data = {}
pitch_segs_data = []
count = 0
start_time = time.time()

for root, dirs, files in os.walk(basedir):
    files = glob.glob(os.path.join(root, '*' + ext))
    for f in files:
        h5 = hdf5_getters.open_h5_file_read(f)
        # if year unknown, throw out sample
        if hdf5_getters.get_year(h5) == 0:
            h5.close()
            continue
        if any(tag in str(hdf5_getters.get_artist_mbtags(h5)) for tag in target_genres):
            print 'found electronic music song at {0} seconds'.format(time.time() - start_time)
            count += 1
            print('song count: {0}'.format(count))
            h5_subdict = dict()
            h5_subdict['title'] = hdf5_getters.get_title(h5).item()
            h5_subdict['artist_name'] = hdf5_getters.get_artist_name(h5).item()
            h5_subdict['year'] = hdf5_getters.get_year(h5).item()
            h5_subdict['duration'] = hdf5_getters.get_duration(h5).item()
            h5_subdict['timbre'] = hdf5_getters.get_segments_timbre(h5).tolist()
            h5_subdict['pitches'] = hdf5_getters.get_segments_pitches(h5).tolist()
            track_id = hdf5_getters.get_track_id(h5).item()
            all_song_data[track_id] = h5_subdict
            print('Song {0} finished processing. Total time elapsed: {1} seconds'.format(count, str(time.time() - start_time)))
        h5.close()

# OrderedDict preserves the chronological ordering produced by sorted()
all_song_data_sorted = OrderedDict(sorted(all_song_data.items(), key=lambda k: k[1]['year']))
sortedpitchdata = '/scratch/network/mssilver/mssilver/msd_data/raw_' + re.sub('/', '', sys.argv[1]) + '.txt'
with open(sortedpitchdata, 'w') as text_file:
    text_file.write(str(all_song_data_sorted))

A.2 Calculating Most Likely Chords and Timbre Categories

from __future__ import division
import os
import sys
import re
import time
import json
import glob
import hdf5_getters  # not on adroit
import sklearn.mixture
import msd_utils  # not on adroit
import math
import numpy as np
import collections
import ast

# prevents output from showing ellipses when printed
np.set_printoptions(threshold=np.nan)

# column-wise mean of a list of lists
def mean(a):
    return sum(a) / len(a)

'''This code computes the frequency of chord changes and the most likely
timbre categories in each electronic song.'''

basedir = '/scratch/network/mssilver/mssilver/'
input_file = basedir + 'msd_data/raw_' + str(sys.argv[1]) + '.txt'
output_file = basedir + 'msd_data/preprocessed_' + str(sys.argv[1]) + '.txt'

json_contents = open(input_file, 'r').read()

all_song_data = []
time_start = time.time()
count = 0
# match one song's dict literal (pattern approximate; the quotes and braces
# in the original listing were lost in transcription)
for json_object_str in re.finditer(r"\{'title'.*?\}", json_contents, re.DOTALL):
    json_object_str = str(json_object_str.group(0))
    json_object = ast.literal_eval(json_object_str)
    json_object_new = {}

    json_object_new['title'] = json_object['title']
    json_object_new['artist_name'] = json_object['artist_name']
    json_object_new['year'] = json_object['year']
    json_object_new['duration'] = json_object['duration']

    segments_pitches_old = json_object['pitches']
    segments_timbre_old = json_object['timbre']
    segments_pitches_old_smoothed = []
    segments_timbre_old_smoothed = []
    chord_changes = [0 for i in range(0, 192)]
    smoothing_factor = 5
    for i in range(0, int(math.floor(len(segments_pitches_old) / smoothing_factor))):
        segments = segments_pitches_old[(smoothing_factor * i):(smoothing_factor * i + smoothing_factor)]
        # calculate mean strength of each note over a block of 5 time segments
        segments_mean = map(mean, zip(*segments))
        segments_pitches_old_smoothed.append(segments_mean)
    most_likely_chords = [msd_utils.find_most_likely_chord(seg) for seg in segments_pitches_old_smoothed]
    print 'found most likely chords at {0} seconds'.format(time.time() - time_start)
    # calculate chord changes
    for i in range(0, len(most_likely_chords) - 1):
        c1 = most_likely_chords[i]
        c2 = most_likely_chords[i + 1]
        if c1[1] == c2[1]:
            note_shift = 0
        elif c1[1] < c2[1]:
            note_shift = c2[1] - c1[1]
        else:
            note_shift = 12 - c1[1] + c2[1]
        key_shift = 4 * (c1[0] - 1) + c2[0]
        # convert note_shift (0 through 11) and key_shift (1 to 16)
        # to one of 192 categories for a chord shift
        chord_shift = 12 * (key_shift - 1) + note_shift
        chord_changes[chord_shift] += 1
    json_object_new['chord_changes'] = [c / json_object['duration'] for c in chord_changes]
    print 'calculated chord changes at {0} seconds'.format(time.time() - time_start)

    for i in range(0, int(math.floor(len(segments_timbre_old) / smoothing_factor))):
        segments = segments_timbre_old[(smoothing_factor * i):(smoothing_factor * i + smoothing_factor)]
        # calculate mean of each timbre component over a block of 5 time segments
        segments_mean = map(mean, zip(*segments))
        segments_timbre_old_smoothed.append(segments_mean)
    timbre_cats = [msd_utils.find_most_likely_timbre_category(seg) for seg in segments_timbre_old_smoothed]
    print 'found most likely timbre categories at {0} seconds'.format(time.time() - time_start)
    timbre_cat_counts = [timbre_cats.count(i) for i in xrange(0, 30)]
    json_object_new['timbre_cat_counts'] = [t / json_object['duration'] for t in timbre_cat_counts]
    all_song_data.append(json_object_new)
    count += 1

print 'preprocessing finished, writing results to file at time {0}'.format(time.time() - time_start)
with open(output_file, 'w') as f:
    f.write(str(all_song_data))

print 'file merging complete at time {0}'.format(time.time() - time_start)

A.3 Code to Compute Timbre Categories

from __future__ import division
import os
import sys
import re
import time
import json
import glob
import hdf5_getters  # not on adroit
import sklearn.mixture
import msd_utils  # not on adroit
import math
import numpy as np
import collections
from string import ascii_uppercase
import ast
import matplotlib.pyplot as plt
import operator
from collections import defaultdict
import random

timbre_all = []
N = 20  # number of samples to get from each year
year_counts = {1956: 2, 1965: 4, 1968: 3, 1969: 5, 1970: 23, 1971: 25,
               1972: 26, 1973: 37, 1974: 35, 1975: 29, 1976: 28, 1977: 64,
               1978: 77, 1979: 111, 1980: 131, 1981: 171, 1982: 199,
               1983: 272, 1984: 190, 1985: 189, 1986: 200, 1987: 224,
               1988: 205, 1989: 272, 1990: 358, 1991: 348, 1992: 538,
               1993: 610, 1994: 658, 1995: 764, 1996: 809, 1997: 930,
               1998: 872, 1999: 983, 2000: 1031, 2001: 1230, 2002: 1323,
               2003: 1563, 2004: 1508, 2005: 1995, 2006: 1892, 2007: 2175,
               2008: 1950, 2009: 1782, 2010: 742}

time_start = time.time()
year_count = defaultdict(int)
orig_dir = '/scratch/network/mssilver/mssilver/'
orig_dir = ''
# match one song's dict literal (pattern approximate; the quotes and braces
# in the original listing were lost in transcription)
json_pattern = re.compile(r"\{'title'.*?\}", re.DOTALL)
N = 20  # number of songs to sample from each year
k = 20  # number of frames to select from each song
for l1 in ascii_uppercase:
    for l2 in ascii_uppercase:
        edm_textfile = orig_dir + 'msd_data/raw_' + l1 + l2 + '.txt'
        json_contents = open(edm_textfile, 'r').read()
        for json_object_str in re.findall(json_pattern, json_contents):
            json_object = ast.literal_eval(json_object_str)
            year = int(json_object['year'])
            prob = 1.0 if 1.0 * N / year_counts[year] > 1.0 else 1.0 * N / year_counts[year]
            if random.random() < prob:
                print 'getting timbre frames for song in directory {0}, {1} seconds after start of program'.format(edm_textfile, time.time() - time_start)
                duration = float(json_object['duration'])
                timbre = [[t / duration for t in l] for l in json_object['timbre']]
                try:
                    indices = random.sample(xrange(0, len(timbre)), k)
                except:
                    indices = xrange(0, len(timbre))
                timbre_frames = [timbre[i] for i in indices]
                appended_timbre = [timbre_all.append(l) for l in timbre_frames]
        print 'finished file {0}, {1} seconds after start of program'.format(edm_textfile, time.time() - time_start)

with open('timbre_frames_all.txt', 'w') as f:
    f.write(str(timbre_all))

A.4 Helper Methods for Calculations

import os
import re
import json
import glob
import hdf5_getters
import time
import numpy as np

'''some static data used in conjunction with the helper methods'''

# Each 12-element vector corresponds to the 12 pitches, starting with C
# natural and going up to B natural.

CHORD_TEMPLATE_MAJOR = [
    [1,0,0,0,1,0,0,1,0,0,0,0], [0,1,0,0,0,1,0,0,1,0,0,0],
    [0,0,1,0,0,0,1,0,0,1,0,0], [0,0,0,1,0,0,0,1,0,0,1,0],
    [0,0,0,0,1,0,0,0,1,0,0,1], [1,0,0,0,0,1,0,0,0,1,0,0],
    [0,1,0,0,0,0,1,0,0,0,1,0], [0,0,1,0,0,0,0,1,0,0,0,1],
    [1,0,0,1,0,0,0,0,1,0,0,0], [0,1,0,0,1,0,0,0,0,1,0,0],
    [0,0,1,0,0,1,0,0,0,0,1,0], [0,0,0,1,0,0,1,0,0,0,0,1]]
CHORD_TEMPLATE_MINOR = [
    [1,0,0,1,0,0,0,1,0,0,0,0], [0,1,0,0,1,0,0,0,1,0,0,0],
    [0,0,1,0,0,1,0,0,0,1,0,0], [0,0,0,1,0,0,1,0,0,0,1,0],
    [0,0,0,0,1,0,0,1,0,0,0,1], [1,0,0,0,0,1,0,0,1,0,0,0],
    [0,1,0,0,0,0,1,0,0,1,0,0], [0,0,1,0,0,0,0,1,0,0,1,0],
    [0,0,0,1,0,0,0,0,1,0,0,1], [1,0,0,0,1,0,0,0,0,1,0,0],
    [0,1,0,0,0,1,0,0,0,0,1,0], [0,0,1,0,0,0,1,0,0,0,0,1]]
CHORD_TEMPLATE_DOM7 = [
    [1,0,0,0,1,0,0,1,0,0,1,0], [0,1,0,0,0,1,0,0,1,0,0,1],
    [1,0,1,0,0,0,1,0,0,1,0,0], [0,1,0,1,0,0,0,1,0,0,1,0],
    [0,0,1,0,1,0,0,0,1,0,0,1], [1,0,0,1,0,1,0,0,0,1,0,0],
    [0,1,0,0,1,0,1,0,0,0,1,0], [0,0,1,0,0,1,0,1,0,0,0,1],
    [1,0,0,1,0,0,1,0,1,0,0,0], [0,1,0,0,1,0,0,1,0,1,0,0],
    [0,0,1,0,0,1,0,0,1,0,1,0], [0,0,0,1,0,0,1,0,0,1,0,1]]
CHORD_TEMPLATE_MIN7 = [
    [1,0,0,1,0,0,0,1,0,0,1,0], [0,1,0,0,1,0,0,0,1,0,0,1],
    [1,0,1,0,0,1,0,0,0,1,0,0], [0,1,0,1,0,0,1,0,0,0,1,0],
    [0,0,1,0,1,0,0,1,0,0,0,1], [1,0,0,1,0,1,0,0,1,0,0,0],
    [0,1,0,0,1,0,1,0,0,1,0,0], [0,0,1,0,0,1,0,1,0,0,1,0],
    [0,0,0,1,0,0,1,0,1,0,0,1], [1,0,0,0,1,0,0,1,0,1,0,0],
    [0,1,0,0,0,1,0,0,1,0,1,0], [0,0,1,0,0,0,1,0,0,1,0,1]]

CHORD_TEMPLATE_MAJOR_means = [np.mean(chord) for chord in CHORD_TEMPLATE_MAJOR]
CHORD_TEMPLATE_MINOR_means = [np.mean(chord) for chord in CHORD_TEMPLATE_MINOR]
CHORD_TEMPLATE_DOM7_means = [np.mean(chord) for chord in CHORD_TEMPLATE_DOM7]
CHORD_TEMPLATE_MIN7_means = [np.mean(chord) for chord in CHORD_TEMPLATE_MIN7]

CHORD_TEMPLATE_MAJOR_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_MAJOR]
CHORD_TEMPLATE_MINOR_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_MINOR]
CHORD_TEMPLATE_DOM7_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_DOM7]
CHORD_TEMPLATE_MIN7_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_MIN7]

# The 46 timbre clusters found by the BIC-selected GMM; each row holds the
# mean of the 12 timbre components for one cluster.
TIMBRE_CLUSTERS = [
    [1.38679881e-01, 3.95702571e-02, 2.65410235e-02, 7.38301998e-03, -1.75014636e-02, -5.51147732e-02, 8.71851698e-03, -1.17595855e-02, 1.07227900e-02, 8.75951680e-03, 5.40391877e-03, 6.17638908e-03],
    [3.14344510e+00, 1.17405599e-01, 4.08053561e+00, -1.77934450e+00, 2.93367968e+00, -1.35597928e+00, -1.55129489e+00, 7.75743158e-01, 6.42796685e-01, 1.40794256e-01, 3.37716831e-01, -3.27103815e-01],
    [3.56548165e-01, 2.73288705e+00, 1.94355982e+00, 1.06892477e+00, 9.89739475e-01, -8.97330631e-02, 8.73234495e-01, -2.00747009e-03, 3.44488367e-01, 9.93117800e-02, -2.43471766e-01, -1.90521726e-01],
    [4.22442037e-01, 4.14115783e-01, 1.43926557e-01, -1.16143322e-01, -5.95186216e-02, -2.36927188e-01, -6.83151409e-02, 9.86816882e-02, 2.43219098e-02, 6.93558977e-02, 6.80121418e-03, 3.97485360e-02],
    [1.94727799e-01, -1.39027782e+00, -2.39875671e-01, -2.84583677e-01, 1.92334219e-01, -2.83421048e-01, 2.15787541e-01, 1.14840341e-01, -2.15631833e-01, -4.09496877e-02, -6.90838017e-03, -7.24394810e-03],
    [1.96565167e-01, 4.98702717e-02, -3.43697282e-01, 2.54170701e-01, 1.12441266e-02, 1.54740401e-01, -4.70447408e-02, 8.10868802e-02, 3.03736697e-03, 1.43974944e-03, -2.75044913e-02, 1.48634678e-02],
    [2.21364497e-01, -2.96205105e-01, 1.57754028e-01, -5.57641279e-02, -9.25625566e-02, -6.15316168e-02, -1.38139882e-01, -5.54936599e-02, 1.66886836e-01, 6.46238260e-02, 1.24093863e-02, -2.09274345e-02],
    [2.12823455e-01, -9.32652720e-02, -4.39611467e-01, -2.02814479e-01, 4.98638770e-02, -1.26572488e-01, -1.11181799e-01, 3.25075635e-02, 2.01416694e-02, -5.69216463e-02, 2.61922912e-02, 8.30817468e-02],
    [1.62304042e-01, -7.34813956e-03, -2.02552550e-01, 1.80106705e-01, -5.72110826e-02, -9.17148244e-02, -6.20429191e-03, -6.08892354e-02, 1.02883628e-02, 3.84878478e-02, -8.72920419e-03, 2.37291230e-02],
    [1.69023095e-01, 6.81311168e-02, -3.71039856e-02, -2.13139780e-02, -4.18752028e-03, 1.36407740e-01, 2.58515825e-02, -4.10328777e-04, 2.93149920e-02, -1.97874734e-02, 2.01177066e-02, 4.29260690e-03],
    [4.16829358e-01, -1.28384095e+00, 8.86081556e-01, 9.13717416e-02, -3.19420208e-01, -1.82003637e-01, -3.19865507e-02, -1.71517045e-02, 3.47472066e-02, -3.53047665e-02, 5.58354602e-02, -5.06222122e-02],
    [3.83948137e-01, 1.06020034e-01, 4.01191058e-01, 1.49470482e-01, -9.58422411e-02, -4.94473336e-02, 2.27589858e-02, -5.67352733e-02, 3.84666644e-02, -2.15828055e-02, -1.67817151e-02, 1.15426241e-01],
    [9.07946444e-01, 3.26120397e+00, 2.98472002e+00, -1.42615404e-01, 1.29886103e+00, -4.53380431e-01, 1.54008478e-01, -3.55297093e-02, -2.95809181e-01, 1.57037690e-01, -7.29692046e-02, 1.15180285e-01],
    [1.60870896e+00, -2.32038235e+00, -7.96211044e-01, 1.55058968e+00, -2.19377663e+00, 5.01030526e-01, -1.71767279e+00, -1.36642470e+00, -2.42837527e-01, -4.14275615e-01, -7.33148530e-01, -4.56676578e-01],
    [6.42870687e-01, 1.34486839e+00, 2.16026845e-01, -2.13180345e-01, 3.10866747e-01, -3.97754955e-01, -3.54439151e-01, -5.95938041e-04, 4.95054274e-03, 4.67013422e-02, -1.80823854e-02, 1.25808320e-01],
    [1.16780496e+00, 2.28141229e+00, -3.29418720e+00, -1.54239912e+00, 2.12372153e-01, 2.51116768e+00, 1.84273560e+00, -4.06183916e-01, 1.19175125e+00, -9.24407446e-01, 6.85444429e-01, -6.38729005e-01],
    [2.39097414e-01, -1.13382447e-02, 3.06327342e-01, 4.68182987e-03, -1.03107607e-01, -3.17661969e-02, 3.46533705e-02, 1.46440386e-02, 6.88291154e-02, 1.72580481e-02, -6.23970238e-03, -6.52822380e-03],
    [1.74850329e-01, -1.86077411e-01, 2.69285838e-01, 5.22452803e-02, -3.71708289e-02, -6.42874319e-02, -5.01920042e-03, -1.14565540e-02, -2.61300268e-03, -6.94872458e-03, 1.20157063e-02, 2.01341977e-02],
    [1.93220674e-01, 1.62738332e-01, 1.72794061e-02, 7.89933755e-02, 1.58494767e-01, 9.04541006e-04, -3.33177052e-02, -1.42411500e-01, -1.90471155e-02, -2.41622739e-02, -2.57382438e-02, 2.84895062e-02],
    [3.31179197e+00, -1.56765268e-01, 4.42446188e+00, 2.05496297e+00, 5.07031622e+00, -3.52663849e-02, -5.68337901e+00, -1.17825301e+00, 5.41756637e-01, -3.15541339e-02, -1.58404846e+00, 7.37887234e-01],
    [2.36033237e-01, -5.01380019e-01, -7.01568834e-02, -2.14474169e-01, 5.58739133e-01, -3.45340886e-01, 2.36469930e-01, -2.51770230e-02, -4.41670143e-01, -1.73364633e-01, 9.92353986e-03, 1.01775476e-01],
    [3.13672832e+00, 1.55128891e+00, 4.60139512e+00, 9.82477544e-01, -3.87108002e-01, -1.34239667e+00, -3.00065797e+00, -4.41556909e-01, -7.77546208e-01, -6.59017029e-01, -1.42596356e-01, -9.78935498e-01],
    [8.50714148e-01, 2.28658856e-01, -3.65260753e+00, 2.70626948e+00, -1.90441544e-01, 5.66625676e+00, 1.77531510e+00, 2.39978921e+00, 1.10965660e+00, 1.58484130e+00, -1.51579214e-02, 8.64324026e-01],
    [1.14302559e+00, 1.18602811e+00, -3.88130412e+00, 8.69833825e-01, -8.23003310e-01, -4.23867795e-01, 8.56022598e-01, -1.08015106e+00, 1.74840192e-01, -1.35493558e-02, -1.17012561e+00, 1.68572940e-01],
    [3.54117814e+00, 6.12714769e-01, 7.67585243e+00, 2.50391333e+00, 1.81374399e+00, -1.46363231e+00, -1.74027236e+00, -5.72924078e-01, -1.20787368e+00, -4.13954661e-01, -4.62561948e-01, 6.78297871e-01],
    [8.31843044e-01, 4.41635485e-01, 7.00724425e-02, -4.72159900e-02, 3.08326493e-01, -4.47009822e-01, 3.27806057e-01, 6.52370380e-01, 3.28490360e-01, 1.28628172e-01, -7.78065861e-02, 6.91343399e-02],
    [4.90082031e-01, -9.53180204e-01, 1.76970476e-01, 1.57256960e-01, -5.26196238e-02, -3.19264458e-01, 3.91808304e-01, 2.19368239e-01, -2.06483291e-01, -6.25044005e-02, -1.05547224e-01, 3.18934196e-01],
    [1.49899454e+00, -4.30708817e-01, 2.43770498e+00, 7.03149621e-01, -2.28827845e+00, 2.70195855e+00, -4.71484280e+00, -1.18700075e+00, -1.77431396e+00, -2.23190236e+00, 8.20855264e-01, -2.35859902e-01],
    [1.20322544e-01, -3.66300816e-01, -1.25699953e-01, -1.21914056e-01, 6.93277338e-02, -1.31034684e-01, -1.54955924e-03, 2.48094288e-02, -3.09576314e-02, -1.66369415e-03, 1.48904987e-04, -1.42151992e-02],
    [6.52394765e-01, -6.81024464e-01, 6.36868117e-01, 3.04950208e-01, 2.62178992e-01, -3.20457080e-01, -1.98576098e-01, -3.02173163e-01, 2.04399765e-01, 4.44513847e-02, -9.50111498e-03, -1.14198739e-02],
    [2.06762180e-01, -2.08101829e-01, 2.61977630e-01, -1.71672300e-01, 5.61794250e-02, 2.13660185e-01, 3.90259585e-02, 4.78176392e-02, 1.72812607e-02, 3.44052067e-02, 6.26899067e-03, 2.48544728e-02],
    [7.39717363e-01, 4.37786285e+00, 2.54995502e+00, 1.13151212e+00, -3.58509503e-01, 2.20806129e-01, -2.20500355e-01, -7.22409824e-02, -2.70534083e-01, 1.07942098e-03, 2.70174668e-01, 1.87279353e-01],
    [1.25593809e+00, 6.71054880e-02, 8.70352571e-01, -4.32607959e+00, 2.30652217e+00, 5.47476105e+00, -6.11052479e-01, 1.07955720e+00, -2.16225471e+00, -7.95770149e-01, -7.31804973e-01, 9.68935954e-01],
    [1.17233757e-01, -1.23897829e-01, -4.88625265e-01, 1.42036530e-01, -7.23286756e-02, -6.99808763e-02, -1.17525019e-02, 5.70221674e-02, -7.67796123e-03, 4.17505873e-02, -2.33375716e-02, 1.94121001e-02],
    [1.67511025e+00, -2.75436700e+00, 1.45345593e+00, 1.32408871e+00, -1.66172505e+00, 1.00560074e+00, -8.82308160e-01, -5.95708043e-01, -7.27283590e-01, -1.03975499e+00, -1.86653334e-02, 1.39449745e+00],
    [3.20587677e+00, -2.84451104e+00, 8.54849957e+00, -4.44001235e-01, 1.04202144e+00, 7.35333682e-01, -2.48763292e+00, 7.38931361e-01, -1.74185596e+00, -1.07581842e+00, 2.05759299e-01, -8.20483513e-01],
    [3.31279737e+00, -5.08655734e-01, 6.61530870e+00, 1.16518280e+00, 4.74499155e+00, -2.31536191e+00, -1.34016130e+00, -7.15381712e-01, 2.78890594e+00, 2.04189275e+00, -3.80003033e-01, 1.16034914e+00],
    [1.79522019e+00, -8.13534697e-02, 4.37167420e-01, 2.26517020e+00, 8.85377295e-01, 1.07481514e+00, -7.25322296e-01, -2.19309506e+00, -7.59468916e-01, -1.37191387e+00, 2.60097913e-01, 9.34596450e-01],
    [3.50400906e-01, 8.17891485e-01, -8.63487084e-01, -7.31760701e-01, 9.70320805e-02, -3.60023996e-01, -2.91753495e-01, -8.03073817e-02, 6.65930095e-02, 1.60093340e-01, -1.29158086e-01, -5.18806100e-02],
    [2.25922929e-01, 2.78461593e-01, 5.39661393e-02, -2.37662670e-02, -2.70343295e-02, -1.23485570e-01, 2.31027499e-03, 5.87465112e-05, 1.86127188e-02, 2.83074747e-02, -1.87198676e-04, 1.24761782e-02],
    [4.53615634e-01, 3.18976020e+00, -8.35029351e-01, 7.84124578e+00, -4.43906795e-01, -1.78945492e+00, -1.14521031e+00, 1.00044304e+00, -4.04084981e-01, -4.86030348e-01, 1.05412721e-01, 5.63666445e-02],
    [3.93714086e-01, -3.07226477e-01, -4.87366619e-01, -4.57481697e-01, -2.91133171e-04, -2.39881719e-01, -2.15591352e-01, -1.21332941e-01, 1.42245002e-01, 5.02984582e-02, -8.05878851e-03, 1.95534173e-01],
    [1.86913010e-01, -1.61000977e-01, 5.95612425e-01, 1.87804293e-01, 2.22064227e-01, -1.09008289e-01, 7.83845058e-02, 5.15228647e-02, -8.18113578e-03, -2.37860551e-02, 3.41013800e-03, 3.64680417e-02],
    [3.32919314e+00, -2.14341251e+00, 7.20913997e+00, 1.76143734e+00, 1.64091808e+00, -2.66887649e+00, -9.26748006e-01, -2.78599285e-01, -7.39434005e-01, -3.87363085e-01, 8.00557250e-01, 1.15628886e+00],
    [4.76496444e-01, -1.19334793e-01, 3.09037235e-01, -3.45545294e-01, 1.30114716e-01, 5.06895559e-01, 2.12176840e-01, -4.14296750e-03, 4.52439064e-02, -1.62163990e-02, 6.93683152e-03, -5.77607592e-03],
    [3.00019324e-01, 5.43432074e-02, -7.72732930e-01, 1.47263806e+00, -2.79012581e-02, -2.47864869e-01, -2.10011388e-01, 2.78202425e-01, 6.16957205e-02, -1.66924986e-01, -1.80102286e-01, -3.78872162e-03]]

TIMBRE_MEANS = [np.mean(t) for t in TIMBRE_CLUSTERS]
TIMBRE_STDEVS = [np.std(t) for t in TIMBRE_CLUSTERS]

'''helper methods to process raw msd data'''

def normalize_pitches(h5):
    key = int(hdf5_getters.get_key(h5))
    segments_pitches = hdf5_getters.get_segments_pitches(h5)
    segments_pitches_new = [transpose_by_key(pitch_seg, key) for pitch_seg in segments_pitches]
    return segments_pitches_new

def transpose_by_key(pitch_seg, key):
    pitch_seg_new = []
    for i in range(0, 12):
        idx = (i + key) % 12
        pitch_seg_new.append(pitch_seg[idx])
    return pitch_seg_new

'''given a time segment with distributions of the 12 pitches, find the most
likely chord played'''
def find_most_likely_chord(pitch_vector):
    rho_max = 0.0
    # chords are indexed as (quality, root): quality 1 = major, 2 = minor,
    # 3 = dominant 7th, 4 = minor 7th
    most_likely_chord = (1, 1)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_MAJOR, CHORD_TEMPLATE_MAJOR_means, CHORD_TEMPLATE_MAJOR_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean) * (pitch_vector[i] - np.mean(pitch_vector)) / ((stdev + 0.01) * (np.std(pitch_vector) + 0.01))
        if abs(rho) > abs(rho_max):
            rho_max = rho
            most_likely_chord = (1, idx)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_MINOR, CHORD_TEMPLATE_MINOR_means, CHORD_TEMPLATE_MINOR_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean) * (pitch_vector[i] - np.mean(pitch_vector)) / ((stdev + 0.01) * (np.std(pitch_vector) + 0.01))
        if abs(rho) > abs(rho_max):
            rho_max = rho
            most_likely_chord = (2, idx)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_DOM7, CHORD_TEMPLATE_DOM7_means, CHORD_TEMPLATE_DOM7_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean) * (pitch_vector[i] - np.mean(pitch_vector)) / ((stdev + 0.01) * (np.std(pitch_vector) + 0.01))
        if abs(rho) > abs(rho_max):
            rho_max = rho
            most_likely_chord = (3, idx)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_MIN7, CHORD_TEMPLATE_MIN7_means, CHORD_TEMPLATE_MIN7_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean) * (pitch_vector[i] - np.mean(pitch_vector)) / ((stdev + 0.01) * (np.std(pitch_vector) + 0.01))
        if abs(rho) > abs(rho_max):
            rho_max = rho
            most_likely_chord = (4, idx)
    return most_likely_chord

def find_most_likely_timbre_category(timbre_vector):
    most_likely_timbre_cat = 0
    rho_max = 0.0
    for idx, (seg, mean, stdev) in enumerate(zip(TIMBRE_CLUSTERS, TIMBRE_MEANS, TIMBRE_STDEVS)):
        rho = 0.0
        for i in range(0, 12):
            rho += (seg[i] - mean) * (timbre_vector[i] - np.mean(seg)) / ((stdev + 0.01) * (np.std(timbre_vector) + 0.01))
        if abs(rho) > abs(rho_max):
            rho_max = rho
            most_likely_timbre_cat = idx
    return most_likely_timbre_cat

Bibliography

[1] Marck Bailey. Mark J. Butler publishes scholarly work on dance music. http://www.music.northwestern.edu/about/news/2012/mark-j-butler-publishes-scholarly-work-on-dance-music.html, March 2012.

[2] Kenneth Taylor. Ishkur's guide to EDM. http://techno.org/electronic-music-guide.

[3] Deal further strengthens Spotify's music discovery expertise. http://the.echonest.com/pressreleases/spotify-acquires-echo-nest, March 2014.

[4] Josh Constine. Inside the Spotify–Echo Nest skunkworks. http://techcrunch.com/2014/10/19/the-sonic-mad-scientists, October 2014.

[5] The future of music genres is here. http://blog.echonest.com/post/73516217273/the-future-of-music-genres-is-here, January 2014.

[6] About the Music Genome Project. http://www.pandora.com/about/mgp.

[7] Joan Serrà, Álvaro Corral, Marián Boguñá, Martín Haro, and Josep Ll. Arcos. Measuring the evolution of contemporary western popular music. Sci. Rep., 2, July 2012.

[8] Matthias Mauch, Robert M. MacCallum, Mark Levy, and Armand M. Leroi. The evolution of popular music: USA 1960–2010. Royal Society Open Science, 2(5), 2015.

[9] Thierry Bertin-Mahieux, Daniel P. W. Ellis, Brian Whitman, and Paul Lamere. The Million Song Dataset. In Proceedings of the 12th International Conference on Music Information Retrieval (ISMIR 2011), 2011.

[10] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay. Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12:2825–2830, 2011.

[11] Lars Buitinck, Gilles Louppe, Mathieu Blondel, Fabian Pedregosa, Andreas Mueller, Olivier Grisel, Vlad Niculae, Peter Prettenhofer, Alexandre Gramfort, Jaques Grobler, Robert Layton, Jake VanderPlas, Arnaud Joly, Brian Holt, and Gaël Varoquaux. API design for machine learning software: experiences from the scikit-learn project. In ECML PKDD Workshop: Languages for Data Mining and Machine Learning, pages 108–122, 2013.

[12] Edwin Chen. Infinite mixture models with nonparametric Bayes and the Dirichlet process. http://blog.echen.me/2012/03/20/infinite-mixture-models-with-nonparametric-bayes-and-the-dirichlet-process, March 2012.

[13] Graham Massey. Roland TR-808: The drum machine that changed music. http://www.bbc.com/news/entertainment-arts-26682781, March 2014.

[14] Jean-Michel Jarre and Agencja Artystyczna MTJ. Jean Michel Jarre. Disques Dreyfus, 1999.

[15] Francois Pachet, Jean-Julien Aucouturier, and Mark Sandler. The way it sounds: Timbre models for analysis and retrieval of music signals. IEEE Transactions on Multimedia, 7(6):1028–1035, December 2005.

  • Abstract
  • Acknowledgements
  • Contents
  • List of Tables
  • List of Figures
  • 1 Introduction
    • 1.1 Background Information
    • 1.2 Literature Review
    • 1.3 The Dataset
  • 2 Mathematical Modeling
    • 2.1 Determining Novelty of Songs
    • 2.2 Feature Selection
    • 2.3 Collecting Data and Preprocessing Selected Features
      • 2.3.1 Collecting the Data
      • 2.3.2 Pitch Preprocessing
      • 2.3.3 Timbre Preprocessing
  • 3 Results
    • 3.1 Methodology
    • 3.2 Findings
      • 3.2.1 α = 0.05
      • 3.2.2 α = 0.1
      • 3.2.3 α = 0.2
    • 3.3 Analysis
  • 4 Conclusion
    • 4.1 Design Flaws in Experiment
    • 4.2 Future Work
    • 4.3 Closing Remarks
  • A Code
    • A.1 Pulling Data from the Million Song Dataset
    • A.2 Calculating Most Likely Chords and Timbre Categories
    • A.3 Code to Compute Timbre Categories
    • A.4 Helper Methods for Calculations
  • Bibliography

While there exists an extensive amount of research analyzing music trends from a non-mathematical (cultural, societal, artistic) perspective, the analysis of EM from a mathematical perspective, and especially with respect to any computationally measurable trends in the genre, is close to nonexistent. EM has been analyzed to a lesser extent than other common genres of music in the academic world, most likely due to existing for a shorter amount of time and being less rooted in prominent social and cultural events. In fact, the first published reference work on EM did not exist until 2012, when Professor Mark J. Butler from Northwestern University edited and published Electronica, Dance and Club Music, a collection of essays exploring EM genres and culture [1]. Furthermore, there are very few comprehensive visual guides that allow a user to relate every genre to each other and easily observe how different genres converge and diverge. While conducting research, the best guide I found was not a scholarly source but an online guide created by an EM enthusiast: Ishkur's Guide to Electronic Music [2]. This guide, which includes over 100 specific genres grouped by more general genres and represents chronological evolutions by connecting each genre in a flowchart, is the most exhaustive analysis of the EM scene I could find. However, the guide's analysis is very qualitative. While each subgenre contains an explanation of typical rhythm and sounds and includes well-known songs indicative of the style, the guide is created by someone who used historical and personal knowledge of EM. My model, which creates music genres by chronologically ordering songs and then assigning them to clusters, is a different approach towards imagining the entire landscape of EM. The results may confirm Ishkur's Guide's findings, in which case his guide is given additional merit with mathematical evidence, or they may be different, suggesting that there may be better ways to group EM genres. One advantage that guides such as Ishkur's and historically-based scholarly works have over my approach is that those models are history-sensitive and therefore may group songs in a way that historically makes sense. On the other hand, my model is history-agnostic and may not realize the historical context of songs when clustering. However, I believe that there is still significant merit to my research. Instead of classifying genres of music by the early genres that led to them, my approach gives the most credit to the artists and songs that were the most innovative for their time, and perhaps reveals different musical styles that are more similar to each other than history would otherwise imply. This way of thinking of music genres, while unconventional, is another way of imagining EM.

The practice of quantitatively analyzing music has exploded in the last decade, thanks to technological and algorithmic advances that allow data scientists to constructively sift through troves of music and listener information. In the literature review, I will focus on two particular organizations that have contributed greatly to the large-scale mathematical analysis of music: Pandora, a website that plays songs similar to a song/artist/album inputted by a user, and Echo Nest, a music analytics firm that was acquired by Spotify in 2014 and drives Spotify's Discover Weekly feature [3]. After evaluating the relevance of these sources to my thesis work, I will then look over the relevant academic research and evaluate what this research can contribute.

1.2 Literature Review

The analysis of quantitative music generally falls into two categories: research conducted by academics and academic organizations for scholarly purposes, and research conducted by companies and primarily targeted at consumers. First, looking at the consumer-based research, Spotify and Pandora are two of the most prominent groups, and the two I decided to focus on. Spotify is a music streaming service where users can listen to albums and songs from a wide variety of artists, or listen to weekly playlists generated based on the music the user and the user's friends have listened to. The weekly playlist, called the Discover Weekly Playlist, is a relatively new feature in Spotify and is driven by music analysis algorithms created from Echo Nest. Using the Echo Nest code interface, Spotify creates a "taste profile" for each user, which assesses attributes such as how often a user branches out to new styles of music, how closely the user's streamed music follows popular Billboard music charts, and so on. Spotify also looks at the artists and songs the user streamed and creates clusters of different genres that the user likes (see figure 1.1). The taste profile and music clusters can then be used to generate playlists geared to a specific user. The genres in the cluster come from a list of nearly 800 names, which are derived by scraping the Internet for trending terms in music, as well as training various algorithms on a regular basis by "listening" to new songs [4][5].

Figure 1.1: A user's taste profile generated by Spotify

Although Spotify and Echo Nest's algorithms are very useful for mapping the landscape of established and emerging genres of music, the methodology is limited to pre-defined genres of music. This may serve as a good starting point to compare my final results to, but my study aims to be as context-free as possible by attaching no preconceived notions of music styles or genres, instead looking at features that could be measured in every song.

While Spotify's approach to mapping music is very high-tech and based on existing genres, Pandora takes a very low-tech and context-free approach to music clustering. Pandora created the Music Genome Project, a multi-year undertaking where skilled music theorists listened to a large number of songs and analyzed up to 450 characteristics in each song [6]. Pandora's approach is appealing to the aim of my study, since it does not take any preconceived notions of what a genre of music is, instead comparing songs on common characteristics such as pitch, rhythm, and instrument patterns. Unfortunately, I do not have a cadre of skilled music theorists at my disposal, nor do I have 10 years to perform such calculations like the dedicated workers at Pandora (tips the indestructible fedora). Additionally, Pandora's Music Genome Project is intellectual property, so at best I can only rely on the abstract concepts of the Music Genome Project to drive my study.

In the academic realm, there are no existing studies analyzing quantifiable changes in EM specifically, but there exist a few studies that perform such analysis on popular Western music in general. One such study is Measuring the Evolution of Contemporary Western Popular Music, which analyzes music from 1955-2010, spanning all common genres. Using the Million Song Dataset, a free public database of songs, each containing metadata (see section 1.3), the study focuses on the attributes pitch, timbre, and loudness. Pitch is defined as the standard musical notes, or frequency of the sound waves. Timbre is formally defined as the Mel Frequency Cepstral Coefficients (MFCCs) of a transformed sound signal. More informally, it refers to the sound color, texture, or tone quality, and is associated with instrument types, recording resources, and production techniques. In other words, two sounds that have the same pitch but different tones (for example, a bell and a voice) are differentiated by their timbres. There are 12 MFCCs that define the timbre of a given sound. Finally, loudness refers to how intrinsically loud the music sounds, not loudness that a listener can manipulate while listening to the music. Loudness is the first MFCC of the timbre of a sound [7]. The study concluded that over time, music has been becoming louder and less diverse:

The restriction of pitch sequences (with metrics showing less variety in pitch progressions), the homogenization of the timbral palette (with frequent timbres becoming more frequent), and growing average loudness levels (threatening a dynamic richness that has been conserved until today). This suggests that our perception of the new would be essentially rooted on identifying simpler pitch sequences, fashionable timbral mixtures, and louder volumes. Hence, an old tune with slightly simpler chord progressions, new instrument sonorities that were in agreement with current tendencies, and recorded with modern techniques that allowed for increased loudness levels, could be easily perceived as novel, fashionable, and groundbreaking.

This study serves as a good starting point for mathematically analyzing music in a few ways. First, it utilizes the Million Song Dataset, which addresses the issue of legally obtaining music metadata. As mentioned in section 1.3, the only legal way to obtain playable music for this study would have been to purchase all songs I would include, which is infeasible. While the Million Song Dataset does not contain the audio files in playable format, it does contain audio features and metadata that allow for in-depth analysis. In addition, working with the dataset takes out the work of extracting features from raw audio files, saving an extensive amount of time and energy. Second, the study establishes specifics for what constitutes a trend in music. Pitch, timbre, and loudness are core features of music, and examining the distributions of each among songs over time reveals a lot of information about how the music industry and consumers' tastes have evolved. While these are not all of the features contained in a song, they serve as a good starting point. Third, the study defines mathematical ways to capture music attributes and measure their change over time. For example, pitches are transposed into the same tonal context, with binary discretized pitch descriptions based on a threshold, so that each song can be represented with vectors of pitches that are normalized and compared to other songs.
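As a minimal sketch of that transposition step (mirroring the transpose_by_key helper in Appendix A.4, and assuming the MSD key convention of 0 = C through 11 = B):

# Rotate a 12-bin chroma (pitch) vector so the song's estimated key maps to C.
def transpose_to_c(chroma, key):
    return [chroma[(i + key) % 12] for i in range(12)]

chroma = [0, 0, 1.0, 0, 0.6, 0, 0, 0.8, 0, 0, 0, 0]  # a song in D: D bin strongest
print(transpose_to_c(chroma, 2))  # strongest bin now lands on index 0 (C)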

While this study lays some solid groundwork for capturing and analyzing numeric qualities of music, it falls short of addressing my goals in a couple of ways. First, it does not perform any analysis with respect to music genre. While the analysis performed in this paper could easily be applied to a list of songs in a specific genre, certain genres might have unique sounds and rhythms relative to other genres that would be worth studying in greater detail. Second, the study only measures general trends in music over time. The models used to describe changes are simple regressions that don't look at more nuanced changes. For example, what styles of music developed over certain periods of time? How rapid were those changes? Which styles of music developed from which other styles?

A more promising study, led by music researcher Matthias Mauch [8], analyzes contemporary popular Western music from the 1960s to 2010s by comparing numerical data on the pitches and timbre of a corpus of 17,000 songs that appeared on the Billboard Hot 100. Like the previously mentioned paper, Measuring the Evolution of Contemporary Western Popular Music, Mauch's study also creates abstractions of pitch and timbre in order to provide a consistent and meaningful semantic interpretation of musical data (see figure 1.2). However, Mauch's study takes this idea a step further by using genre tags from Last.fm, a music website, and constructing a hierarchy of music genres using hierarchical clustering. Additionally, the study takes a crack at determining whether a particular band, the Beatles, was musically groundbreaking for its time or merely playing off sounds that other bands had already used.

Figure 1.2: Data processing pipeline for Mauch's study, illustrated with a segment of Queen's Bohemian Rhapsody (1975)

While both Measuring the Evolution of Contemporary Western Popular Music and Mauch's study created abstractions of pitch and timbre, Mauch's study is more appealing with respect to my goal because its end results align more closely with mine. Additionally, the data processing pipeline offers several layers of abstraction, and depending on my progress, I would be able to achieve at least one of the levels of abstraction. As shown in figure 1.2, each segment of a raw audio file is first broken down into its 12 timbre MFCCs and pitch components. Next, the study constructs "lexicons," or a dictionary of pitch and timbre terms that all songs can be compared to. For pitch, the original data is in an N-by-12 matrix, where N is the number of time segments in the song and 12 the number of the notes found in an octave of pitches. Each time segment contains the relative strengths of each of the 12 pitches. However, music sounds are not merely a collection of pitches but, more precisely, chords. Furthermore, the similarity of two songs is not determined by the absolute pitches of their chords, but rather by the progression of chords in the song, all relative to each other. For example, if all the notes in a song are transposed by one step, the song will sound different in terms of absolute pitch, but it will still be recognized as the original, because all of the relative movements from each chord to the next are the same. This phenomenon is captured in the pitch data by finding the most likely chord played at each time segment, then counting the change to the next chord at each time step and generating a table of chord change frequencies for each song.
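As a minimal illustration of this counting step (my own sketch, not Mauch's code), assuming chord labels are (quality, root) tuples like those returned by find_most_likely_chord in Appendix A.4; unlike the flat 192-category encoding used in Appendix A.2, this version keys the table by tuples for readability:

from collections import Counter

def chord_change_frequencies(chords):
    # chords: [(quality, root), ...] for consecutive time segments
    changes = Counter()
    for c1, c2 in zip(chords, chords[1:]):
        note_shift = (c2[1] - c1[1]) % 12  # relative root movement
        changes[(c1[0], c2[0], note_shift)] += 1
    return changes

print(chord_change_frequencies([(1, 0), (1, 5), (2, 5), (1, 0)]))
# Counter({(1, 1, 5): 1, (1, 2, 0): 1, (2, 1, 7): 1})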

Constructing the timbre lexicon is more complicated, since there is no easy analogue to chords by which to compare songs. Mauch's study utilizes a Gaussian Mixture Model (GMM), iterating over k = 1 to k = N clusters, where N is a large number, running the GMM on each prior assumption of k clusters, and computing the Bayes Information Criterion (BIC) for each model. The lowest of the N BIC values is found, and that value of k is selected. That model contains k different timbre clusters, and each cluster contains the mean timbre value for each of the 12 timbre components.
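This model-selection loop can be sketched in a few lines. The sketch below is my own illustration using the modern scikit-learn API (GaussianMixture and its bic method, the successor to the sklearn.mixture.GMM class available when this thesis was written), not the study's code; the function name and k_max bound are assumptions:

import numpy as np
from sklearn.mixture import GaussianMixture

def pick_timbre_lexicon(frames, k_max=60, seed=0):
    # Fit GMMs with k = 1..k_max components to the 12-d timbre frames and
    # keep the model whose Bayes Information Criterion is lowest.
    frames = np.asarray(frames)
    best, best_bic = None, np.inf
    for k in range(1, k_max + 1):
        gmm = GaussianMixture(n_components=k, random_state=seed).fit(frames)
        bic = gmm.bic(frames)
        if bic < best_bic:
            best, best_bic = gmm, bic
    return best  # best.means_ holds one 12-d mean timbre vector per cluster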

For my research, I decided that the pitch and timbre lexicons would be the most realistic level of abstraction I could obtain. Mauch's study adds an additional layer to pitch and timbre by identifying the most common patterns of chord changes and most common timbre rhythms, and creating more general tags from these combined terms, such as "stepwise changes indicating modal harmony" for a pitch topic and "oh, rounded, mellow" for a timbral topic. There were two problems with using this final layer of abstraction for my study. First, attaching semantic interpretations to the pitch and timbral lexicons is a difficult task. For timbre, I would need to listen to sound samples containing all of the different timbral categories I identified and attach user interpretations to them. For the chords, not only would I have to perform the same analysis as on timbre, but also take careful attention to identify which chords correspond to common sound progressions in popular music, a task that I am not qualified for and did not have the resources, for this thesis, to seek out. Second, this final layer of abstraction was not necessary for the end goal of my paper. In fact, consolidating my pitch and timbre lexicons into simpler phrases would run the risk of pigeonholing my analysis and preventing me from discovering more nuanced patterns in my final results. Therefore, I decided to focus on pitch and timbral lexicon construction as the furthest levels of abstraction when processing songs for my thesis. Mathematical details on how I constructed the pitch and timbral lexicons can be found in the Mathematical Modeling section of this paper.

1.3 The Dataset

In order to successfully execute my thesis, I need access to an extensive database of music. Until recently, acquiring a substantial corpus of music data was a difficult and costly task. It is illegal to download music audio files from video and music-sharing sites such as YouTube, Spotify, and Pandora. Some platforms, such as iTunes, offer 90-second previews of songs, but using only segments of songs, and usually segments that showcase the chorus of the song, is not a reliable way to capture the entire essence of a song. Even if I were to legally download entire audio files for free, I would run into additional issues. Obtaining a high-quality corpus of song data would be challenging: writing scripts that crawl music sharing platforms may not capture all of the music I am looking for. And once I had the audio files, I would have to perform audio processing techniques to extract the relevant information from the songs.

Fortunately, there is an easy solution to the music data acquisition problem. The Million Song Dataset (MSD) is a collection of metadata for one million music tracks dating up to 2011. Various organizations, such as The Echo Nest, Musicbrainz, 7digital, and Last.fm, have contributed different pieces of metadata. Each song is represented as a Hierarchical Data Format file (HDF5), which can be loaded as a JSON object. The fields encompass topical features, such as the song title, artist, and release date, as well as lower-level features, such as the loudness, starting beat time, pitches, and timbre of several segments of the song [9]. While the MSD is the largest free and open-source music metadata dataset I could find, there is no guarantee that it adequately covers the entire spectrum of EM artists and songs. This quality limitation is important to consider throughout the study. A quick look through the songs, including the subset of data I worked with for this report, showed that there were several well-known artists and songs in the EM scene. Therefore, while the MSD may not contain all desired songs for this project, it contains an adequate number of relevant songs to produce some meaningful results. Additionally, laying the groundwork for modeling the similarities between songs and identifying groundbreaking ones is the same regardless of the songs included, and the following methodologies can be implemented on any similarly-formatted dataset, including one with songs that might currently be missing in the MSD.
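As a minimal sketch of what reading one such file looks like, assuming the hdf5_getters helper module distributed with the MSD and a hypothetical file path:

import hdf5_getters  # helper module distributed with the MSD

h5 = hdf5_getters.open_h5_file_read('TRAXLZU12903D05F94.h5')  # hypothetical path
try:
    song = {
        'title':   hdf5_getters.get_title(h5),
        'artist':  hdf5_getters.get_artist_name(h5),
        'year':    hdf5_getters.get_year(h5),
        'pitches': hdf5_getters.get_segments_pitches(h5),  # (N, 12) chroma
        'timbre':  hdf5_getters.get_segments_timbre(h5),   # (N, 12) MFCC-like
    }
finally:
    h5.close()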

Chapter 2

Mathematical Modeling

2.1 Determining Novelty of Songs

Finding a logical and implementable mathematical model was, and continues to be, an important aspect of my research. My problem, how to mathematically determine which songs were unique for their time, requires an algorithm in which each song is introduced in chronological order, either joining an existing category or starting a new category based on its musical similarity to songs already introduced. Clustering algorithms like k-means or Gaussian Mixture Models (GMMs), which optimize the partitioning of a dataset into a predetermined number of clusters, assume a fixed number of clusters. While this process would work if we knew exactly how many genres of EM existed, if we guess wrong, our end results may end up with clusters that are wrongly grouped together or separated. It is much better to apply a clustering algorithm that does not make any assumptions about this number.

One particularly promising process that addresses the issue of the number of clusters is a family of algorithms known as Dirichlet Processes (DPs). DPs are useful for this particular application because (1) they assign clusters to a dataset with only an upper bound on the number of clusters, and (2) by sorting the songs in chronological order before running the algorithm and keeping track of which songs are categorized under each cluster, we can observe the earliest songs in each cluster and, consequently, infer which songs were responsible for creating new clusters. The DP is controlled by a parameter α, known as the concentration parameter. The expected number of clusters formed is directly proportional to the value of α, so the higher the value of α, the more likely new clusters will be formed [10]. Regardless of the value of α, as the number of data points introduced increases, the probability of a new group being formed decreases. That is, a "rich get richer" policy is in place, and existing clusters tend to grow in size. Tweaking the value of the tunable parameter α is an important part of the study, since it determines the flexibility given to forming a new cluster. If the value of α is too small, then the criteria for forming clusters will be too strict, and data that should be in different clusters will be assigned to the same cluster. On the other hand, if α is too large, the algorithm will be too sensitive and assign similar songs to different clusters.
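To make the role of α concrete, the following self-contained simulation (an illustration of the equivalent Chinese Restaurant Process view of the DP, not the sampler scikit-learn uses) shows how the number of clusters grows with α while the rich-get-richer dynamic keeps it far below the number of songs:

import random

def crp_cluster_count(n_songs, alpha, seed=0):
    # Simulate the "rich get richer" seating of a Chinese Restaurant Process
    # and return how many clusters n_songs data points end up in.
    rng = random.Random(seed)
    sizes = []  # current cluster sizes
    for i in range(n_songs):
        # New cluster with probability alpha / (i + alpha), else join an
        # existing cluster j with probability sizes[j] / (i + alpha).
        if rng.random() < alpha / (i + alpha):
            sizes.append(1)
        else:
            r = rng.random() * i
            acc = 0
            for j, s in enumerate(sizes):
                acc += s
                if r < acc:
                    sizes[j] += 1
                    break
    return len(sizes)

for alpha in (0.05, 0.1, 0.2, 1.0):
    print(alpha, crp_cluster_count(23000, alpha))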

The implementation of the DP was achieved using scikit-learn's library and API for the Dirichlet Process Gaussian Mixture Model (DPGMM). The DPGMM is the formal name of the Dirichlet Process model used to cluster the data. More specifically, scikit-learn's implementation of the DPGMM uses the stick-breaking method, one of several equally valid methods to assign songs to clusters [11]. While the mathematical details for this algorithm can be found at the following citation [12], the most important aspects of the DPGMM are the arguments that the user can specify and tune. The first of these tunable parameters is the value α, which is the same parameter as the α discussed in the previous paragraph. As seen in Figure 2.1, on the right side, properly tuning α is key to obtaining meaningful clusters. The center image has α set to 0.01, which is too small and results in all of the data being formed under one cluster. On the other hand, the bottom-right image has the same data set and α set to 100, which does a better job of clustering. On a related note, the figure also demonstrates the effectiveness of the DPGMM over the GMM. On the left side, clearly the dataset contains 2 clusters, but the GMM in the top-left image assumes 5 clusters as a prior and consequently clusters the data incorrectly, while the DPGMM manages to limit the data to 2 clusters.

The second argument that the user inputs for the DPGMM is the data that will be clustered. The scikit-learn implementation takes the data in the format of a nested list (N lists, each of length m), where N is the number of data points and m the number of features. While the format of the data structure is relatively straightforward, choosing which numbers should be in the data was a challenge I faced. Selecting the relevant features of each song to be used in the algorithm will be expounded upon in the next section, "Feature Selection."

The last argument that a user inputs for the scikit-learn DPGMM implementation is an argument indicating the upper bound for the number of clusters. The Dirichlet Process then determines the best number of clusters for the data between 1 and the upper bound. Since the DPGMM is flexible enough to find the best value, I set an arbitrary upper bound of 50 clusters and focused more on the tuning of α to modify the number of clusters formed.
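Putting the three arguments together, a minimal sketch of the call looks as follows, assuming the scikit-learn API of the time (the DPGMM class was later deprecated and removed in favor of BayesianGaussianMixture) and placeholder random data in place of the real song features:

import numpy as np
from sklearn import mixture

# X: one row per song, e.g. chord-change frequencies plus timbre-category
# counts (random placeholder data here, purely for illustration)
X = np.random.rand(200, 10)

dp = mixture.DPGMM(n_components=50, alpha=0.1, covariance_type='diag')
dp.fit(X)
labels = dp.predict(X)
print('{0} clusters used out of an upper bound of 50'.format(len(np.unique(labels))))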

Figure 2.1: scikit-learn example of GMM vs. DPGMM and tuning of α

2.2 Feature Selection

One of the most difficult aspects of the Dirichlet method is choosing the features to be used for clustering. In other words, when we organize the songs into clusters, we need to ensure that each cluster is distinct in a way that is statistically and intuitively logical. In the Million Song Dataset [9], each song is represented as a JSON object containing several fields. These fields are candidate features to be used in the Dirichlet algorithm. Below is an example song, "Never Gonna Give You Up" by Rick Astley, and the corresponding fields:

artist_mbid: db92a151-1ac2-438b-bc43-b82e149ddd50 (the musicbrainz.org ID for this artist is db9...)
artist_mbtags: shape = (4,) (this artist received 4 tags on musicbrainz.org)
artist_mbtags_count: shape = (4,) (raw tag count of the 4 tags this artist received on musicbrainz.org)
artist_name: Rick Astley (artist name)
artist_playmeid: 1338 (the ID of that artist on the service playme.com)
artist_terms: shape = (12,) (this artist has 12 terms (tags) from The Echo Nest)
artist_terms_freq: shape = (12,) (frequency of the 12 terms from The Echo Nest (number between 0 and 1))
artist_terms_weight: shape = (12,) (weight of the 12 terms from The Echo Nest (number between 0 and 1))
audio_md5: bf53f8113508a466cd2d3fda18b06368 (hash code of the audio used for the analysis by The Echo Nest)
bars_confidence: shape = (99,) (confidence value (between 0 and 1) associated with each bar by The Echo Nest)
bars_start: shape = (99,) (start time of each bar according to The Echo Nest; this song has 99 bars)
beats_confidence: shape = (397,) (confidence value (between 0 and 1) associated with each beat by The Echo Nest)
beats_start: shape = (397,) (start time of each beat according to The Echo Nest; this song has 397 beats)
danceability: 0.0 (danceability measure of this song according to The Echo Nest (between 0 and 1; 0 => not analyzed))
duration: 211.69587 (duration of the track in seconds)
end_of_fade_in: 0.139 (time of the end of the fade-in, at the beginning of the song, according to The Echo Nest)
energy: 0.0 (energy measure (not in the signal processing sense) according to The Echo Nest (between 0 and 1; 0 => not analyzed))
key: 1 (estimation of the key the song is in by The Echo Nest)
key_confidence: 0.324 (confidence of the key estimation)
loudness: -7.75 (general loudness of the track)
mode: 1 (estimation of the mode the song is in by The Echo Nest)
mode_confidence: 0.434 (confidence of the mode estimation)
release: Big Tunes - Back 2 The 80s (album name from which the track was taken; some songs/tracks can come from many albums; we give only one)
release_7digitalid: 786795 (the ID of the release (album) on the service 7digital.com)
sections_confidence: shape = (10,) (confidence value (between 0 and 1) associated with each section by The Echo Nest)
sections_start: shape = (10,) (start time of each section according to The Echo Nest; this song has 10 sections)
segments_confidence: shape = (935,) (confidence value (between 0 and 1) associated with each segment by The Echo Nest)
segments_loudness_max: shape = (935,) (max loudness during each segment)
segments_loudness_max_time: shape = (935,) (time of the max loudness during each segment)
segments_loudness_start: shape = (935,) (loudness at the beginning of each segment)
segments_pitches: shape = (935, 12) (chroma features for each segment (normalized so max is 1))
segments_start: shape = (935,) (start time of each segment (musical event, or onset) according to The Echo Nest; this song has 935 segments)
segments_timbre: shape = (935, 12) (MFCC-like features for each segment)
similar_artists: shape = (100,) (a list of 100 artists (their Echo Nest IDs) similar to Rick Astley, according to The Echo Nest)
song_hotttnesss: 0.864248830588 (according to The Echo Nest, when downloaded (in December 2010), this song had a 'hotttnesss' of 0.8 (on a scale of 0 to 1))
song_id: SOCWJDB12A58A776AF (The Echo Nest song ID; note that a song can be associated with many tracks (with very slight audio differences))
start_of_fade_out: 198.536 (start time of the fade-out, in seconds, at the end of the song, according to The Echo Nest)
tatums_confidence: shape = (794,) (confidence value (between 0 and 1) associated with each tatum by The Echo Nest)
tatums_start: shape = (794,) (start time of each tatum according to The Echo Nest; this song has 794 tatums)
tempo: 113.359 (tempo in BPM according to The Echo Nest)
time_signature: 4 (time signature of the song according to The Echo Nest, i.e. usual number of beats per bar)
time_signature_confidence: 0.634 (confidence of the time signature estimation)
title: Never Gonna Give You Up (song title)
track_7digitalid: 8707738 (the ID of this song on the service 7digital.com)
track_id: TRAXLZU12903D05F94 (The Echo Nest ID of this particular track, on which the analysis was done)
year: 1987 (year when this song was released, according to musicbrainz.org)

When choosing features, my main goal was to use features that would most likely yield meaningful results yet also be simple and make sense to the average person. The definition of "meaningful" results is subjective, as every music listener has his or her own opinions as to what constitutes different types of music, but some common features most people tend to differentiate songs by are pitch, rhythm, and the types of instruments used. The following specific fields provided in each song object fall under these three terms (a short sketch of reading these fields follows the list):

Pitch

• segments_pitches: a matrix of values indicating the strength of each pitch (or note) at each discernible time interval

Rhythm

• beats_start: a vector of values indicating the start time of each beat
• time_signature: the time signature of the song
• tempo: the speed of the song in Beats Per Minute (BPM)

Instruments

• segments_timbre: a matrix of values indicating the distribution of MFCC-like features (different types of tones) for each segment
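As a hedged illustration (the .h5 filename below is a placeholder), these candidate fields can be read for a single track with the hdf5_getters module used throughout the code in Appendix A:

import hdf5_getters

h5 = hdf5_getters.open_h5_file_read('TRAXLZU12903D05F94.h5')  # placeholder path
pitches = hdf5_getters.get_segments_pitches(h5)   # (n_segments, 12) chroma matrix
timbre = hdf5_getters.get_segments_timbre(h5)     # (n_segments, 12) MFCC-like matrix
beats = hdf5_getters.get_beats_start(h5)          # (n_beats,) beat start times
tempo = hdf5_getters.get_tempo(h5)                # BPM
tsig = hdf5_getters.get_time_signature(h5)        # usual beats per bar
h5.close()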

The segments_pitches feature is a clear candidate for a differentiating factor between songs, since it reveals the patterns of notes that occur. Additionally, other research papers that quantitatively examine songs, like Mauch's, look at pitch and employ a procedure that allows all songs to be compared with the same metric. Likewise, timbre is intuitively a reliable differentiating feature, since it reveals the presence of different tones, that is, sounds that sound different despite having the same pitch. Therefore, segments_timbre is another feature that is considered in each song.

Finally, we look at the candidate features for rhythm. At first glance, all of these features appear to be useful, as they indicate the rhythm of a song in one way or another. However, none of these features are as useful as the pitch and timbre features. While tempo is one factor in differentiating genres of EDM and music in general, tempo alone is not a driving force of musical innovation. Certain genres of EDM, like drum 'n' bass and happycore, stand out for having very fast tempos, but the tempo is supplemented with a sound unique to the genre. Conceiving new arrangements of pitches, combining instruments in new ways, and inventing new types of sounds are novel; speeding up or slowing down existing sounds is not. Including tempo as a feature could actually add noise to the model, since many genres overlap in their tempos. And finally, tempo is measured indirectly when the pitch and timbre features are normalized for each song: everything is measured in units of "per second," so faster songs will have higher quantities of pitch and timbre features each second. Time signature can be dismissed from the candidate features for the same reason as tempo: many genres share the same time signature, and including it in the feature set would only add more noise. beats_start looks like a more promising feature since, like segments_pitches and segments_timbre, it consists of a vector of values. However, difficulties arise when we begin to think about how exactly we can utilize this information. Since each song varies in length, we need a way to compare songs of different durations on the same level. One approach could be to perform basic statistics on the distance between each beat, for example calculating the mean and standard deviation of this distance. However, the normalized pitch and timbre information already captures this data. Another possibility is detecting certain patterns of beats, which could differentiate the syncopated dubstep or glitch beats from the steady pulse of electro-house. But once again, every beat is accompanied by a sound with a specific timbre and pitch, so this feature would not add any significantly new information.

2.3 Collecting Data and Preprocessing Selected Features

2.3.1 Collecting the Data

Upon deciding the features I wanted to use in my research, I first needed to collect all of the electronic songs in the Million Song Dataset. The easiest reliable way to achieve this was to iterate through each song in the database and save the information for the songs where any of the artist genre tags in artist_mbtags matched an electronic music genre. While this measure was not fully accurate, because it looks at the genre of the artist rather than of the song, specific genre information for each song was not as easily accessible, so this indicator was nearly as good a substitute. To generate a list of the genres that electronic songs would fall under, I manually searched through a subset of the MSD to find all genres that seemed to be related to electronic music. In the case of genres that were sometimes but not always electronic in nature, such as disco or pop, I erred on the side of caution and did not include them in the list of electronic genres. In these cases, false positives, such as primarily rock songs that happen to have the disco label attached to the artist, could inadvertently be included in the dataset. The final list of genres is as follows:

target_genres = ['house', 'techno', 'drum and bass', 'drum n bass',
                 "drum'n'bass", 'drumnbass', "drum 'n' bass", 'jungle', 'breakbeat',
                 'trance', 'dubstep', 'trap', 'downtempo', 'industrial', 'synthpop',
                 'idm', 'idm - intelligent dance music', '8-bit', 'ambient',
                 'dance and electronica', 'electronic']

2.3.2 Pitch Preprocessing

A study conducted by music researcher Matthias Mauch [8] analyzes pitch in a musically informed manner. The study first takes the raw sound data and converts it into a distribution over each pitch, where 0 is no detection of the pitch and 1 the strongest amount. Then it computes the most likely chord by comparing the 4 most common types of chords in popular music (major, minor, dominant 7, and minor 7) to the observed chord. The most common chords are represented as "template chords" and contain 0's and 1's, where the 1's represent the notes played in the chord. For example, using the note C as the first index, the C major chord is represented as

$$CT_{CM} = (1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0).$$

For a given chroma frame c observed in the song, the Spearman's rho coefficient is computed over every template chord:

$$\rho_{CT,c} = \frac{\sum_{i=1}^{12} (CT_i - \overline{CT})(c_i - \bar{c})}{\sigma_{CT}\,\sigma_c}$$

where $\overline{CT}$ is the mean of the values in the template chord, $\sigma_{CT}$ is the standard deviation of the values in the chord, and the operations on c are analogous. Note that the summation is over each individual pitch in the 12 pitch classes. The chord template with the highest value of ρ is selected as the chord for the time frame. After this is performed for each time frame, the values are smoothed, and then the change between adjacent chords is observed. The reasoning behind this step is that, by measuring the relative distance between chords rather than the chords themselves, all songs can be compared in the same manner even though they may have different key signatures. Finally, the study takes the types of chord changes and classifies them under 8 possible categories called "H-topics." These topics are more abstracted versions of the chord changes that make more sense to a human, such as "changes involving dominant 7th chords."
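To make the template-matching step concrete, here is a minimal sketch of the correlation computation, using two hypothetical template chords; the full set of 48 templates, and the find_most_likely_chord helper this mimics, live in Appendix A.4 and msd_utils.

import numpy as np

# 12-element binary templates (C at index 0); simplified stand-ins for the
# 48 templates (major, minor, dominant 7, minor 7 in every key) in Appendix A.4.
TEMPLATES = {
    'C major': np.array([1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0]),
    'A minor': np.array([1, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0]),
}

def correlation(template, chroma):
    """Coefficient between a template chord and an observed chroma frame."""
    t_centered = template - template.mean()
    c_centered = chroma - chroma.mean()
    return (t_centered * c_centered).sum() / (template.std() * chroma.std())

def find_most_likely_chord(chroma):
    """Return the name of the template chord with the highest coefficient."""
    return max(TEMPLATES, key=lambda name: correlation(TEMPLATES[name], chroma))

# Usage: a chroma frame with strong C, E, and G should match C major.
frame = np.array([0.9, 0.1, 0.1, 0.1, 0.8, 0.1, 0.1, 0.85, 0.1, 0.2, 0.1, 0.1])
print(find_most_likely_chord(frame))  # -> 'C major'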

In my preliminary implementation of this method on an electronic dance music corpus, I made a few modifications to Mauch's study. First, I smoothed out time frames before computing the most probable chords, rather than smoothing the most probable chords. I did this to save time and to reduce volatility in the chord measurements. Using Rick Astley's "Never Gonna Give You Up" as a reference, which contains 935 time frames and lasts 212 seconds, 5 time frames is slightly under 1 second, and for preliminary testing this appeared to be a good interval for each time block. Second, as mentioned in the literature section, I did not abstract the chord changes into H-topics. This decision also stemmed from time constraints, since deriving semantic chord meaning from EDM songs would require careful research into the types of harmonies and sounds common in that genre of music. Below I have included a high-level visualization of the pitch metadata found in a sample song, "Firestarter" by The Prodigy, and how I converted the metadata into a chord change vector that I could then feed into the Dirichlet Process algorithm.

[Figure: converting raw pitch metadata into a chord change vector, illustrated on "Firestarter" by The Prodigy. The pipeline starts with the raw pitch data, an N×12 matrix where N is the number of time frames in the song and 12 the number of pitch classes (the figure shows the first 5 time frames). The pitch distribution is averaged over every block of 5 time frames; the most likely chord for each block (here F major, with template (0,1,0,0,0,0,1,0,0,0,1,0)) is calculated using Spearman's rho; and for every pair of adjacent chords (here F major → G major, chord shift code 6) the corresponding count in a table of chord change frequencies is incremented (192 possible chord changes). The result is a final 192-element vector where chord_changes[i] is the number of times the chord change with code i occurred in the song.]

A final step I took to normalize the chord change data was to divide the numbers by the length of the song, so that each song's number of chord changes was measured per second.

2.3.3 Timbre Preprocessing

For timbre, I also used Mauch's model to find a meaningful way to compare timbre uniformly across all songs [8]. After collecting all song metadata, I took a random sample of 20 songs from each year, starting at 1970. The reason I forced the sampling to 20 randomly sampled songs from each year, rather than taking a random sample of songs from all years at once, was to prevent bias towards any particular type of sound. As seen in Figure 2.2, there are significantly more songs from 2000-2011 than from before 2000. The mean year is x̄ = 2001.052, the median year is 2003, and the standard deviation of the years is σ = 7.060. A "random sample" over all songs would almost certainly include a disproportionate number of more recent songs. In order not to miss out on sounds that may be more prevalent in older songs, I required a set number of songs from each year. Next, from each randomly selected song, I selected 20 random timbre frames in order to prevent any biases in data collection within each song. In total, there were 42 × 20 × 20 = 16,800 timbre frames collected. Next, I clustered the timbre frames using a Gaussian Mixture Model (GMM), varying the number of clusters from 10 to 100 and selecting the number of clusters with the lowest Bayes Information Criterion (BIC), a statistical measure commonly used to select the best-fitting model. The BIC was minimized at 46 timbre clusters. I then re-ran the GMM with 46 clusters and saved the mean values of each of the 12 timbre dimensions for each cluster formed. In the same way that every song had the same 192 chord changes whose frequencies could be compared between songs, each song now had the same 46 timbre clusters but different frequencies in each song. When reading in the metadata from each song, I calculated the most likely timbre cluster each timbre frame belonged to and kept a frequency count of all of the possible timbre clusters observed in the song. Finally, as with the pitch data, I divided all observed counts by the duration of the song in order to normalize each song's timbre counts.

Figure 2.2: Number of Electronic Music Songs in Million Song Dataset from Each Year
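A minimal sketch of this BIC sweep follows, assuming the sampled frames are stacked in a (16,800 × 12) NumPy array; the array here is a random stand-in for the real frames (Appendix A.3 collects them), and the step size of the sweep is an assumption the text does not fix. In the thesis run, the minimizer came out to 46 clusters.

import numpy as np
from sklearn.mixture import GaussianMixture

timbre_frames = np.random.randn(16800, 12)  # stand-in for the sampled frames

best_bic, best_k = np.inf, None
for k in range(10, 101):  # step of 1 assumed; this sweep is slow
    gmm = GaussianMixture(n_components=k, random_state=0).fit(timbre_frames)
    bic = gmm.bic(timbre_frames)
    if bic < best_bic:
        best_bic, best_k = bic, k

# Re-fit with the winning k and keep the 12-dimensional cluster means,
# which become the fixed timbre categories used for every song.
final_gmm = GaussianMixture(n_components=best_k, random_state=0).fit(timbre_frames)
timbre_clusters = final_gmm.means_  # shape (best_k, 12)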

Chapter 3

Results

3.1 Methodology

After the pitch and timbre data was processed, I ran the Dirichlet Process on the data. For each song, I concatenated the 192-element chord change frequency list and the 46-element timbre category frequency list, giving each song a total of 238 features. However, there is a problem with this setup: the pitch data will inherently dominate the clustering process, since it contains almost 3 times as many features as timbre. While there is no built-in function in scikit-learn's DPGMM process to give different weights to each feature, I considered another possibility to remedy this discrepancy: duplicating the timbre vector a certain number of times and concatenating that to the feature set of each song. While this strategy runs the risk of corrupting the feature set and turning it into something that does not accurately represent each song, it is important to keep in mind that, even without duplicating the timbre vector, the feature set consists of two separate feature sets concatenated to each other. Therefore, timbre duplication appears to be a reasonable strategy to weigh pitch and timbre more evenly.
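A small sketch of this weighting scheme follows; the duplication count of 4 is an assumed value chosen so that 4 × 46 = 184 timbre features roughly balance the 192 pitch features, not a figure fixed by the text.

import numpy as np

def build_feature_vector(chord_changes, timbre_counts, timbre_copies=4):
    """Concatenate pitch and (duplicated) timbre features for one song.

    chord_changes: 192-element chord change frequencies (per second).
    timbre_counts: 46-element timbre category frequencies (per second).
    timbre_copies=4 is an assumption: 4 * 46 = 184 ~ 192 features,
    roughly balancing the two feature groups.
    """
    return np.concatenate([chord_changes, np.tile(timbre_counts, timbre_copies)])

# Usage on dummy data: the resulting vector has 192 + 4*46 = 376 features.
song = build_feature_vector(np.zeros(192), np.zeros(46))
print(song.shape)  # (376,)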

After this modification, I tweaked a few more parameters before obtaining my final results. Dividing the pitch and timbre frequencies by the duration of the song normalized every song to frequency per second, but it also had the undesired effect of making the data too small. Timbre and pitch frequencies per second were almost always less than 1.0 and many times hovered as low as 0.002 for nonzero values. Because all of the values were very close to each other, using common values of α in the range of 0.1 up to around 1000 or 2000 was insufficient to push the songs into different clusters; as a result, every song fell into the same cluster. Increasing the value of α by several orders of magnitude, to well over 10 million, fixed the problem, but this solution presented two problems. First, tuning α to experiment with different ways to cluster the music would be problematic, since I would have to work with an enormous range of possible values for α. Second, pushing α to such high values is not appropriate for the Dirichlet Process. Extremely high values of α indicate a Dirichlet Process that will try to disperse the data into different clusters, but a value of α that high is in principle always assigning each new song to a new cluster. On the other hand, varying α between 0.1 and 1000, for example, presents a much wider range of flexibility when assigning clusters. While clustering may be possible by varying α an extreme amount with the data as it currently is, we would be using the Dirichlet Process in a way it should mathematically not be used. Therefore, multiplying all of the data by a constant value, so that we can work in the appropriate range of α, is the ideal approach. After some experimentation, I found that k = 10 was an appropriate scaling factor.
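Putting the scaling factor and the concentration parameter together, a final clustering run might look like the following sketch (again using BayesianGaussianMixture as a stand-in for the deprecated DPGMM class; X is a placeholder for the real feature matrix):

import numpy as np
from sklearn.mixture import BayesianGaussianMixture

K = 10  # scaling factor so per-second frequencies land in a workable range

def cluster_songs(X, alpha):
    """Scale the song-feature matrix and run the truncated DP mixture."""
    model = BayesianGaussianMixture(
        n_components=50,
        weight_concentration_prior_type="dirichlet_process",
        weight_concentration_prior=alpha,
        max_iter=500,
    )
    return model.fit_predict(K * X)

# Compare the clusterings produced by the three alpha values used below.
X = np.random.rand(200, 376)  # hypothetical stand-in for the real features
for alpha in (0.05, 0.1, 0.2):
    labels = cluster_songs(X, alpha)
    print(alpha, "->", len(np.unique(labels)), "clusters")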

After initial runs of the Dirichlet Process, I found that there was a slight issue with some of the earlier songs. Since I had only artist genre tags, not song-specific tags, I chose songs based on whether any of the tags associated with the artist fell under any electronic music genre, including the generic term 'electronic'. There were some bands, mostly older ones from the 1960s and 1970s, like Electric Light Orchestra, which had some electronic music but mostly featured rock, funk, disco, or another genre. Given that these artists featured mostly non-electronic songs, I decided to exclude them from my study and generated a blacklist of these music artists. While it was infeasible to look through every single song and determine whether it was electronic or not, I was able to look over the earliest songs in each cluster. These songs were the most important to verify as electronic, because early non-electronic songs could end up forming new clusters and inadvertently create clusters with non-electronic sounds that I was not looking for.

The goal of this thesis is to identify the different groups into which EM songs are clustered and to identify the most unique artists and genres. While the second task is relatively simple, because it requires looking at the earliest songs in each cluster, the effectiveness of the first is difficult to gauge. While I can look at the average chord change and timbre category frequencies in each cluster, as well as other metadata, attaching semantic interpretations to what the music actually sounds like and determining whether the music is clustered properly is a very subjective process. For this reason, I ran the Dirichlet Process on the feature set with values of α = (0.05, 0.1, 0.2) and compared the clustering in each case, examining similarities and differences in the clusters formed in each scenario in the Discussion section. For each value of α, I set the upper limit of components, or clusters, allowed to 50. The values of α I used resulted in 9, 14, and 19 clusters formed.

3.2 Findings

3.2.1 α = 0.05

When I set α to 0.05, the Dirichlet Process split the songs into 9 clusters. Below are the distributions of years of the songs in each cluster (note that the Dirichlet Process does not number the clusters exactly sequentially, so cluster numbers 5, 7, and 10 are skipped).

Figure 3.1: Song year distributions for α = 0.05

For each value of α, I also calculated the average frequency of each chord change category and timbre category for each cluster and plotted the results. The green lines correspond to timbre and the blue lines to pitch.

Figure 3.2: Timbre and pitch distributions for α = 0.05

A table of each cluster formed, the number of songs in that cluster, and descriptions of the pitch, timbre, and rhythmic qualities characteristic of songs in that cluster is shown below.

Cluster | Song Count | Characteristic Sounds
0 | 6481 | Minimalist, industrial, space sounds, dissonant chords
1 | 5482 | Soft, New Age, ethereal
2 | 2405 | Defined sounds, electronic and non-electronic instruments played in standard rock rhythms
3 | 360 | Very dense and complex synths, slightly darker tone
4 | 4550 | Heavily distorted rock and synthesizer
6 | 2854 | Faster-paced 80s synth rock, acid house
8 | 798 | Aggressive beats, dense house music
9 | 1464 | Ambient house, trancelike, strong beats, mysterious tone
11 | 1597 | Melancholy tones; New Wave rock in 80s, then starting in 90s downtempo, trip-hop, nu-metal

Table 3.1: Song cluster descriptions for α = 0.05

3.2.2 α = 0.1

A total of 14 clusters were formed (16 were formed, but 2 clusters contained only one song each; I listened to both of these songs, and since they did not sound unique, I discarded them from the clusters). Again, the song year distributions, timbre and pitch distributions, and cluster descriptions are shown below.

Figure 3.3: Song year distributions for α = 0.1

Figure 3.4: Timbre and pitch distributions for α = 0.1

Cluster | Song Count | Characteristic Sounds
0 | 1339 | Instrumental and disco with 80s synth
1 | 2109 | Simultaneous quarter-note and sixteenth-note rhythms
2 | 4048 | Upbeat, chill, simultaneous quarter-note and eighth-note rhythms
3 | 1353 | Strong repetitive beats, ambient
4 | 2446 | Strong simultaneous beat and synths; synths defined but echo
5 | 2672 | Calm, New Age
6 | 542 | Hi-hat cymbals, dissonant chord progressions
7 | 2725 | Aggressive punk and alternative rock
9 | 1647 | Latin, rhythmic emphasis on first and third beats
11 | 835 | Standard medium-fast rock instruments/chords
16 | 1152 | Orchestral, especially violins
18 | 40 | "Martian alien" sounds, no vocals
20 | 1590 | Alternating strong kick and strong high-pitched clap
28 | 528 | Roland TR-like beats; kick and clap stand out but fuzzy

Table 3.2: Song cluster descriptions for α = 0.1

3.2.3 α = 0.2

With α set to 0.2, there were a total of 22 clusters formed. 3 of the clusters consisted of 1 song each, none of which were particularly unique-sounding, so I discarded them for a total of 19 significant clusters. Again, the song year distributions, timbre and pitch distributions, and cluster descriptions are shown below.

Figure 3.5: Song year distributions for α = 0.2

Figure 3.6: Timbre and pitch distributions for α = 0.2

Cluster | Song Count | Characteristic Sounds
0 | 4075 | Nostalgic and sad-sounding synths and string instruments
1 | 2068 | Intense, sad, cavernous (mix of industrial metal and ambient)
2 | 1546 | Jazz/funk tones
3 | 1691 | Orchestral with heavy 80s synths, atmospheric
4 | 343 | Arpeggios
5 | 304 | Electro, ambient
6 | 2405 | Alien synths, eerie
7 | 1264 | Punchy kicks and claps, 80s/90s tilt
8 | 1561 | Medium tempo, 4/4 time signature, synths with intense guitar
9 | 1796 | Disco rhythms and instruments
10 | 2158 | Standard rock with few (if any) synths added on
12 | 791 | Cavernous, minimalist, ambient (non-electronic instruments)
14 | 765 | Downtempo, classic guitar riffs, fewer synths
16 | 865 | Classic acid house sounds and beats
17 | 682 | Heavy Roland TR sounds
22 | 14 | Fast, ambient, classic orchestral
23 | 578 | Acid house with funk tones
30 | 31 | Very repetitive rhythms, one or two tones
34 | 88 | Very dense sound (strong vocals and synths)

Table 3.3: Song cluster descriptions for α = 0.2

3.3 Analysis

Each of the different values of α revealed different insights about the Dirichlet Process applied to the Million Song Dataset, mainly which artists and songs were unique and how traditional groupings of EM genres could be thought of differently. Not surprisingly, the distributions of the years of songs in most of the clusters were skewed to the left, because the distribution of all of the EM songs in the Million Song Dataset is left-skewed (see Figure 2.2). However, some of the distributions vary significantly for individual clusters, and these differences provide important insights into which types of music styles were popular at certain points in time and how unique the earliest artists and songs in the clusters were. For example, for α = 0.1, cluster 28's musical style (with sounds characteristic of the Roland TR-808 and TR-909, two programmable drum machines and synthesizers that became extremely popular in 1980s and 90s dance tracks [13]) coincides with when the instruments were first manufactured, in 1980. Not surprisingly, this cluster contained mostly songs from the 80s and 90s and declined slightly in the 2000s. However, there were a few songs in that cluster that came out before 1980. While these songs did not clearly use the Roland TR machines, they may have contained similar sounds that predated the machines and were truly novel.

First, looking at α = 0.05, we see that all of the clusters contain a significant number of songs, although clusters 3 and 8 are notably smaller. Cluster 3 has a heavier left tail, indicating a larger number of songs from the 70s, 80s, and 90s. Inside the cluster, the genres of music varied significantly from a traditional music lens. That is, the cluster contained some songs with nearly all traditional rock instruments, others with purely synths, and others somewhere in between, all of which would normally be classified as different EM genres. Under the Dirichlet Process, however, these songs were lumped together around the common theme of dense, melodic material (as opposed to minimalistic, repetitive, or dissonant sounds). The most prominent artists among the earlier songs are Ashra and Jon Hassell, who composed several melodic songs combining traditional instruments with synthesizers for a modern feel. The other small cluster, number 8, has a more normal year distribution relative to the entire MSD distribution and also consists of denser beats. Another artist, Cabaret Voltaire, leads this cluster. Cluster 9 sticks out significantly because it contains virtually no songs before 1990 and then increases rapidly in popularity. This cluster contains songs with hypnotically repetitive rhythms, strong and ethereal synths, and an equally strong drum-like beat. Given the emergence of trance in the 1990s, and the fact that house music in the 1980s contained more minimalistic synths than house music in the 1990s, this distribution of years makes sense. Looking at the earliest artists in this cluster, one that accurately predates the later music in the cluster is Jean-Michel Jarre. A French composer and pioneer of ambient and electronic music [14], one of his songs, "Les Chants Magnétiques IV," contains very sharp and modulated synths along with a repetitive hi-hat cymbal rhythm and more drawn-out, ethereal synths. While the song sounds ambient at its normal speed, playing it at 1.5 times its normal speed produced a thumping, fast-paced 16th-note rhythm that, combined with the ethereal synths and their chord progressions, sounded very similar to trance music. In fact, I found that, stylistically, trance music was comparable to house and ambient music increased in speed. Trance was a term not used extensively until the early 1990s, but ambient and house music were already mainstream by the 1980s, so it would make sense that trance evolved in this manner. However, this insight could serve as an argument that trance is not an innovative genre in and of itself but is rather a clever combination of two older genres. Lastly, we look at the timbre category and chord change distributions for each cluster. In theory, these clusters should have significantly different peaks of chord changes and timbre categories, reflecting different pitch arrangements and instruments in each cluster. The type 0 chord change corresponds to major → major with no note change; type 60, to minor → minor with no note change; type 120, to dominant 7th major → dominant 7th major with no note change; and type 180, to dominant 7th minor → dominant 7th minor with no note change. It makes sense that type 0, 60, 120, and 180 chord changes are frequently observed, because it implies that adjacent chords in a song remain in the same key for the majority of the song.
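As a sanity check on these codes, the encoding from Appendix A.2 (chord_shift = 12*(key_shift - 1) + note_shift, with key_shift = 4*(c1 - 1) + c2 running over the four chord types) can be inverted in a few lines:

CHORD_TYPES = ['major', 'minor', 'dominant 7th', 'minor 7th']

def decode_chord_shift(code):
    """Invert chord_shift = 12*(key_shift - 1) + note_shift (Appendix A.2)."""
    key_shift, note_shift = code // 12 + 1, code % 12
    c1 = (key_shift - 1) // 4  # index of the first chord's type
    c2 = (key_shift - 1) % 4   # index of the second chord's type
    return CHORD_TYPES[c1], CHORD_TYPES[c2], note_shift

for code in (0, 60, 120, 180):
    print(code, decode_chord_shift(code))
# 0   ('major', 'major', 0)
# 60  ('minor', 'minor', 0)
# 120 ('dominant 7th', 'dominant 7th', 0)
# 180 ('minor 7th', 'minor 7th', 0)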

The timbre categories, on the other hand, are more difficult to interpret intuitively. Mauch's study addresses this issue by sampling the songs and sounds that are closest to each timbre category, then playing the sounds and attaching user-based interpretations based on several listeners [8]. While this strategy worked in Mauch's study, given the time and resources at my disposal, it was not practical in mine. I ended up writing subjective summaries of each cluster and comparing the charts to see whether certain peaks in the timbre categories corresponded to specific tones. Strangely, for α = 0.05, the timbre and chord change data is very similar for each cluster. This problem does not occur when α = 0.1 or 0.2, where the graphs vary significantly and correspond to some of the observed differences in the music. In summary, below are the most influential artists I found in the clusters formed and the types of music they created that were novel for their time:

• Jean-Michel Jarre: ambient and house music, complicated synthesizer arrangements
• Cabaret Voltaire: orchestral electronic music
• Paul Horn: new age
• Brian Eno: ambient music
• Manuel Göttsching (Ashra): synth-heavy ambient music
• Killing Joke: industrial metal
• John Foxx: minimalist and dark electronic music
• Fad Gadget: house and industrial music

While these conclusions were formed mainly from the data used in the MSD and the resulting clusters, I checked outside sources and biographies of these artists to see whether they were groundbreaking contributors to electronic music. Some research revealed that these artists were indeed groundbreaking for their time, so my findings are consistent with existing literature. The difference, however, between existing accounts and mine is that, from a quantitatively computed perspective, I found some new connections (like observing that one of Jarre's works, when sped up, sounded very similar to trance music).

For larger values of α, it is worth not only looking at interesting phenomena in the clusters formed for that specific value but also comparing them to the clusters formed at other values of α. Since we are increasing the value of α, more clusters will be formed, and the distinctions between each cluster will be more nuanced. With α = 0.1, the Dirichlet Process formed 16 clusters; 2 of these clusters consisted of only one song each, and upon listening, neither of these songs sounded particularly unique, so I threw those two clusters out and analyzed the remaining 14. Comparing these clusters to the ones formed with α = 0.05, I found that some of the clusters mapped over nicely while others were more difficult to interpret. For example, cluster 3_{0.1} (cluster 3 when α = 0.1) contained a similar number of songs, and a similar distribution of release years, to cluster 9_{0.05}. Both contain virtually no songs before the 1990s and then steadily rise in popularity through the 2000s. Both clusters also contain similar types of music: house beats and ethereal synths reminiscent of ambient or trance music. However, when I looked at the earliest artists in cluster 3_{0.1}, they were different from the earliest artists in cluster 9_{0.05}. One particular artist, Bill Nelson, stood out for having a particularly novel song, "Birds of Tin," for the year it was released (1980). This song features a sharp and twangy synth beat that, when sped up, sounded like minimalist acid house music. While the α = 0.05 grouping differentiated mostly on general moods and classes of instruments (like rock vs. non-electronic vs. electronic), the α = 0.1 grouping picked up more nuanced instrumentation and mood differences. For example, cluster 16_{0.1} contained songs that featured orchestral string instruments, especially violin. The songs themselves varied significantly according to traditional genres, from Brian Eno arrangements with classical orchestra to a remix of a song by Linkin Park, a nu-metal band, which contained violin interludes. This clustering raises an interesting point: music that sounds very different based on traditional genres can be grouped together on certain shared instruments or sounds. Another cluster, 28_{0.1}, features 90s sounds characteristic of the Roland TR-909 drum machine (which would explain why the cluster's songs increase drastically starting in the 1990s and steadily decline through the 2000s). Yet another cluster, 6_{0.1}, contains a particularly heavy left tail, indicating a style more popular in the 1980s, and its characteristic sound, hi-hat cymbals, is also a specialized instrument. This specialization does not match up particularly strongly with the clusters formed when α = 0.05. That is, a single cluster with α = 0.05 does not easily map to one or more clusters in the α = 0.1 run, although many of the clusters appear to share characteristics based on the qualitative descriptions in the tables. The timbre and chord change charts for each cluster appeared to at least somewhat corroborate the general characteristics I attached to each cluster. For example, the last timbre category is significantly pronounced for clusters 5 and 18, and especially so for 18. Cluster 18 was vocal-free, ethereal, space-synth sounds, so it would make sense that cluster 5, which was mainly calm New Age, also contained vocal-free, ethereal, space-y sounds. It was also interesting to note that certain clusters, like 28, contained one timbre category that completely dominated all the others. Many songs in this cluster were marked by strong and repetitive beats reminiscent of the Roland TR drum machines, which matches the graph. Likewise, clusters 3, 7, 9, and 20, which appear to contain the same peak timbre category, were noted for containing strong and repetitive beats. From these clusters, I added the following artists and their contributions to the general list of novel artists:

added the following artists and their contributions to the general list of novel artists

bull Bill Nelson Minimalist house music

bull Vangelis Orchestral compositions with electronic notes

bull Rick Wakeman Rock compositions with spacy-sounding synths

bull Kraftwerk synth-based pop music

Finally, we look at α = 0.2. With this parameter value, the Dirichlet Process resulted in 22 clusters formed; 3 of these clusters contained only one song each, and upon listening to each of these songs, I determined they were not particularly unique and discarded them, for a total of 19 remaining clusters. Unlike the previous two values of α, where the clusters were relatively easy to subjectively differentiate, this one was quite difficult. Slightly more than half of the clusters, 10 out of 19, contained under 1000 songs, and 3 contained under 100. Some of the clusters mapped easily to clusters formed at the other two α values, like cluster 17_{0.2}, which contains Roland TR drum machine sounds and is comparable to cluster 28_{0.1}. However, many of the other classifications seemed more dubious. Not only did the songs within each cluster often vary significantly, but the differences between many clusters appeared nearly indistinguishable. The chord change and timbre charts also support the difficulty in distinguishing different clusters. The y-axis values for all of the charts are quite small, implying that many of the timbre values averaged out because the songs were quite different in each cluster. Essentially, the observations are quite noisy and do not have features that stand out as saliently as those of cluster 28_{0.1}, for example. The only exceptions to these numbers were clusters 30 and 34, but there were so few songs in each of these clusters that they represent only a small amount of the dataset. Therefore, I concluded that the Dirichlet Process with α = 0.2 did an inadequate job of clustering the songs. Overall, the clusters formed when α = 0.1 were the most meaningful in terms of picking up nuanced moods and instruments without splitting hairs and producing clusters a minimally trained ear could not differentiate. From this analysis, the most appropriate genre classifications of the electronic music from the MSD are the clusters described in the table where α = 0.1, and the most novel artists, along with their contributions, are summarized in the findings where α = 0.05 and α = 0.1.

Chapter 4

Conclusion

In this chapter, I first address weaknesses in my experiment and strategies to address those weaknesses; then I offer potential paths for researchers to build upon my experiment, and close with final words regarding this thesis.

4.1 Design Flaws in Experiment

While I made every effort possible to ensure the integrity of this experiment, there were various limiting factors, some beyond my control and others within my control but unrealistic to address given the time and resources I had. The largest issue was the dataset I was working with. While the MSD contained roughly 23,000 electronic music songs according to my classifications, these songs did not come close to covering all of the electronic music available. From looking through the tracks, I did see many important artists, meaning that there was some credibility to the dataset. However, there were several other artists I was surprised to see missing, and the artists included contained only a limited number of popular songs. Some traditionally defined genres, like dubstep, were missing entirely from the dataset, and the most recent songs came from the year 2010, which meant that the past 5 years of rapid expansion in EM were not accounted for. Building a sufficient corpus of EM data is very difficult, arguably more so than for other genres, because songs may be remixed by multiple artists, further blurring the line between original content and modifications. For this reason, I considered my thesis to be a proof of concept. Although the data I used may not be ideal, I was able to show that the Dirichlet Process could be used with some amount of success to cluster songs based on their metadata.

With respect to how I implemented the Dirichlet Process and constructed the features, my methodology could have been more extensive with additional time and resources. Interpreting the sounds in each song and establishing common threads is a difficult task, and unlike Pandora, which used trained music theory experts to analyze each song, I relied on my own ears and anecdotal knowledge of EM. Given the lack of formal literature quantitatively analyzing EM, and the resources I had, this was my best realistic option, but it was also not ideal. The second notable weakness, which was more controllable, was determining what exactly constitutes an EM song. My criteria involved iterating through every song and selecting those whose artist contained a tag that fell inside a list of predetermined EM genres. However, this strategy is not always effective, since some artists contain only a small selection of EM songs and have produced much more music involving rock or other non-EM genres. To prevent these songs from appearing in the dataset, I would need to load another dataset from a group called Last.fm, which contains user-generated tags at the song level. Another, more addressable weakness in my experiment was graphically analyzing the timbre categories. While the average chord changes were easy to interpret on the graphs for each cluster and had easy semantic interpretations, the timbre categories were never formally defined. That is, while I knew the Bayes Information Criterion was lowest when there were 46 categories, I did not associate each timbre category with a sound. Mauch's study addressed this issue by randomly selecting songs with sounds that fell in each timbre category and asking users to listen to the sounds and classify what they heard. Implementing this system would be an additional way of ensuring that the clusters formed for each song were nontrivial: I could not only eyeball the measurements on each graph for timbre, like I did in this thesis, but also use them to confirm the sounds I observed for each cluster. Finally, while my feature selection involved careful preprocessing, based on other studies, that normalized measurements between all songs, there are additional ways I could have improved the feature set. For example, one study looks at more advanced ways to isolate specific timbre segments in a song, identify repeating patterns, and compare songs to each other in terms of the similarity of their timbres [15]. More advanced methods like these would allow me to more quantitatively analyze how successfully the Dirichlet Process clusters songs into distinct categories.

4.2 Future Work

Future work in this area, quantitatively analyzing EM metadata to determine what constitutes different genres and novel artists, would involve tighter definitions and procedures, evaluations of whether the clustering was effective, and closer musical scrutiny. All of the weaknesses mentioned in the previous section, barring perhaps the songs available in the Million Song Dataset, can be addressed with extensions and modifications to the code base I created. Addressing the greater issue of building an effective corpus of music data for the MSD, and constantly updating it, might be accomplished by soliciting such data from an organization like Spotify, but such an endeavor is very ambitious and beyond the scope of any individual or small group research project without extensive funding and influence. Once these problems are resolved, and methods for accessing songs from the dataset and comparing songs to each other are in place, the next steps would be to further analyze the results. How do the most unique artists for their time compare to the most popular artists? Is there considerable overlap? How long does it take for a style to grow in popularity, if it even does? And lastly, how can these findings be used to compose new genres of music and envision who and what will become popular in the future? All of these questions may require supplementary information sources, with respect to the popularity of songs and artists, for example, and many of these additional pieces of information can be found on the website of the MSD.

4.3 Closing Remarks

While this thesis is an ambitious endeavor and can be improved in many respects, it does show that the methods implemented yield nontrivial results and could serve as a foundation for future quantitative analysis of electronic music. As data analytics grows even further, and groups such as Spotify amass greater amounts of information and deeper insights into that information, this relatively new field of study will hopefully grow. EM is a dynamic, energizing, and incredibly expressive type of music, and understanding it from a quantitative perspective pays respect to what has, up until now, been mostly analyzed from a curious outsider's perspective: qualitatively described, but not examined as thoroughly from a mathematical angle.

Appendix A

Code

A.1 Pulling Data from the Million Song Dataset

from __future__ import division
import os
import re    # import restored; needed for re.sub below
import sys
import time
import glob
import numpy as np   # import restored; needed for np.set_printoptions
import hdf5_getters  # not on adroit

# prevents output from showing ellipses when printed
np.set_printoptions(threshold=np.nan)

'''This code computes the frequency of chord changes in each electronic song
and runs the dirichlet process on it'''

# paths reconstructed; directory separators were lost in extraction
basedir = '/scratch/network/mssilver/mssilver/msd_data_full/data/' + str(sys.argv[1])
ext = '.h5'

target_genres = ['house', 'techno', 'drum and bass', 'drum n bass',
                 "drum'n'bass", 'drumnbass', "drum 'n' bass", 'jungle', 'breakbeat',
                 'trance', 'dubstep', 'trap', 'downtempo', 'industrial', 'synthpop',
                 'idm', 'idm - intelligent dance music', '8-bit', 'ambient',
                 'dance and electronica', 'electronic']

# relevant metadata for all EM songs found in the MSD
all_song_data = {}
pitch_segs_data = []
count = 0
start_time = time.time()

for root, dirs, files in os.walk(basedir):
    files = glob.glob(os.path.join(root, '*' + ext))
    for f in files:
        h5 = hdf5_getters.open_h5_file_read(f)
        # if year unknown, throw out sample
        if hdf5_getters.get_year(h5) == 0:
            h5.close()
            continue
        if any(tag in str(hdf5_getters.get_artist_mbtags(h5)) for tag in target_genres):
            print 'found electronic music song at {0} seconds'.format(time.time() - start_time)
            count += 1
            print 'song count: {0}'.format(count + 1)
            h5_subdict = dict()
            h5_subdict['title'] = hdf5_getters.get_title(h5).item()
            h5_subdict['artist_name'] = hdf5_getters.get_artist_name(h5).item()
            h5_subdict['year'] = hdf5_getters.get_year(h5).item()
            h5_subdict['duration'] = hdf5_getters.get_duration(h5).item()
            h5_subdict['timbre'] = hdf5_getters.get_segments_timbre(h5).tolist()
            h5_subdict['pitches'] = hdf5_getters.get_segments_pitches(h5).tolist()
            track_id = hdf5_getters.get_track_id(h5).item()
            all_song_data[track_id] = h5_subdict
            print 'Song {0} finished processing. Total time elapsed: {1} seconds'.format(count, str(time.time() - start_time))
        h5.close()

all_song_data_sorted = dict(sorted(all_song_data.items(), key=lambda k: k[1]['year']))
sortedpitchdata = '/scratch/network/mssilver/mssilver/msd_data/raw_' + re.sub(r'/', '', sys.argv[1]) + '.txt'
with open(sortedpitchdata, 'w') as text_file:
    text_file.write(str(all_song_data_sorted))

A.2 Calculating Most Likely Chords and Timbre Categories

from __future__ import division
import os
import sys
import re
import time
import json
import glob
import hdf5_getters  # not on adroit
import sklearn.mixture
import msd_utils  # not on adroit
import math
import numpy as np
import collections
import ast

# prevents output from showing ellipses when printed
np.set_printoptions(threshold=np.nan)

# column-wise mean of list of lists
def mean(a):
    return sum(a) / len(a)

'''This code computes the frequency of chord changes in each electronic song
and runs the dirichlet process on it'''

# paths reconstructed; directory separators were lost in extraction
basedir = '/scratch/network/mssilver/mssilver/'
input_file = basedir + 'msd_data/raw_' + str(sys.argv[1]) + '.txt'
output_file = basedir + 'msd_data/preprocessed_' + str(sys.argv[1]) + '.txt'

json_contents = open(input_file, 'r').read()

all_song_data = []
time_start = time.time()
count = 0
# regex reconstructed; the original pattern was partially lost in extraction
for json_object_str in re.finditer(r"\{'title'.*?\}", json_contents):
    json_object_str = str(json_object_str.group(0))
    json_object = ast.literal_eval(json_object_str)
    json_object_new = {}

    json_object_new['title'] = json_object['title']
    json_object_new['artist_name'] = json_object['artist_name']
    json_object_new['year'] = json_object['year']
    json_object_new['duration'] = json_object['duration']

    segments_pitches_old = json_object['pitches']
    segments_timbre_old = json_object['timbre']
    segments_pitches_old_smoothed = []
    segments_timbre_old_smoothed = []
    chord_changes = [0 for i in range(0, 192)]
    smoothing_factor = 5
    for i in range(0, int(math.floor(len(segments_pitches_old) / smoothing_factor))):
        segments = segments_pitches_old[(smoothing_factor * i):(smoothing_factor * i + smoothing_factor)]
        # calculate mean frequency of each note over a block of 5 time segments
        segments_mean = map(mean, zip(*segments))
        segments_pitches_old_smoothed.append(segments_mean)
    most_likely_chords = [msd_utils.find_most_likely_chord(seg) for seg in segments_pitches_old_smoothed]
    print 'found most likely chords at {0} seconds'.format(time.time() - time_start)
    # calculate chord changes
    for i in range(0, len(most_likely_chords) - 1):
        c1 = most_likely_chords[i]
        c2 = most_likely_chords[i + 1]
        if c1[1] == c2[1]:
            note_shift = 0
        elif c1[1] < c2[1]:
            note_shift = c2[1] - c1[1]
        else:
            note_shift = 12 - c1[1] + c2[1]
        key_shift = 4 * (c1[0] - 1) + c2[0]
        # convert note_shift (0 through 11) and key_shift (1 to 16)
        # to one of 192 categories for a chord shift
        chord_shift = 12 * (key_shift - 1) + note_shift
        chord_changes[chord_shift] += 1
    json_object_new['chord_changes'] = [c / json_object['duration'] for c in chord_changes]
    print 'calculated chord changes at {0} seconds'.format(time.time() - time_start)

    for i in range(0, int(math.floor(len(segments_timbre_old) / smoothing_factor))):
        segments = segments_timbre_old[(smoothing_factor * i):(smoothing_factor * i + smoothing_factor)]
        # calculate mean value of each timbre dimension over a block of 5 time segments
        segments_mean = map(mean, zip(*segments))
        segments_timbre_old_smoothed.append(segments_mean)
    print 'found most likely timbre categories at {0} seconds'.format(time.time() - time_start)
    timbre_cats = [msd_utils.find_most_likely_timbre_category(seg) for seg in segments_timbre_old_smoothed]
    timbre_cat_counts = [timbre_cats.count(i) for i in xrange(0, 30)]
    json_object_new['timbre_cat_counts'] = [t / json_object['duration'] for t in timbre_cat_counts]
    all_song_data.append(json_object_new)
    count += 1

print 'preprocessing finished, writing results to file at time {0}'.format(time.time() - time_start)
with open(output_file, 'w') as f:
    f.write(str(all_song_data))

print 'file merging complete at time {0}'.format(time.time() - time_start)

A.3 Code to Compute Timbre Categories

from __future__ import division
import os
import sys
import re
import time
import json
import glob
import hdf5_getters  # not on adroit
import sklearn.mixture
import msd_utils  # not on adroit
import math
import numpy as np
import collections
from string import ascii_uppercase
import ast
import matplotlib.pyplot as plt
import operator
from collections import defaultdict
import random

timbre_all = []
N = 20  # number of samples to get from each year
year_counts = {1956: 2, 1965: 4, 1968: 3, 1969: 5, 1970: 23, 1971: 25,
               1972: 26, 1973: 37, 1974: 35, 1975: 29, 1976: 28, 1977: 64,
               1978: 77, 1979: 111, 1980: 131, 1981: 171, 1982: 199, 1983: 272,
               1984: 190, 1985: 189, 1986: 200, 1987: 224, 1988: 205, 1989: 272,
               1990: 358, 1991: 348, 1992: 538, 1993: 610, 1994: 658, 1995: 764,
               1996: 809, 1997: 930, 1998: 872, 1999: 983, 2000: 1031, 2001: 1230,
               2002: 1323, 2003: 1563, 2004: 1508, 2005: 1995, 2006: 1892,
               2007: 2175, 2008: 1950, 2009: 1782, 2010: 742}

time_start = time.time()
year_count = defaultdict(int)
orig_dir = '/scratch/network/mssilver/mssilver/'  # path reconstructed
orig_dir = ''
# regex reconstructed; the original pattern was partially lost in extraction
json_pattern = re.compile(r"\{'title'.*?\}", re.DOTALL)
N = 20  # number of songs to sample from each year
k = 20  # number of frames to select from each song
for l1 in ascii_uppercase:
    for l2 in ascii_uppercase:
        edm_textfile = orig_dir + 'msd_data/raw_' + l1 + l2 + '.txt'
        json_contents = open(edm_textfile, 'r').read()
        for json_object_str in re.findall(json_pattern, json_contents):
            json_object = ast.literal_eval(json_object_str)
            year = int(json_object['year'])
            prob = 1.0 if 1.0 * N / year_counts[year] > 1.0 else 1.0 * N / year_counts[year]
            if random.random() < prob:
                print 'getting timbre frames for song in directory {0}, {1} seconds after start of program'.format(edm_textfile, time.time() - time_start)
                duration = float(json_object['duration'])
                timbre = [[t / duration for t in l] for l in json_object['timbre']]
                try:
                    indices = random.sample(xrange(0, len(timbre)), k)
                except:
                    indices = xrange(0, len(timbre))
                timbre_frames = [timbre[i] for i in indices]
                appended_timbre = [timbre_all.append(l) for l in timbre_frames]
        print 'finished file {0}, {1} seconds after start of program'.format(edm_textfile, time.time() - time_start)

with(open('timbre_frames_all.txt', 'w')) as f:
    f.write(str(timbre_all))

A.4 Helper Methods for Calculations

import os
import re
import json
import glob
import hdf5_getters
import time
import numpy as np

''' some static data used in conjunction with the helper methods '''

# each 12-element vector corresponds to the 12 pitches, starting with C
# natural and going up to B natural

CHORD_TEMPLATE_MAJOR = [[1,0,0,0,1,0,0,1,0,0,0,0],[0,1,0,0,0,1,0,0,1,0,0,0],
                        [0,0,1,0,0,0,1,0,0,1,0,0],[0,0,0,1,0,0,0,1,0,0,1,0],
                        [0,0,0,0,1,0,0,0,1,0,0,1],[1,0,0,0,0,1,0,0,0,1,0,0],
                        [0,1,0,0,0,0,1,0,0,0,1,0],[0,0,1,0,0,0,0,1,0,0,0,1],
                        [1,0,0,1,0,0,0,0,1,0,0,0],[0,1,0,0,1,0,0,0,0,1,0,0],
                        [0,0,1,0,0,1,0,0,0,0,1,0],[0,0,0,1,0,0,1,0,0,0,0,1]]
CHORD_TEMPLATE_MINOR = [[1,0,0,1,0,0,0,1,0,0,0,0],[0,1,0,0,1,0,0,0,1,0,0,0],
                        [0,0,1,0,0,1,0,0,0,1,0,0],[0,0,0,1,0,0,1,0,0,0,1,0],
                        [0,0,0,0,1,0,0,1,0,0,0,1],[1,0,0,0,0,1,0,0,1,0,0,0],
                        [0,1,0,0,0,0,1,0,0,1,0,0],[0,0,1,0,0,0,0,1,0,0,1,0],
                        [0,0,0,1,0,0,0,0,1,0,0,1],[1,0,0,0,1,0,0,0,0,1,0,0],
                        [0,1,0,0,0,1,0,0,0,0,1,0],[0,0,1,0,0,0,1,0,0,0,0,1]]
CHORD_TEMPLATE_DOM7 = [[1,0,0,0,1,0,0,1,0,0,1,0],[0,1,0,0,0,1,0,0,1,0,0,1],
                       [1,0,1,0,0,0,1,0,0,1,0,0],[0,1,0,1,0,0,0,1,0,0,1,0],
                       [0,0,1,0,1,0,0,0,1,0,0,1],[1,0,0,1,0,1,0,0,0,1,0,0],
                       [0,1,0,0,1,0,1,0,0,0,1,0],[0,0,1,0,0,1,0,1,0,0,0,1],
                       [1,0,0,1,0,0,1,0,1,0,0,0],[0,1,0,0,1,0,0,1,0,1,0,0],
                       [0,0,1,0,0,1,0,0,1,0,1,0],[0,0,0,1,0,0,1,0,0,1,0,1]]
CHORD_TEMPLATE_MIN7 = [[1,0,0,1,0,0,0,1,0,0,1,0],[0,1,0,0,1,0,0,0,1,0,0,1],
                       [1,0,1,0,0,1,0,0,0,1,0,0],[0,1,0,1,0,0,1,0,0,0,1,0],
                       [0,0,1,0,1,0,0,1,0,0,0,1],[1,0,0,1,0,1,0,0,1,0,0,0],
                       [0,1,0,0,1,0,1,0,0,1,0,0],[0,0,1,0,0,1,0,1,0,0,1,0],
                       [0,0,0,1,0,0,1,0,1,0,0,1],[1,0,0,0,1,0,0,1,0,1,0,0],
                       [0,1,0,0,0,1,0,0,1,0,1,0],[0,0,1,0,0,0,1,0,0,1,0,1]]

CHORD_TEMPLATE_MAJOR_means = [np.mean(chord) for chord in CHORD_TEMPLATE_MAJOR]
CHORD_TEMPLATE_MINOR_means = [np.mean(chord) for chord in CHORD_TEMPLATE_MINOR]
CHORD_TEMPLATE_DOM7_means = [np.mean(chord) for chord in CHORD_TEMPLATE_DOM7]
CHORD_TEMPLATE_MIN7_means = [np.mean(chord) for chord in CHORD_TEMPLATE_MIN7]

CHORD_TEMPLATE_MAJOR_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_MAJOR]
CHORD_TEMPLATE_MINOR_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_MINOR]
CHORD_TEMPLATE_DOM7_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_DOM7]
CHORD_TEMPLATE_MIN7_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_MIN7]

# decimal points reconstructed; this excerpt of the listing ends mid-row
TIMBRE_CLUSTERS = [
    [1.38679881e-01, 3.95702571e-02, 2.65410235e-02, 7.38301998e-03, -1.75014636e-02, -5.51147732e-02,
     8.71851698e-03, -1.17595855e-02, 1.07227900e-02, 8.75951680e-03, 5.40391877e-03, 6.17638908e-03],
    [3.14344510e+00, 1.17405599e-01, 4.08053561e+00, -1.77934450e+00, 2.93367968e+00, -1.35597928e+00,
     -1.55129489e+00, 7.75743158e-01, 6.42796685e-01, 1.40794256e-01, 3.37716831e-01, -3.27103815e-01],
    [3.56548165e-01, 2.73288705e+00, 1.94355982e+00, 1.06892477e+00, 9.89739475e-01, -8.97330631e-02,
     8.73234495e-01, -2.00747009e-03, 3.44488367e-01, 9.93117800e-02, -2.43471766e-01, -1.90521726e-01],
    [4.22442037e-01, 4.14115783e-01, 1.43926557e-01, -1.16143322e-01, -5.95186216e-02, -2.36927188e-01,
     -6.83151409e-02, 9.86816882e-02, 2.43219098e-02, 6.93558977e-02, 6.80121418e-03, 3.97485360e-02],
    [1.94727799e-01, -1.39027782e+00, -2.39875671e-01, -2.84583677e-01, 1.92334219e-01, -2.83421048e-01,
     2.15787541e-01, 1.14840341e-01, -2.15631833e-01, -4.09496877e-02, -6.90838017e-03, -7.24394810e-03],
    [1.96565167e-01, 4.98702717e-02, -3.43697282e-01, 2.54170701e-01, 1.12441266e-02, 1.54740401e-01,
     -4.70447408e-02, 8.10868802e-02, 3.03736697e-03, 1.43974944e-03, -2.75044913e-02, 1.48634678e-02],
    [2.21364497e-01, -2.96205105e-01, 1.57754028e-01, -5.57641279e-02, -9.25625566e-02, -6.15316168e-02,
     -1.38139882e-01, -5.54936599e-02, 1.66886836e-01, 6.46238260e-02, 1.24093863e-02, -2.09274345e-02],
    [2.12823455e-01, -9.32652720e-02, -4.39611467e-01, -2.02814479e-01, 4.98638770e-02, -1.26572488e-01,
     -1.11181799e-01, 3.25075635e-02, 2.01416694e-02, -5.69216463e-02, 2.61922912e-02, 8.30817468e-02],
    [1.62304042e-01, -7.34813956e-03, -2.02552550e-01, 1.80106705e-01, -5.72110826e-02, -9.17148244e-02,
     -6.20429191e-03, -6.08892354e-02, 1.02883628e-02, 3.84878478e-02, -8.72920419e-03, 2.37291230e-02],
    [1.69023095e-01, 6.81311168e-02, -3.71039856e-02, -2.13139780e-02, -4.18752028e-03, 1.36407740e-01,
     2.58515825e-02, -4.10328777e-04, 2.93149920e-02, -1.97874734e-02, 2.01177066e-02, 4.29260690e-03],
    [4.16829358e-01, -1.28384095e+00, 8.86081556e-01, 9.13717416e-02, -3.19420208e-01, -1.82003637e-01,
     -3.19865507e-02, -1.71517045e-02, 3.47472066e-02, -3.53047665e-02, 5.58354602e-02, -5.06222122e-02],
    [3.83948137e-01, 1.06020034e-01, 4.01191058e-01, 1.49470482e-01, -9.58422411e-02, -4.94473336e-02,
     2.27589858e-02, -5.67352733e-02, 3.84666644e-02, -2.15828055e-02, -1.67817151e-02, 1.15426241e-01],
    [9.07946444e-01, 3.26120397e+00, 2.98472002e+00, -1.42615404e-01, 1.29886103e+00, -4.53380431e-01,
     1.54008478e-01, -3.55297093e-02, -2.95809181e-01, 1.57037690e-01, -7.29692046e-02, 1.15180285e-01],
    [1.60870896e+00, -2.32038235e+00, -7.96211044e-01, 1.55058968e+00, -2.19377663e+00, 5.01030526e-01,
     -1.71767279e+00, -1.36642470e+00, -2.42837527e-01, -4.14275615e-01, -7.33148530e-01, -4.56676578e-01],
    [6.42870687e-01, 1.34486839e+00, 2.16026845e-01, -2.13180345e-01, 3.10866747e-01, -3.97754955e-01,
     -3.54439151e-01, -5.95938041e-04, 4.95054274e-03, 4.67013422e-02, -1.80823854e-02, 1.25808320e-01],
    [1.16780496e+00, 2.28141229e+00, -3.29418720e+00, -1.54239912e+00, 2.12372153e-01, 2.51116768e+00,
     1.84273560e+00, -4.06183916e-01, 1.19175125e+00, -9.24407446e-01, 6.85444429e-01, -6.38729005e-01],
    [2.39097414e-01, -1.13382447e-02, 3.06327342e-01, 4.68182987e-03, -1.03107607e-01, -3.17661969e-02,
     3.46533705e-02, 1.46440386e-02, 6.88291154e-02, 1.72580481e-02, -6.23970238e-03, -6.52822380e-03],
    [1.74850329e-01, -1.86077411e-01, 2.69285838e-01, 5.22452803e-02, -3.71708289e-02, -6.42874319e-02,
     -5.01920042e-03, -1.14565540e-02, -2.61300268e-03, -6.94872458e-03, 1.20157063e-02, 2.01341977e-02],
    [1.93220674e-01, 1.62738332e-01, 1.72794061e-02, 7.89933755e-02, 1.58494767e-01, 9.04541006e-04,
     -3.33177052e-02, -1.42411500e-01, -1.90471155e-02, -2.41622739e-02, -2.57382438e-02, 2.84895062e-02],
    [3.31179197e+00, -1.56765268e-01, 4.42446188e+00, 2.05496297e+00, 5.07031622e+00, -3.52663849e-02,
     -5.68337901e+00, -1.17825301e+00, 5.41756637e-01, -3.15541339e-02, -1.58404846e+00, 7.37887234e-01],
    [2.36033237e-01, -5.01380019e-01, -7.01568834e-02, -2.14474169e-01, 5.58739133e-01, -3.45340886e-01,
     2.36469930e-01, -2.51770230e-02, -4.41670143e-01, -1.73364633e-01, 9.92353986e-03, 1.01775476e-01],
    [3.13672832e+00, 1.55128891e+00, 4.60139512e+00, 9.82477544e-01, -3.87108002e-01, -1.34239667e+00,
     -3.00065797e+00, -4.41556909e-01, -7.77546208e-01, -6.59017029e-01, -1.42596356e-01, -9.78935498e-01],
    [8.50714148e-01, 2.28658856e-01, -3.65260753e+00, 2.70626948e+00, -1.90441544e-01, 5.66625676e+00,
     1.77531510e+00, 2.39978921e+00, 1.10965660e+00, 1.58484130e+00, -1.51579214e-02, 8.64324026e-01],
    [1.14302559e+00, 1.18602811e+00, -3.88130412e+00, 8.69833825e-01, -8.23003310e-01, -4.23867795e-01,
     8.56022598e-01, -1.08015106e+00, 1.74840192e-01, -1.35493558e-02, -1.17012561e+00, 1.68572940e-01],
    [3.54117814e+00, 6.12714769e-01, 7.67585243e+00, 2.50391333e+00, 1.81374399e+00, -1.46363231e+00,
     -1.74027236e+00, -5.72924078e-01, -1.20787368e+00, -4.13954661e-01, -4.62561948e-01, 6.78297871e-01],
    [8.31843044e-01, 4.41635485e-01, 7.00724425e-02, -4.72159900e-02, 3.08326493e-01, -4.47009822e-01,
     3.27806057e-01, 6.52370380e-01, 3.28490360e-01, 1.28628172e-01, -7.78065861e-02, 6.91343399e-02],
    [4.90082031e-01, -9.53180204e-01, 1.76970476e-01, 1.57256960e-01, -5.26196238e-02, -3.19264458e-01,
     3.91808304e-01, 2.19368239e-01, -2.06483291e-01, -6.25044005e-02, -1.05547224e-01, 3.18934196e-01],
    [1.49899454e+00, -4.30708817e-01, 2.43770498e+00, 7.03149621e-01, -2.28827845e+00, 2.70195855e+00,
     -4.71484280e+00, -1.18700075e+00, -1.77431396e+00, -2.23190236e+00, 8.20855264e-01, -2.35859902e-01],
    [1.20322544e-01, -3.66300816e-01, -1.25699953e-01, -1.21914056e-01, 6.93277338e-02, -1.31034684e-01,
     -1.54955924e-03, 2.48094288e-02, -3.09576314e-02, -1.66369415e-03, 1.48904987e-04, -1.42151992e-02],
    [6.52394765e-01, -6.81024464e-01, 6.36868117e-01, 3.04950208e-01, 2.62178992e-01, -3.20457080e-01,
     -1.98576098e-01, -3.02173163e-01, 2.04399765e-01, 4.44513847e-02, -9.50111498e-03, -1.14198739e-02],
    [2.06762180e-01, -2.08101829e-01, 2.61977630e-01, -1.71672300e-01, 5.61794250e-02, 2.13660185e-01,
     3.90259585e-02, 4.78176392e-02, 1.72812607e-02, 3.44052067e-02, 6.26899067e-03, 2.48544728e-02],
    [7.39717363e-01, 4.37786285e+00, 2.54995502e+00, 1.13151212e+00, -3.58509503e-01, 2.20806129e-01,
     -2.20500355e-01, -7.22409824e-02, -2.70534083e-01, 1.07942098e-03, 2.70174668e-01, 1.87279353e-01],
    [1.25593809e+00, 6.71054880e-02, 8.70352571e-01, -4.32607959e+00, 2.30652217e+00, 5.47476105e+00,
     -6.11052479e-01, 1.07955720e+00, -2.16225471e+00, -7.95770149e-01, -7.31804973e-01, 9.68935954e-01],
    [1.17233757e-01, -1.23897829e-01, -4.88625265e-01, 1.42036530e-01, -7.23286756e-02, -6.99808763e-02,
     -1.17525019e-02, 5.70221674e-02, -7.67796123e-03, 4.17505873e-02, -2.33375716e-02, 1.94121001e-02],
    [1.67511025e+00, -2.75436700e+00, 1.45345593e+00,

125 205496297e+00 507031622e+00 -352663849e-02126 -568337901e+00 -117825301e+00 541756637e-01127 -315541339e-02 -158404846e+00 737887234e-01]128 [ 236033237e-01 -501380019e-01 -701568834e-02129 -214474169e-01 558739133e-01 -345340886e-01130 236469930e-01 -251770230e-02 -441670143e-01131 -173364633e-01 992353986e-03 101775476e-01]132 [ 313672832e+00 155128891e+00 460139512e+00133 982477544e-01 -387108002e-01 -134239667e+00134 -300065797e+00 -441556909e-01 -777546208e-01135 -659017029e-01 -142596356e-01 -978935498e-01]136 [ 850714148e-01 228658856e-01 -365260753e+00137 270626948e+00 -190441544e-01 566625676e+00138 177531510e+00 239978921e+00 110965660e+00139 158484130e+00 -151579214e-02 864324026e-01]140 [ 114302559e+00 118602811e+00 -388130412e+00141 869833825e-01 -823003310e-01 -423867795e-01142 856022598e-01 -108015106e+00 174840192e-01143 -135493558e-02 -117012561e+00 168572940e-01]144 [ 354117814e+00 612714769e-01 767585243e+00145 250391333e+00 181374399e+00 -146363231e+00146 -174027236e+00 -572924078e-01 -120787368e+00147 -413954661e-01 -462561948e-01 678297871e-01]148 [ 831843044e-01 441635485e-01 700724425e-02149 -472159900e-02 308326493e-01 -447009822e-01150 327806057e-01 652370380e-01 328490360e-01151 128628172e-01 -778065861e-02 691343399e-02]152 [ 490082031e-01 -953180204e-01 176970476e-01153 157256960e-01 -526196238e-02 -319264458e-01154 391808304e-01 219368239e-01 -206483291e-01155 -625044005e-02 -105547224e-01 318934196e-01]156 [ 149899454e+00 -430708817e-01 243770498e+00157 703149621e-01 -228827845e+00 270195855e+00158 -471484280e+00 -118700075e+00 -177431396e+00159 -223190236e+00 820855264e-01 -235859902e-01]160 [ 120322544e-01 -366300816e-01 -125699953e-01161 -121914056e-01 693277338e-02 -131034684e-01162 -154955924e-03 248094288e-02 -309576314e-02163 -166369415e-03 148904987e-04 -142151992e-02]164 [ 652394765e-01 -681024464e-01 636868117e-01165 304950208e-01 262178992e-01 -320457080e-01166 -198576098e-01 -302173163e-01 204399765e-01167 444513847e-02 -950111498e-02 -114198739e-02]168 [ 206762180e-01 -208101829e-01 261977630e-01169 -171672300e-01 561794250e-02 213660185e-01170 390259585e-02 478176392e-02 172812607e-02171 344052067e-02 626899067e-03 248544728e-02]172 [ 739717363e-01 437786285e+00 254995502e+00173 113151212e+00 -358509503e-01 220806129e-01174 -220500355e-01 -722409824e-02 -270534083e-01175 107942098e-03 270174668e-01 187279353e-01]176 [ 125593809e+00 671054880e-02 870352571e-01177 -432607959e+00 230652217e+00 547476105e+00178 -611052479e-01 107955720e+00 -216225471e+00179 -795770149e-01 -731804973e-01 968935954e-01]180 [ 117233757e-01 -123897829e-01 -488625265e-01181 142036530e-01 -723286756e-02 -699808763e-02182 -117525019e-02 570221674e-02 -767796123e-03183 417505873e-02 -233375716e-02 194121001e-02]184 [ 167511025e+00 -275436700e+00 145345593e+00

64

185 132408871e+00 -166172505e+00 100560074e+00186 -882308160e-01 -595708043e-01 -727283590e-01187 -103975499e+00 -186653334e-02 139449745e+00]188 [ 320587677e+00 -284451104e+00 854849957e+00189 -444001235e-01 104202144e+00 735333682e-01190 -248763292e+00 738931361e-01 -174185596e+00191 -107581842e+00 205759299e-01 -820483513e-01]192 [ 331279737e+00 -508655734e-01 661530870e+00193 116518280e+00 474499155e+00 -231536191e+00194 -134016130e+00 -715381712e-01 278890594e+00195 204189275e+00 -380003033e-01 116034914e+00]196 [ 179522019e+00 -813534697e-02 437167420e-01197 226517020e+00 885377295e-01 107481514e+00198 -725322296e-01 -219309506e+00 -759468916e-01199 -137191387e+00 260097913e-01 934596450e-01]200 [ 350400906e-01 817891485e-01 -863487084e-01201 -731760701e-01 970320805e-02 -360023996e-01202 -291753495e-01 -803073817e-02 665930095e-02203 160093340e-01 -129158086e-01 -518806100e-02]204 [ 225922929e-01 278461593e-01 539661393e-02205 -237662670e-02 -270343295e-02 -123485570e-01206 231027499e-03 587465112e-05 186127188e-02207 283074747e-02 -187198676e-04 124761782e-02]208 [ 453615634e-01 318976020e+00 -835029351e-01209 784124578e+00 -443906795e-01 -178945492e+00210 -114521031e+00 100044304e+00 -404084981e-01211 -486030348e-01 105412721e-01 563666445e-02]212 [ 393714086e-01 -307226477e-01 -487366619e-01213 -457481697e-01 -291133171e-04 -239881719e-01214 -215591352e-01 -121332941e-01 142245002e-01215 502984582e-02 -805878851e-03 195534173e-01]216 [ 186913010e-01 -161000977e-01 595612425e-01217 187804293e-01 222064227e-01 -109008289e-01218 783845058e-02 515228647e-02 -818113578e-02219 -237860551e-02 341013800e-03 364680417e-02]220 [ 332919314e+00 -214341251e+00 720913997e+00221 176143734e+00 164091808e+00 -266887649e+00222 -926748006e-01 -278599285e-01 -739434005e-01223 -387363085e-01 800557250e-01 115628886e+00]224 [ 476496444e-01 -119334793e-01 309037235e-01225 -345545294e-01 130114716e-01 506895559e-01226 212176840e-01 -414296750e-03 452439064e-02227 -162163990e-02 693683152e-02 -577607592e-03]228 [ 300019324e-01 543432074e-02 -772732930e-01229 147263806e+00 -279012581e-02 -247864869e-01230 -210011388e-01 278202425e-01 616957205e-02231 -166924986e-01 -180102286e-01 -378872162e-03]]232

TIMBRE_MEANS = [np.mean(t) for t in TIMBRE_CLUSTERS]
TIMBRE_STDEVS = [np.std(t) for t in TIMBRE_CLUSTERS]

'''Helper methods to process raw MSD data.'''

def normalize_pitches(h5):
    '''Transpose a song's chroma vectors into a common tonal context using
    the song's estimated key.'''
    key = int(hdf5_getters.get_key(h5))
    segments_pitches = hdf5_getters.get_segments_pitches(h5)
    segments_pitches_new = [transpose_by_key(pitch_seg, key)
                            for pitch_seg in segments_pitches]
    return segments_pitches_new

def transpose_by_key(pitch_seg, key):
    '''Rotate a 12-element pitch vector so that all songs share the same key.'''
    pitch_seg_new = []
    for i in range(0, 12):
        idx = (i + key) % 12
        pitch_seg_new.append(pitch_seg[idx])
    return pitch_seg_new

def find_most_likely_chord(pitch_vector):
    '''Given a time segment with distributions of the 12 pitches, find the most
    likely chord played, returned as a (type, root) pair where type 1 = major,
    2 = minor, 3 = dominant 7th, 4 = minor 7th.'''
    rho_max = 0.0
    most_likely_chord = (1, 1)
    template_families = [
        (1, CHORD_TEMPLATE_MAJOR, CHORD_TEMPLATE_MAJOR_means,
         CHORD_TEMPLATE_MAJOR_stdevs),
        (2, CHORD_TEMPLATE_MINOR, CHORD_TEMPLATE_MINOR_means,
         CHORD_TEMPLATE_MINOR_stdevs),
        (3, CHORD_TEMPLATE_DOM7, CHORD_TEMPLATE_DOM7_means,
         CHORD_TEMPLATE_DOM7_stdevs),
        (4, CHORD_TEMPLATE_MIN7, CHORD_TEMPLATE_MIN7_means,
         CHORD_TEMPLATE_MIN7_stdevs)]
    for chord_type, templates, means, stdevs in template_families:
        for idx, (chord, mean, stdev) in enumerate(zip(templates, means, stdevs)):
            # Spearman-like correlation between the template chord and the
            # observed pitch vector; the 0.01 terms guard against zero variance
            rho = 0.0
            for i in range(0, 12):
                rho += (chord[i] - mean) * (pitch_vector[i] - np.mean(pitch_vector)) \
                       / ((stdev + 0.01) * (np.std(pitch_vector) + 0.01))
            if abs(rho) > abs(rho_max):
                rho_max = rho
                most_likely_chord = (chord_type, idx)
    return most_likely_chord

def find_most_likely_timbre_category(timbre_vector):
    '''Find the timbre cluster whose mean vector correlates best with the
    given 12-element timbre frame.'''
    most_likely_timbre_cat = 0
    rho_max = 0.0
    for idx, (seg, mean, stdev) in enumerate(zip(TIMBRE_CLUSTERS, TIMBRE_MEANS,
                                                 TIMBRE_STDEVS)):
        # same correlation score as in find_most_likely_chord, with the
        # observed frame centered on its own mean
        rho = 0.0
        for i in range(0, 12):
            rho += (seg[i] - mean) * (timbre_vector[i] - np.mean(timbre_vector)) \
                   / ((stdev + 0.01) * (np.std(timbre_vector) + 0.01))
        if abs(rho) > abs(rho_max):
            rho_max = rho
            most_likely_timbre_cat = idx
    return most_likely_timbre_cat


Bibliography

[1] Marck Bailey. Mark J. Butler publishes scholarly work on dance music. http://www.music.northwestern.edu/about/news/2012/mark-j-butler-publishes-scholarly-work-on-dance-music.html, March 2012.

[2] Kenneth Taylor. Ishkur's guide to EDM. http://techno.org/electronic-music-guide.

[3] Deal further strengthens Spotify's music discovery expertise. http://the.echonest.com/pressreleases/spotify-acquires-echo-nest, March 2014.

[4] Josh Constine. Inside the Spotify-Echo Nest skunkworks. http://techcrunch.com/2014/10/19/the-sonic-mad-scientists, October 2014.

[5] The future of music genres is here. http://blog.echonest.com/post/73516217273/the-future-of-music-genres-is-here, January 2014.

[6] About the Music Genome Project. http://www.pandora.com/about/mgp.

[7] Joan Serrà, Álvaro Corral, Marián Boguñá, Martín Haro, and Josep Ll. Arcos. Measuring the evolution of contemporary western popular music. Scientific Reports, 2, July 2012.

[8] Matthias Mauch, Robert M. MacCallum, Mark Levy, and Armand M. Leroi. The evolution of popular music: USA 1960-2010. Royal Society Open Science, 2(5), 2015.

[9] Thierry Bertin-Mahieux, Daniel P.W. Ellis, Brian Whitman, and Paul Lamere. The Million Song Dataset. In Proceedings of the 12th International Conference on Music Information Retrieval (ISMIR 2011), 2011.

[10] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay. Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12:2825-2830, 2011.

[11] Lars Buitinck, Gilles Louppe, Mathieu Blondel, Fabian Pedregosa, Andreas Mueller, Olivier Grisel, Vlad Niculae, Peter Prettenhofer, Alexandre Gramfort, Jaques Grobler, Robert Layton, Jake VanderPlas, Arnaud Joly, Brian Holt, and Gaël Varoquaux. API design for machine learning software: experiences from the scikit-learn project. In ECML PKDD Workshop: Languages for Data Mining and Machine Learning, pages 108-122, 2013.

[12] Edwin Chen. Infinite mixture models with nonparametric Bayes and the Dirichlet process. http://blog.echen.me/2012/03/20/infinite-mixture-models-with-nonparametric-bayes-and-the-dirichlet-process, March 2012.

[13] Graham Massey. Roland TR-808: The drum machine that changed music. http://www.bbc.com/news/entertainment-arts-26682781, March 2014.

[14] Jean-Michel Jarre and Agencja Artystyczna MTJ. Jean Michel Jarre. Disques Dreyfus, 1999.

[15] François Pachet, Jean-Julien Aucouturier, and Mark Sandler. The way it sounds: timbre models for analysis and retrieval of music signals. IEEE Transactions on Multimedia, 7(6):1028-1035, December 2005.


• Abstract
• Acknowledgements
• Contents
• List of Tables
• List of Figures
• 1 Introduction
  • 1.1 Background Information
  • 1.2 Literature Review
  • 1.3 The Dataset
• 2 Mathematical Modeling
  • 2.1 Determining Novelty of Songs
  • 2.2 Feature Selection
  • 2.3 Collecting Data and Preprocessing Selected Features
    • 2.3.1 Collecting the Data
    • 2.3.2 Pitch Preprocessing
    • 2.3.3 Timbre Preprocessing
• 3 Results
  • 3.1 Methodology
  • 3.2 Findings
    • 3.2.1 α = 0.05
    • 3.2.2 α = 0.1
    • 3.2.3 α = 0.2
  • 3.3 Analysis
• 4 Conclusion
  • 4.1 Design Flaws in Experiment
  • 4.2 Future Work
  • 4.3 Closing Remarks
• A Code
  • A.1 Pulling Data from the Million Song Dataset
  • A.2 Calculating Most Likely Chords and Timbre Categories
  • A.3 Code to Compute Timbre Categories
  • A.4 Helper Methods for Calculations
• Bibliography

model is history-agnostic and may not realize the historical context of songs when clustering. However, I believe that there is still significant merit to my research. Instead of classifying genres of music by the early genres that led to them, my approach gives the most credit to the artists and songs that were the most innovative for their time, and perhaps reveals different musical styles that are more similar to each other than history would otherwise imply. This way of thinking about music genres, while unconventional, is another way of imagining EM.

The practice of quantitatively analyzing music has exploded in the last decade, thanks to technological and algorithmic advances that allow data scientists to constructively sift through troves of music and listener information. In the literature review I will focus on two particular organizations that have contributed greatly to the large-scale mathematical analysis of music: Pandora, a website that plays songs similar to a song/artist/album inputted by a user, and Echo Nest, a music analytics firm that was acquired by Spotify in 2014 and drives Spotify's Discover Weekly feature [3]. After evaluating the relevance of these sources to my thesis work, I will then look over the relevant academic research and evaluate what this research can contribute.

1.2 Literature Review

The quantitative analysis of music generally falls into two categories: research conducted by academics and academic organizations for scholarly purposes, and research conducted by companies and primarily targeted at consumers. Looking first at the consumer-based research, Spotify and Pandora are two of the most prominent groups, and the two I decided to focus on. Spotify is a music streaming service where users can listen to albums and songs from a wide variety of artists, or listen to weekly playlists generated based on the music the user and the user's friends have listened to. The weekly playlist, called the Discover Weekly Playlist, is a relatively new feature in Spotify and is driven by music analysis algorithms created by Echo Nest. Using the Echo Nest code interface, Spotify creates a "taste profile" for each user, which assesses attributes such as how often a user branches out to new styles of music, how closely the user's streamed music follows popular Billboard music charts, and so on. Spotify also looks at the artists and songs the user streamed and creates clusters of different genres that the user likes (see Figure 1.1). The taste profile and music clusters can then be used to generate playlists geared to a specific user. The genres in the clusters come from a list of nearly 800 names, which are derived by scraping the Internet for trending terms in music, as well as by training various algorithms on a regular basis by "listening" to new songs [4][5].

Figure 1.1: A user's taste profile generated by Spotify


Although Spotify and Echo Nest's algorithms are very useful for mapping the landscape of established and emerging genres of music, the methodology is limited to pre-defined genres. This may serve as a good point of comparison for my final results, but my study aims to be as context-free as possible by attaching no preconceived notions of music styles or genres, instead looking at features that can be measured in every song.

While Spotify's approach to mapping music is very high-tech and based on existing genres, Pandora takes a very low-tech and context-free approach to music clustering. Pandora created the Music Genome Project, a multi-year undertaking in which skilled music theorists listened to a large number of songs and analyzed up to 450 characteristics in each song [6]. Pandora's approach is appealing to the aim of my study, since it does not take any preconceived notions of what a genre of music is, instead comparing songs on common characteristics such as pitch, rhythm, and instrument patterns. Unfortunately, I do not have a cadre of skilled music theorists at my disposal, nor do I have 10 years to perform such calculations like the dedicated workers at Pandora (tips the indestructible fedora). Additionally, Pandora's Music Genome Project is intellectual property, so at best I can only rely on the abstract concepts of the Music Genome Project to drive my study.

In the academic realm, there are no existing studies analyzing quantifiable changes in EM specifically, but there exist a few studies that perform such analysis on popular Western music in general. One such study is Measuring the Evolution of Contemporary Western Popular Music, which analyzes music from 1955-2010 spanning all common genres. Using the Million Song Dataset, a free public database of songs each containing metadata (see Section 1.3), the study focuses on the attributes pitch, timbre, and loudness. Pitch is defined by the standard musical notes, or the frequency of the sound waves. Timbre is formally defined as the Mel Frequency Cepstral Coefficients (MFCCs) of a transformed sound signal. More informally, it refers to the sound color, texture, or tone quality, and is associated with instrument types, recording resources, and production techniques. In other words, two sounds that have the same pitch but different tones (for example, a bell and a voice) are differentiated by their timbres. There are 12 MFCCs that define the timbre of a given sound. Finally, loudness refers to how intrinsically loud the music sounds, not the loudness that a listener can manipulate while listening to the music. Loudness is the first MFCC of the timbre of a sound [7]. The study concluded that over time, music has been becoming louder and less diverse:

    The restriction of pitch sequences (with metrics showing less variety in pitch progressions), the homogenization of the timbral palette (with frequent timbres becoming more frequent), and growing average loudness levels (threatening a dynamic richness that has been conserved until today). This suggests that our perception of the new would be essentially rooted on identifying simpler pitch sequences, fashionable timbral mixtures, and louder volumes. Hence, an old tune with slightly simpler chord progressions, new instrument sonorities that were in agreement with current tendencies, and recorded with modern techniques that allowed for increased loudness levels could be easily perceived as novel, fashionable, and groundbreaking.

This study serves as a good starting point for mathematically analyzing music in a few ways. First, it utilizes the Million Song Dataset, which addresses the issue of legally obtaining music metadata. As mentioned in Section 1.3, the only legal way to obtain playable music for this study would have been to purchase every song I would include, which is infeasible. While the Million Song Dataset does not contain the audio files in playable format, it does contain audio features and metadata that allow for in-depth analysis. In addition, working with the dataset removes the work of extracting features from raw audio files, saving an extensive amount of time and energy. Second, the study establishes specifics for what constitutes a trend in music. Pitch, timbre, and loudness are core features of music, and examining the distributions of each among songs over time reveals a lot of information about how the music industry and consumers' tastes have evolved. While these are not all of the features contained in a song, they serve as a good starting point. Third, the study defines mathematical ways to capture music attributes and measure their change over time. For example, pitches are transposed into the same tonal context, with binary discretized pitch descriptions based on a threshold, so that each song can be represented with vectors of pitches that are normalized and compared to other songs.

While this study lays some solid groundwork for capturing and analyzing numeric qualities of music, it falls short of addressing my goals in a couple of ways. First, it does not perform any analysis with respect to music genre. While the analysis performed in this paper could easily be applied to a list of songs in a specific genre, certain genres might have unique sounds and rhythms relative to other genres that would be worth studying in greater detail. Second, the study only measures general trends in music over time. The models used to describe changes are simple regressions that don't look at more nuanced changes. For example, what styles of music developed over certain periods of time? How rapid were those changes? Which styles of music developed from which other styles?

A more promising study, led by music researcher Matthias Mauch [8], analyzes contemporary popular Western music from the 1960s to the 2010s by comparing numerical data on the pitches and timbre of a corpus of 17,000 songs that appeared on the Billboard Hot 100. Like the previously mentioned paper, Measuring the Evolution of Contemporary Western Popular Music, Mauch's study also creates abstractions of pitch and timbre in order to provide a consistent and meaningful semantic interpretation of musical data (see Figure 1.2). However, Mauch's study takes this idea a step further by using genre tags from Last.fm, a music website, and constructing a hierarchy of music genres using hierarchical clustering. Additionally, the study takes a crack at determining whether a particular band, the Beatles, was musically groundbreaking for its time or merely playing off sounds that other bands had already used.

Figure 1.2: Data processing pipeline for Mauch's study, illustrated with a segment of Queen's Bohemian Rhapsody (1975)

While both Measuring the Evolution of Contemporary Western Popular Music and Mauch's study created abstractions of pitch and timbre, Mauch's study is more appealing with respect to my goal because its end results align more closely with mine. Additionally, the data processing pipeline offers several layers of abstraction and, depending on my progress, I would be able to achieve at least one of the levels of abstraction. As shown in Figure 1.2, each segment of a raw audio file is first broken down into its 12 timbre MFCCs and pitch components. Next, the study constructs "lexicons," or dictionaries of pitch and timbre terms, against which all songs can be compared. For pitch, the original data is in an N-by-12 matrix, where N is the number of time segments in the song and 12 the number of notes found in an octave of pitches. Each time segment contains the relative strengths of each of the 12 pitches. However, musical sounds are not merely a collection of pitches but, more precisely, chords. Furthermore, the similarity of two songs is not determined by the absolute pitches of their chords, but rather by the progression of chords in the song, all relative to each other. For example, if all the notes in a song are transposed by one step, the song will sound different in terms of absolute pitch, but it will still be recognized as the original, because all of the relative movements from each chord to the next are the same. This phenomenon is captured in the pitch data by finding the most likely chord played at each time segment, then counting the change to the next chord at each time step and generating a table of chord change frequencies for each song. Constructing the timbre lexicon is more complicated, since there is no easy analogue like chords for pitches with which to compare songs. Mauch's study utilizes a Gaussian Mixture Model (GMM), iterating over k=1 to k=N clusters, where N is a large number, running the GMM on each prior assumption of k clusters, and computing the Bayes Information Criterion (BIC) for each model. The lowest of the N BIC values is found, and that value of k is selected. That model contains k different timbre clusters, and each cluster contains the mean timbre value for each of the 12 timbre components.


most common timbre rhythms and creating more general tags from these combined

terms such as ldquo stepwise changes indicating modal harmonyrdquo for a pitch topic and

ldquooh rounded mellowrdquo for a timbral topic There were two problems with using this

final layer of abstraction for my study First attaching semantic interpretations to

the pitch and timbral lexicons is a difficult task For timbre I would need to listen

to sound samples containing all of the different timbral categories I identified and

attaching user interpretations to them For the chords not only would I have to

perform the same analysis as on timbre but take careful attention to identify which

chords correspond to common sound progressions in popular music a task that I am

not qualified for an did not have the resources for this thesis to seek out Second

this final layer of abstraction was not necessary for the end goal of my paper In

fact consolidating my pitch and timbre lexicons into simpler phrases would run the

risk of pigeonholing my analysis and preventing me from discovering more nuanced

patterns in my final results Therefore I decided to focus on pitch and timbral

lexicon construction as the furthest levels of abstraction when processing songs for

my thesis Mathematical details on how I constructed the lexical and timbral lexicons

can be found in the Mathematical Modeling section of this paper

1.3 The Dataset

In order to successfully execute my thesis, I need access to an extensive database of music. Until recently, acquiring a substantial corpus of music data was a difficult and costly task. It is illegal to download music audio files from video- and music-sharing sites such as YouTube, Spotify, and Pandora. Some platforms, such as iTunes, offer 90-second previews of songs, but using only segments of songs, and usually segments that showcase the chorus of the song, is not a reliable way to capture the entire essence of a song. Even if I were to legally download entire audio files for free, I would run into additional issues. Obtaining a high-quality corpus of song data would be challenging: writing scripts that crawl music sharing platforms may not capture all of the music I am looking for. And once I had the audio files, I would have to perform audio processing techniques to extract the relevant information from the songs.

Fortunately, there is an easy solution to the music data acquisition problem. The Million Song Dataset (MSD) is a collection of metadata for one million music tracks dating up to 2011. Various organizations, such as The Echo Nest, Musicbrainz, 7digital, and Last.fm, have contributed different pieces of metadata. Each song is represented as a Hierarchical Data Format file (HDF5), which can be loaded as a JSON object. The fields encompass topical features, such as the song title, artist, and release date, as well as lower-level features, such as the loudness, starting beat times, pitches, and timbre of several segments of the song [9]. While the MSD is the largest free and open source music metadata dataset I could find, there is no guarantee that it adequately covers the entire spectrum of EM artists and songs. This quality limitation is important to consider throughout the study. A quick look through the songs, including the subset of data I worked with for this report, showed that there were several well-known artists and songs from the EM scene. Therefore, while the MSD may not contain all desired songs for this project, it contains an adequate number of relevant songs to produce some meaningful results. Additionally, laying the groundwork for modeling the similarities between songs and identifying groundbreaking ones is the same regardless of the songs included, and the following methodologies can be implemented on any similarly-formatted dataset, including one with songs that might currently be missing from the MSD.


Chapter 2

Mathematical Modeling

2.1 Determining Novelty of Songs

Finding a logical and implementable mathematical model was, and continues to be, an important aspect of my research. My problem, how to mathematically determine which songs were unique for their time, requires an algorithm in which each song is introduced in chronological order, either joining an existing category or starting a new category based on its musical similarity to the songs already introduced. Clustering algorithms like k-means or Gaussian Mixture Models (GMMs), which have a predetermined number of clusters and optimize the partitioning of a dataset into those clusters, assume a fixed number of clusters. While this process would work if we knew exactly how many genres of EM existed, if we guess wrong, our end results may contain clusters that are wrongly grouped together or separated. It is much better to apply a clustering algorithm that does not make any assumptions about this number.

One particularly promising process that addresses the issue of the number of clusters is a family of algorithms known as Dirichlet Processes (DPs). DPs are useful for this particular application because (1) they assign clusters to a dataset


with only an upper bound on the number of clusters, and (2) by sorting the songs in chronological order before running the algorithm, and keeping track of which songs are categorized under each cluster, we can observe the earliest songs in each cluster and, consequently, infer which songs were responsible for creating new clusters. The DP is controlled by a parameter α, known as the concentration parameter. The expected number of clusters formed is directly proportional to the value of α, so the higher the value of α, the more likely new clusters will be formed [10]. Regardless of the value of α, as the number of data points introduced increases, the probability of a new group being formed decreases. That is, a "rich get richer" policy is in place, and existing clusters tend to grow in size. Tweaking the value of the tunable parameter α is an important part of the study, since it determines the flexibility given to forming a new cluster. If the value of α is too small, then the criteria for forming clusters will be too strict, and data that should be in different clusters will be assigned to the same cluster. On the other hand, if α is too large, the algorithm will be too sensitive and assign similar songs to different clusters.
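
This "rich get richer" behavior is easy to see in a small simulation. The sketch below is my own illustration (a Chinese-restaurant-process view of the DP, not the stick-breaking implementation used later): each new song joins an existing cluster with probability proportional to that cluster's current size, or opens a new cluster with probability proportional to α.

import numpy as np

def simulate_cluster_sizes(n_points, alpha, seed=0):
    rng = np.random.RandomState(seed)
    sizes = []
    for n in range(n_points):
        # existing clusters weighted by size; one extra slot weighted by alpha
        weights = np.array(sizes + [alpha], dtype=float) / (n + alpha)
        choice = rng.choice(len(weights), p=weights)
        if choice == len(sizes):
            sizes.append(1)       # form a new cluster
        else:
            sizes[choice] += 1    # join an existing cluster
    return sizes

print(len(simulate_cluster_sizes(5000, 0.5)))   # few clusters
print(len(simulate_cluster_sizes(5000, 50.0)))  # many more clusters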

The implementation of the DP was achieved using scikit-learn's library and API for the Dirichlet Process Gaussian Mixture Model (DPGMM). The DPGMM is the formal name of the Dirichlet Process model used to cluster the data. More specifically, scikit-learn's implementation of the DPGMM uses the stick-breaking method, one of several equally valid methods to assign songs to clusters [11]. While the mathematical details for this algorithm can be found at the following citation [12], the most important aspects of the DPGMM are the arguments that the user can specify and tune. The first of these tunable parameters is the value α, which is the same parameter as the α discussed in the previous paragraph. As seen on the right side of Figure 2.1, properly tuning α is key to obtaining meaningful clusters. The


center image has α set to 0.01, which is too small and results in all of the data being grouped under one cluster. On the other hand, the bottom-right image has the same data set and α set to 100, which does a better job of clustering. On a related note, the figure also demonstrates the effectiveness of the DPGMM over the GMM. On the left side, the dataset clearly contains 2 clusters, but the GMM in the top-left image assumes 5 clusters as a prior and consequently clusters the data incorrectly, while the DPGMM manages to limit the data to 2 clusters.

The second argument that the user inputs for the DPGMM is the data that will be clustered. The scikit-learn implementation takes the data in the format of a nested list (N lists, each of length m), where N is the number of data points and m the number of features. While the format of the data structure is relatively straightforward, choosing which numbers should be in the data was a challenge I faced. Selecting the relevant features of each song to be used in the algorithm is expounded upon in the next section, "Feature Selection."

The last argument that a user inputs for the scikit-learn DPGMM implementation is an argument indicating the upper bound for the number of clusters. The Dirichlet Process then determines the best number of clusters for the data, between 1 and the upper bound. Since the DPGMM is flexible enough to find the best value, I set an arbitrary upper bound of 50 clusters and focused on the tuning of α to modify the number of clusters formed.
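
As an illustration of how these three arguments fit together, here is a minimal sketch using the DPGMM class from the scikit-learn version available at the time of writing (renamed BayesianGaussianMixture in later releases); the random matrix is only a stand-in for the real song features described in the Methodology section:

import numpy as np
from sklearn.mixture import DPGMM

# stand-in for the N x m nested list of song features
rng = np.random.RandomState(0)
song_features = rng.rand(1000, 238)

# alpha is the concentration parameter; n_components is only an upper bound
model = DPGMM(n_components=50, alpha=0.1, covariance_type='diag', n_iter=100)
model.fit(song_features)
labels = model.predict(song_features)
print('clusters actually used:', np.unique(labels).size)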

2.2 Feature Selection

Figure 2.1: scikit-learn example of GMM vs. DPGMM and the tuning of α

One of the most difficult aspects of the Dirichlet method is choosing the features to be used for clustering. In other words, when we organize the songs into clusters, we need to ensure that each cluster is distinct in a way that is statistically and intuitively logical. In the Million Song Dataset [9], each song is represented as a JSON object containing several fields. These fields are candidate features to be used in the Dirichlet algorithm. Below is an example song, "Never Gonna Give You Up" by Rick Astley, and the corresponding features:

artist_mbid: db92a151-1ac2-438b-bc43-b82e149ddd50 (the musicbrainz.org ID for this artist)
artist_mbtags: shape = (4,) (this artist received 4 tags on musicbrainz.org)
artist_mbtags_count: shape = (4,) (raw tag count of the 4 tags this artist received on musicbrainz.org)
artist_name: Rick Astley (artist name)
artist_playmeid: 1338 (the ID of that artist on the service playme.com)
artist_terms: shape = (12,) (this artist has 12 terms (tags) from The Echo Nest)
artist_terms_freq: shape = (12,) (frequency of the 12 terms from The Echo Nest (number between 0 and 1))
artist_terms_weight: shape = (12,) (weight of the 12 terms from The Echo Nest (number between 0 and 1))
audio_md5: bf53f8113508a466cd2d3fda18b06368 (hash code of the audio used for the analysis by The Echo Nest)
bars_confidence: shape = (99,) (confidence value (between 0 and 1) associated with each bar by The Echo Nest)
bars_start: shape = (99,) (start time of each bar according to The Echo Nest; this song has 99 bars)
beats_confidence: shape = (397,) (confidence value (between 0 and 1) associated with each beat by The Echo Nest)
beats_start: shape = (397,) (start time of each beat according to The Echo Nest; this song has 397 beats)
danceability: 0.0 (danceability measure of this song according to The Echo Nest (between 0 and 1, 0 => not analyzed))
duration: 211.69587 (duration of the track in seconds)
end_of_fade_in: 0.139 (time of the end of the fade in, at the beginning of the song, according to The Echo Nest)
energy: 0.0 (energy measure (not in the signal processing sense) according to The Echo Nest (between 0 and 1, 0 => not analyzed))
key: 1 (estimation of the key the song is in by The Echo Nest)
key_confidence: 0.324 (confidence of the key estimation)
loudness: -7.75 (general loudness of the track)
mode: 1 (estimation of the mode the song is in by The Echo Nest)
mode_confidence: 0.434 (confidence of the mode estimation)
release: Big Tunes - Back 2 The 80s (album name from which the track was taken; some songs/tracks can come from many albums, we give only one)
release_7digitalid: 786795 (the ID of the release (album) on the service 7digital.com)
sections_confidence: shape = (10,) (confidence value (between 0 and 1) associated with each section by The Echo Nest)
sections_start: shape = (10,) (start time of each section according to The Echo Nest; this song has 10 sections)
segments_confidence: shape = (935,) (confidence value (between 0 and 1) associated with each segment by The Echo Nest)
segments_loudness_max: shape = (935,) (max loudness during each segment)
segments_loudness_max_time: shape = (935,) (time of the max loudness during each segment)
segments_loudness_start: shape = (935,) (loudness at the beginning of each segment)
segments_pitches: shape = (935, 12) (chroma features for each segment (normalized so max is 1))
segments_start: shape = (935,) (start time of each segment (musical event or onset) according to The Echo Nest; this song has 935 segments)
segments_timbre: shape = (935, 12) (MFCC-like features for each segment)
similar_artists: shape = (100,) (a list of 100 artists (their Echo Nest IDs) similar to Rick Astley according to The Echo Nest)
song_hotttnesss: 0.864248830588 (according to The Echo Nest, when downloaded (in December 2010), this song had a 'hotttnesss' of 0.8 (on a scale of 0 to 1))
song_id: SOCWJDB12A58A776AF (The Echo Nest song ID; note that a song can be associated with many tracks (with very slight audio differences))
start_of_fade_out: 198.536 (start time of the fade out, in seconds, at the end of the song, according to The Echo Nest)
tatums_confidence: shape = (794,) (confidence value (between 0 and 1) associated with each tatum by The Echo Nest)
tatums_start: shape = (794,) (start time of each tatum according to The Echo Nest; this song has 794 tatums)
tempo: 113.359 (tempo in BPM according to The Echo Nest)
time_signature: 4 (time signature of the song according to The Echo Nest, i.e. usual number of beats per bar)
time_signature_confidence: 0.634 (confidence of the time signature estimation)
title: Never Gonna Give You Up (song title)
track_7digitalid: 8707738 (the ID of this song on the service 7digital.com)
track_id: TRAXLZU12903D05F94 (The Echo Nest ID of this particular track, on which the analysis was done)
year: 1987 (year when this song was released, according to musicbrainz.org)

When choosing features, my main goal was to use features that would most likely yield meaningful results, yet also be simple and make sense to the average person. The definition of "meaningful" results is subjective, as every music listener has his or her own opinions about what constitutes different types of music, but some common features most people tend to differentiate songs by are pitch, rhythm, and the types of instruments used. The following specific fields provided in each song object fall under these three terms:

Pitch

• segments_pitches: a matrix of values indicating the strength of each pitch (or note) at each discernible time interval

Rhythm

• beats_start: a vector of values indicating the start time of each beat
• time_signature: the time signature of the song
• tempo: the speed of the song in Beats Per Minute (BPM)

Instruments

• segments_timbre: a matrix of values indicating the distribution of MFCC-like features (different types of tones) for each segment

The segments_pitches feature is a clear candidate for a differentiating factor between songs, since it reveals the patterns of notes that occur. Additionally, other research papers that quantitatively examine songs, like Mauch's, look at pitch and employ a procedure that allows all songs to be compared with the same metric. Likewise, timbre is intuitively a reliable differentiating feature, since it captures the variety of tones, that is, sounds that sound different despite having the same pitch. Therefore, segments_timbre is another feature that is considered in each song.

Finally, we look at the candidate features for rhythm. At first glance, all of these features appear to be useful, as they indicate the rhythm of a song in one way or another. However, none of these features are as useful as the pitch and timbre features. While tempo is one factor in differentiating genres of EDM, and of music in general, tempo alone is not a driving force of musical innovation. Certain genres of EDM, like drum 'n' bass and happycore, stand out for having very fast tempos, but the tempo is supplemented with a sound unique to the genre. Conceiving new arrangements of pitches, combining instruments in new ways, and inventing new types of sounds are novel; speeding up or slowing down existing sounds is not. Including tempo as a feature could actually add noise to the model, since many genres overlap in their tempos. And finally, tempo is measured indirectly when the pitch and timbre features are normalized for each song: everything is measured in units of "per second," so faster songs will have higher quantities of pitch and timbre features each second. Time signature can be dismissed from the candidate features for the same reason as tempo: many genres share the same time signature, and including it in the feature set would only add more noise. beats_start looks like a more promising feature since, like segments_pitches and segments_timbre, it consists of a vector of values. However, difficulties arise when we begin to think how exactly


we can utilize this information. Since each song varies in length, we need a way to compare songs of different durations on the same level. One approach could be to perform basic statistics on the distance between each beat, for example calculating the mean and standard deviation of this distance. However, the normalized pitch and timbre information already captures this data. Another possibility is detecting certain patterns of beats, which could differentiate the syncopated dubstep or glitch music beats from the steady pulse of electro-house. But once again, every beat is accompanied by a sound with a specific timbre and pitch, so this feature would not add any significantly new information.
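
For completeness, the basic inter-beat statistics considered (and rejected) above would amount to the following small sketch:

import numpy as np

def beat_interval_stats(beats_start):
    # mean and standard deviation of the gaps between consecutive beats
    intervals = np.diff(np.asarray(beats_start))
    return intervals.mean(), intervals.std()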

2.3 Collecting Data and Preprocessing Selected Features

2.3.1 Collecting the Data

Upon deciding the features I wanted to use in my research, I first needed to collect all of the electronic songs in the Million Song Dataset. The easiest reliable way to achieve this was to iterate through each song in the database and save the information for the songs where any of the artist genre tags in artist_mbtags matched an electronic music genre. While this measure was not fully accurate, because it looks at the genre of the artist rather than the song, specific genre information for each song was not as easily accessible, so this indicator was nearly as good a substitute. To generate a list of the genres that electronic songs would fall under, I manually searched through a subset of the MSD to find all genres that seemed to be related to electronic music. In the case of genres that were sometimes, but not always, electronic in nature, such as disco or pop, I erred on the side of caution and did not include them in the list of electronic genres. In these cases, false positives, such as primarily rock songs that happen to have the disco label attached to the artist, could inadvertently be included in the dataset. The final list of genres is as follows:

target_genres = ['house', 'techno', 'drum and bass', 'drum n bass',
    "drum'n'bass", 'drumnbass', "drum 'n' bass", 'jungle', 'breakbeat',
    'trance', 'dubstep', 'trap', 'downtempo', 'industrial', 'synthpop',
    'idm', 'idm - intelligent dance music', '8-bit', 'ambient',
    'dance and electronica', 'electronic']
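
A minimal sketch of this filtering pass, assuming the hdf5_getters helper module distributed with the MSD (the directory layout shown is illustrative):

import glob
import hdf5_getters

def is_electronic(h5):
    tags = hdf5_getters.get_artist_mbtags(h5)
    # tags may arrive as bytes depending on the h5py/Python version
    tags = [t.decode('utf-8') if isinstance(t, bytes) else t for t in tags]
    return any(t.lower() in target_genres for t in tags)

for path in glob.glob('MillionSongDataset/*/*/*/*.h5'):
    h5 = hdf5_getters.open_h5_file_read(path)
    try:
        if is_electronic(h5):
            pass  # save this song's pitch and timbre metadata for preprocessing
    finally:
        h5.close()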

2.3.2 Pitch Preprocessing

A study conducted by music researcher Matthias Mauch [8] analyzes pitch in a musically informed manner. The study first takes the raw sound data and converts it into a distribution over each pitch, where 0 is no detection of the pitch and 1 the strongest amount. Then it computes the most likely chord by comparing the 4 most common types of chords in popular music (major, minor, dominant 7th, and minor 7th) to the observed chord. The most common chords are represented as "template chords" and contain 0's and 1's, where the 1's represent the notes played in the chord. For example, using the note C as the first index, the C major chord is represented as

CT_CM = (1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0)

For a given chroma frame c observed in the song, the Spearman's rho coefficient is computed against every template chord:

ρ(CT, c) = Σ_{i=1}^{12} (CT_i − mean(CT)) (c_i − mean(c)) / (σ_CT σ_c)


where mean(CT) is the mean of the values in the template chord, σ_CT is the standard deviation of the values in the chord, and the operations on c are analogous. Note that the summation runs over each of the 12 pitch classes. The chord template with the highest value of ρ is selected as the chord for the time frame.

After this is performed for each time frame, the values are smoothed, and then the change between adjacent chords is observed. The reasoning behind this step is that, by measuring the relative distance between chords rather than the chords themselves, all songs can be compared in the same manner even though they may have different key signatures. Finally, the study takes the types of chord changes and classifies them under 8 possible categories called "H-topics." These topics are more abstracted versions of the chord changes that make more sense to a human, such as "changes involving dominant 7th chords."
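
To make the template-matching step concrete, the following sketch scores a single chroma frame against one template, using the same small variance guard as the appendix code (the chroma frame itself is made up):

import numpy as np

C_MAJOR = np.array([1,0,0,0,1,0,0,1,0,0,0,0], dtype=float)

def chord_score(template, chroma):
    # correlation-style score between a binary chord template and a chroma frame
    return np.sum((template - template.mean()) * (chroma - chroma.mean())) \
        / ((template.std() + 0.01) * (chroma.std() + 0.01))

# a frame dominated by C, E and G scores highly against the C major template
frame = np.array([0.9, 0.1, 0.1, 0.1, 0.8, 0.2, 0.1, 0.9, 0.1, 0.2, 0.1, 0.1])
print(chord_score(C_MAJOR, frame))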

In my preliminary implementation of this method on an electronic dance music corpus, I made a few modifications to Mauch's study. First, I smoothed out time frames before computing the most probable chords, rather than smoothing the most probable chords. I did this to save time and to reduce volatility in the chord measurements. Using Rick Astley's "Never Gonna Give You Up" as a reference, which contains 935 time frames and lasts 212 seconds, 5 time frames is slightly under 1 second and, for preliminary testing, appeared to be a good interval for each time block. Second, as mentioned in the literature section, I did not abstract the chord changes into H-topics. This decision also stemmed from time constraints, since deriving semantic chord meaning from EDM songs would require careful research into the types of harmonies and sounds common in that genre of music. Below, I include a high-level visualization of the pitch metadata found in a sample song, "Firestarter" by The Prodigy, and how I converted the metadata into a chord change vector that I could then feed into the Dirichlet Process algorithm.


[Figure: the pitch-processing pipeline, illustrated on "Firestarter" by The Prodigy. Step 1: start with the raw pitch data, an N×12 matrix where N is the number of time frames in the song and 12 the number of pitch classes (the first 5 time frames are shown). Step 2: average the distribution of pitches over every block of 5 time frames. Step 3: calculate the most likely chord for each block using Spearman's rho (here F major, (0,1,0,0,0,0,1,0,0,0,1,0)). Step 4: for two adjacent chords (here F major to G major, a major-to-major change with step size 2, chord shift code 6), calculate the change between them and increment the corresponding count in a table of chord change frequencies (192 possible chord changes). Step 5: the final 192-element vector, where chord_changes[i] is the number of times the chord change with code i occurred in the song.]


A final step I took to normalize the chord change data was to divide the counts by the length of the song, so that each song's number of chord changes was measured per second.
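
Putting the steps above together, the sketch below turns a song's chroma matrix into a normalized chord change vector. It reuses find_most_likely_chord from the appendix; the block size and the particular 192-way change encoding shown here are illustrative assumptions rather than the exact thesis implementation.

import numpy as np

def chord_change_vector(chroma, duration, block=5):
    # average the pitch distributions over non-overlapping blocks of frames
    n = len(chroma) // block * block
    blocks = np.asarray(chroma[:n], dtype=float).reshape(-1, block, 12).mean(axis=1)
    # most likely chord per block, as (chord type 1..4, root 0..11) pairs
    chords = [find_most_likely_chord(b) for b in blocks]
    # count each change between adjacent chords; 4 x 4 type pairs x 12 root
    # steps = 192 possible changes (one hypothetical code assignment)
    counts = np.zeros(192)
    for (t1, r1), (t2, r2) in zip(chords, chords[1:]):
        code = ((t1 - 1) * 4 + (t2 - 1)) * 12 + (r2 - r1) % 12
        counts[code] += 1
    return counts / duration   # normalize to chord changes per second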

2.3.3 Timbre Preprocessing

For timbre, I also used Mauch's model to find a meaningful way to compare timbre uniformly across all songs [8]. After collecting all song metadata, I took a random sample of 20 songs from each year starting at 1970. The reason I forced the sampling to 20 randomly sampled songs from each year, and did not take a random sample of songs from all years at once, was to prevent bias towards any type of sound. As seen in Figure 2.2, there are significantly more songs from 2000-2011 than from before 2000. The mean year is x̄ = 2001.052, the median year is 2003, and the standard deviation of the years is σ = 7.060. A "random sample" over all songs would almost certainly include a disproportionate number of more recent songs. In order not to miss out on sounds that may be more prevalent in older songs, I required a set number of songs from each year. Next, from each randomly selected song, I selected 20 random timbre frames, in order to prevent any biases in data collection within each song. In total, there were 42 × 20 × 20 = 16,800 timbre frames collected. Next, I clustered the timbre frames using a Gaussian Mixture Model (GMM), varying the number of clusters from 10 to 100 and selecting the number of clusters with the lowest Bayes Information Criterion (BIC), a statistical measure commonly used to select the best-fitting model. The BIC was minimized at 46 timbre clusters. I then re-ran the GMM with 46 clusters and saved the mean values of each of the 12 timbre components for each cluster formed. In the same way that every song had the same 192 chord changes whose frequencies could be compared between songs, each song now had the same 46 timbre clusters, but different frequencies in each song. When reading in the metadata from each song, I calculated the most likely timbre cluster each timbre frame belonged to and kept a frequency count of all of the possible timbre clusters observed in the song. Finally, as with the pitch data, I divided all observed counts by the duration of the song in order to normalize each song's timbre counts.

Figure 2.2: Number of Electronic Music songs in the Million Song Dataset from each year
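
The cluster-count selection just described can be sketched as follows, using the GMM class from the contemporary scikit-learn release (later renamed GaussianMixture); the random data stands in for the sampled timbre frames:

import numpy as np
from sklearn.mixture import GMM

rng = np.random.RandomState(0)
timbre_frames = rng.randn(16800, 12)   # stand-in for the sampled timbre frames

best_k, best_bic, best_model = None, np.inf, None
for k in range(10, 101):
    gmm = GMM(n_components=k, covariance_type='diag', random_state=0)
    gmm.fit(timbre_frames)
    bic = gmm.bic(timbre_frames)
    if bic < best_bic:
        best_k, best_bic, best_model = k, bic, gmm

cluster_means = best_model.means_   # analogous to TIMBRE_CLUSTERS in the appendix
print('number of timbre clusters chosen by BIC:', best_k)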


Chapter 3

Results

3.1 Methodology

After the pitch and timbre data was processed, I ran the Dirichlet Process on the data. For each song, I concatenated the 192-element chord change frequency list and the 46-element timbre category frequency list, giving each song a total of 238 features. However, there is a problem with this setup: the pitch data will inherently dominate the clustering process, since it contains almost 3 times as many features as timbre. While there is no built-in function in scikit-learn's DPGMM process to give different weights to each feature, I considered another possibility to remedy this discrepancy: duplicating the timbre vector a certain number of times and concatenating that to the feature set of each song. While this strategy runs the risk of corrupting the feature set and turning it into something that does not accurately represent each song, it is important to keep in mind that, even without duplicating the timbre vector, the feature set consists of two separate feature sets concatenated to each other. Therefore, timbre duplication appears to be a reasonable strategy to weigh pitch and timbre more evenly.
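
A sketch of this feature assembly is shown below; the duplication factor is a hypothetical choice, and the ×10 scaling anticipates the next paragraph:

import numpy as np

def song_feature_vector(chord_changes, timbre_counts, duration,
                        timbre_copies=4, scale=10.0):
    # per-second frequencies, as in the preprocessing sections
    pitch = np.asarray(chord_changes, dtype=float) / duration
    timbre = np.asarray(timbre_counts, dtype=float) / duration
    # repeat the 46 timbre features to balance the 192 pitch features,
    # then scale so that alpha can be tuned in a sensible range
    return scale * np.concatenate([pitch] + [timbre] * timbre_copies)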


After this modification I tweaked a few more parameters before obtaining my

final results Dividing the pitch and timbre frequencies by the duration of the song

normalized every song to frequency per second but it also had the undesired effect

of making the data too small Timbre and pitch frequencies per second were almost

always less than 10 and many times hovered as low as 0002 for nonzero values

Because all of the values were very close to each other using common values of α

in the range of 01 to 1000-2000 was insufficient to push the songs into different

clusters As a result every song fell into the same cluster Increasing the value

of α by several orders of magnitude to well over 10 million fixed the problem but

this solution presented two problems First tuning α to experiment with different

ways to cluster the music would be problematic since I would have to work with

an enormous range of possible values for α Second pushing α to such high values

is not appropriate for the Dirichlet Process Extremely high values of α indicate a

Dirichlet Process that will try to disperse the data into different clusters but a value

of α that high is in principle always assigning each new song to a new cluster On

the other hand varying α between 01 and 1000 for example presents a much wider

range of flexibility when assigning clusters While this may be possible by varying

the values of α an extreme amount with the data as it currently is we are using

the Dirichlet Process in a way it should mathematically not be used Therefore

multiplying all of the data by a constant value so that we can work in the appropriate

range of α is the ideal approach After some experimentation I found that k=10 was

an appropriate scaling factor After initial runs of the Dirichlet Process I found out

that there was a slight issue with some of the earlier songs Since I had only artist

genre tags not specific song tags for each song I chose songs based on whether any

of the tags associated with the artist fell under any electronic music genre including

the generic term rsquoelectronicrsquo There were some bands mostly older ones from the

1960s and 1970s like Electric Light Orchestra which had some electronic music but

28

mostly featured rock funk disco or another genre Given that these artists featured

mostly non-electronic songs I decided to exclude them from my study and generate

a blacklist indicating these music artists While it was infeasible to look through

every single song and determine whether it was electronic or not I was able to look

over the earliest songs in each cluster These songs were the most important to verify

as electronic because early non-electronic songs could end up forming new clusters

and inadvertently create clusters with non-electronic sounds that I was not looking for

The goal of this thesis is to identify different groups into which EM songs are clustered and to identify the most unique artists and genres. While the second task is very simple, because it requires looking at the earliest songs in each cluster, the effectiveness of the first is difficult to gauge. While I can look at the average chord change and timbre category frequencies in each cluster, as well as other metadata, attaching more semantic interpretations to what the music actually sounds like and determining whether the music is clustered properly is a very subjective process. For this reason, I ran the Dirichlet Process on the feature set with values of α = (0.05, 0.1, 0.2) and compared the resulting clusterings, examining similarities and differences among the clusters formed in each scenario in the Discussion section. For each value of α, I set the upper limit of components, or clusters, allowed to 50. The values of α I used resulted in 9, 14, and 19 clusters formed. A sketch of such a run is shown below.
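For reference, a run of this kind can be sketched with scikit-learn, which the thesis uses elsewhere (a sketch under assumptions: the modern BayesianGaussianMixture class stands in for whatever exact scikit-learn interface the original runs used, and X denotes the matrix of scaled per-song chord-change and timbre-category frequencies):

    import numpy as np
    from sklearn.mixture import BayesianGaussianMixture

    def run_dirichlet_process(X, alpha):
        # Truncated Dirichlet Process GMM: up to 50 components,
        # concentration parameter alpha.
        dpgmm = BayesianGaussianMixture(
            n_components=50,
            weight_concentration_prior_type='dirichlet_process',
            weight_concentration_prior=alpha,
            max_iter=500,
            random_state=0)
        return dpgmm.fit_predict(X)  # cluster label for each song

    # e.g. labels = {a: run_dirichlet_process(X, a) for a in (0.05, 0.1, 0.2)}

Here weight_concentration_prior plays the role of α: small values favor few clusters, while large values spread the songs over many.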

3.2 Findings

3.2.1 α = 0.05

When I set α to 0.05, the Dirichlet Process split the songs into 9 clusters. Below are the distributions of the years of the songs in each cluster (note that the Dirichlet Process does not number the clusters exactly sequentially, so cluster numbers 5, 7, and 10 are skipped).


Figure 3.1: Song year distributions for α = 0.05

For each value of α, I also calculated the average frequency of each chord change category and timbre category for each cluster and plotted the results. The green lines correspond to timbre and the blue lines to pitch.


Figure 3.2: Timbre and pitch distributions for α = 0.05

A table listing each cluster formed, the number of songs in that cluster, and descriptions of the pitch, timbre, and rhythmic qualities characteristic of songs in that cluster is shown below.


Cluster  Song Count  Characteristic Sounds
0        6481        Minimalist, industrial, space sounds, dissonant chords
1        5482        Soft, New Age, ethereal
2        2405        Defined sounds; electronic and non-electronic instruments played in standard rock rhythms
3        360         Very dense and complex synths, slightly darker tone
4        4550        Heavily distorted rock and synthesizer
6        2854        Faster-paced 80s synth rock, acid house
8        798         Aggressive beats, dense house music
9        1464        Ambient house, trancelike, strong beats, mysterious tone
11       1597        Melancholy tones; New Wave rock in the 80s, then, starting in the 90s, downtempo, trip-hop, nu-metal

Table 3.1: Song cluster descriptions for α = 0.05

3.2.2 α = 0.1

A total of 14 clusters were formed. (16 were formed, but 2 clusters contained only one song each; I listened to both of these songs, and since they did not sound unique, I discarded them from the clusters.) Again, the song year distributions, the timbre and pitch distributions, and the cluster descriptions are shown below.


Figure 3.3: Song year distributions for α = 0.1


Figure 3.4: Timbre and pitch distributions for α = 0.1


Cluster  Song Count  Characteristic Sounds
0        1339        Instrumental and disco with 80s synth
1        2109        Simultaneous quarter-note and sixteenth-note rhythms
2        4048        Upbeat, chill; simultaneous quarter-note and eighth-note rhythms
3        1353        Strong repetitive beats, ambient
4        2446        Strong simultaneous beat and synths; synths defined but echo
5        2672        Calm, New Age
6        542         Hi-hat cymbals, dissonant chord progressions
7        2725        Aggressive punk and alternative rock
9        1647        Latin, rhythmic emphasis on first and third beats
11       835         Standard medium-fast rock instruments/chords
16       1152        Orchestral, especially violins
18       40          "Martian alien" sounds, no vocals
20       1590        Alternating strong kick and strong high-pitched clap
28       528         Roland TR-like beats; kick and clap stand out but fuzzy

Table 3.2: Song cluster descriptions for α = 0.1

3.2.3 α = 0.2

With α set to 0.2, a total of 22 clusters were formed. Three of the clusters consisted of one song each, none of which were particularly unique-sounding, so I discarded them, for a total of 19 significant clusters. Again, the song year distributions, the timbre and pitch distributions, and the cluster descriptions are shown below.


Figure 3.5: Song year distributions for α = 0.2


Figure 3.6: Timbre and pitch distributions for α = 0.2


Cluster  Song Count  Characteristic Sounds
0        4075        Nostalgic and sad-sounding synths and string instruments
1        2068        Intense, sad, cavernous (mix of industrial metal and ambient)
2        1546        Jazz/funk tones
3        1691        Orchestral with heavy 80s synths, atmospheric
4        343         Arpeggios
5        304         Electro, ambient
6        2405        Alien synths, eerie
7        1264        Punchy kicks and claps, 80s/90s tilt
8        1561        Medium tempo, 4/4 time signature, synths with intense guitar
9        1796        Disco rhythms and instruments
10       2158        Standard rock with few (if any) synths added on
12       791         Cavernous, minimalist ambient (non-electronic instruments)
14       765         Downtempo, classic guitar riffs, fewer synths
16       865         Classic acid house sounds and beats
17       682         Heavy Roland TR sounds
22       14          Fast, ambient, classic orchestral
23       578         Acid house with funk tones
30       31          Very repetitive rhythms, one or two tones
34       88          Very dense sound (strong vocals and synths)

Table 3.3: Song cluster descriptions for α = 0.2


3.3 Analysis

Each of the different values of α revealed different insights about the Dirichlet Process applied to the Million Song Dataset, mainly which artists and songs were unique and how traditional groupings of EM genres could be thought of differently. Not surprisingly, the distributions of the years of songs in most of the clusters were skewed to the left, because the distribution of all of the EM songs in the Million Song Dataset is left-skewed (see Figure 2.2). However, some of the distributions vary significantly for individual clusters, and these differences provide important insights into which types of music styles were popular at certain points in time and how unique the earliest artists and songs in the clusters were. For example, for α = 0.1, Cluster 28's musical style (with sounds characteristic of the Roland TR-808 and TR-909, two programmable drum machines and synthesizers that became extremely popular in 1980s and 90s dance tracks [13]) coincides with when the instruments were first manufactured in 1980. Not surprisingly, this cluster contained mostly songs from the 80s and 90s and declined slightly in the 2000s. However, there were a few songs in that cluster that came out before 1980. While these songs did not clearly use the Roland TR machines, they may have contained similar sounds that predated the machines and were truly novel.

First, looking at α = 0.05, we see that all of the clusters contain a significant number of songs, although clusters 3 and 8 are notably smaller. Cluster 3 has a heavier left tail, indicating a larger number of songs from the 70s, 80s, and 90s. Inside the cluster, the genres of music varied significantly from a traditional music lens. That is, the cluster contained some songs with nearly all traditional rock instruments, others with purely synths, and others somewhere in between, all of which would normally be classified as different EM genres. However, under the Dirichlet Process these songs were lumped together with the common theme of dense melodies (as opposed to minimalistic, repetitive, or dissonant sounds). The most prominent artists from the earlier songs are Ashra and Jon Hassell, who composed several melodic songs combining traditional instruments with synthesizers for a modern feel. The other small cluster, number 8, has a more normal year distribution relative to the entire MSD distribution and also consists of denser beats. Another artist, Cabaret Voltaire, leads this cluster. Cluster 9 sticks out significantly because it contains virtually no songs before 1990 but then increases rapidly in popularity. This cluster contains songs with hypnotically repetitive rhythm, strong and ethereal synths, and an equally strong drum-like beat. Given the emergence of trance in the 1990s, and the fact that house music in the 1980s contained more minimalistic synths than house music in the 1990s, this distribution of years makes sense. Looking at the earliest artists in this cluster, one that accurately predates the later music in the cluster is Jean-Michel Jarre, a French composer who pioneered ambient and electronic music [14]. One of his songs, Les Chants Magnétiques IV, contains very sharp and modulated synths, along with a repetitive hi-hat cymbal rhythm and more drawn-out and ethereal synths. While the song sounds ambient at its normal speed, playing the song at 1.5 times the normal speed resulted in a thumping, fast-paced 16th-note rhythm that, combined with the ethereal synths and their chord progressions, sounded very similar to trance music. In fact, I found that stylistically, trance music was comparable to house and ambient music increased in speed. Trance music was a term not used extensively until the early 1990s, but ambient and house music were already mainstream by the 1980s, so it would make sense that trance evolved in this manner. However, this insight could serve as an argument that trance is not an innovative genre in and of itself but is rather a clever combination of two older genres. Lastly, we look at the timbre category and chord change distributions for each cluster. In theory, these clusters should have significantly different peaks of chord changes and timbre categories, reflecting different pitch arrangements and instruments in each cluster.

The type 0 chord change corresponds to major → major with no note change; type 60, to minor → minor with no note change; type 120, to dominant 7th major → dominant 7th major with no note change; and type 180, to dominant 7th minor → dominant 7th minor with no note change. It makes sense that type 0, 60, 120, and 180 chord changes are frequently observed, because it implies that chords occurring next to each other in the song are remaining in the same key for the majority of the song. A small worked example of this indexing follows.

The timbre categories, on the other hand, are more difficult to interpret intuitively. Mauch's study addresses this issue by sampling songs and sounds that are the closest to each timbre category, playing the sounds, and attaching user-based interpretations based on several listeners [8]. While this strategy worked in Mauch's study, given the time and resources at my disposal, it was not practical in mine. I ended up comparing my subjective summaries of each cluster against the charts to see whether certain peaks in the timbre categories corresponded to specific tones. Strangely, for α = 0.05 the timbre and chord change data are very similar for each cluster. This problem does not occur when α = 0.1 or 0.2, where the graphs vary significantly and correspond to some of the observed differences in the music. In summary, below are the most influential artists I found in the clusters formed and the types of music they created that were novel for their time:

• Jean-Michel Jarre: ambient and house music, complicated synthesizer arrangements
• Cabaret Voltaire: orchestral electronic music
• Paul Horn: New Age
• Brian Eno: ambient music
• Manuel Göttsching (Ashra): synth-heavy ambient music
• Killing Joke: industrial metal
• John Foxx: minimalist and dark electronic music
• Fad Gadget: house and industrial music

While these conclusions were formed mainly from the data used in the MSD and the resulting clusters, I checked outside sources and biographies of these artists to see whether they were groundbreaking contributors to electronic music. Some research revealed that these artists were indeed groundbreaking for their time, so my findings are consistent with existing literature. The difference, however, between existing accounts and mine is that, from a quantitatively computed perspective, I found some new connections (like observing that one of Jarre's works, when sped up, sounded very similar to trance music).

For larger values of α, it is worth looking not only at interesting phenomena in the clusters formed for that specific value, but also at how those clusters compare to the ones formed with other values of α. Since we are increasing the value of α, more clusters will be formed, and the distinctions between each cluster will be more nuanced. With α = 0.1, the Dirichlet Process formed 16 clusters; 2 of these clusters consisted of only one song each, and upon listening, neither of these songs sounded particularly unique, so I threw those two clusters out and analyzed the remaining 14. Comparing these clusters to the ones formed with α = 0.05, I found that some of the clusters mapped over nicely, while others were more difficult to interpret. For example, cluster 3_{0.1} (cluster 3 when α = 0.1) contained a similar number of songs and a similar distribution of release years to cluster 9_{0.05}. Both contain virtually no songs before the 1990s and then steadily rise in popularity through the 2000s. Both clusters also contain similar types of music: house beats and ethereal synths reminiscent of ambient or trance music. However, when I looked at the earliest artists in cluster 3_{0.1}, they were different from the earliest artists in cluster 9_{0.05}. One particular artist, Bill Nelson, stood out for having a particularly novel song, "Birds of Tin," for the year it was released (1980). This song features a sharp and twangy synth beat that, when sped up, sounded like minimalist acid house music.

While the α = 0.05 group differentiated mostly on general moods and classes of instruments (like rock vs. non-electronic vs. electronic), the α = 0.1 group picked up more nuanced instrumentation and mood differences. For example, cluster 16_{0.1} contained songs that featured orchestral string instruments, especially violin. The songs themselves varied significantly according to traditional genres, from Brian Eno arrangements with classical orchestra to a remix of a song by Linkin Park, a nu-metal band, which contained violin interludes. This clustering raises an interesting point: music that sounds very different based on traditional genres could be grouped together on certain instruments or sounds. Another cluster, 28_{0.1}, features 90s sounds characteristic of the Roland TR-909 drum machine (which would explain why the cluster's songs increase drastically starting in the 1990s and steadily decline through the 2000s). Yet another cluster, 6_{0.1}, has a particularly heavy left tail, indicating a style more popular in the 1980s, and its characteristic sound, hi-hat cymbals, is also a specialized instrument. This specialization does not match up particularly strongly with the clusters when α = 0.05. That is, a single cluster with α = 0.05 does not easily map to one or more clusters in the α = 0.1 run, although many of the clusters appear to share characteristics based on the qualitative descriptions in the tables. The timbre/chord change charts for each cluster appeared to at least somewhat corroborate the general characteristics I attached to each cluster. For example, the last timbre category is significantly pronounced for clusters 5 and 18, and especially so for 18. Cluster 18 was vocal-free, ethereal space-synth sounds, so it would make sense that cluster 5, which was mainly calm New Age, also contained vocal-free, ethereal, space-y sounds. It was also interesting to note that certain clusters, like 28, contained one timbre category that completely dominated all the others. Many songs in this cluster were marked by strong and repetitive beats reminiscent of the Roland TR drum machines, which matches the graph. Likewise, clusters 3, 7, 9, and 20, which appear to contain the same peak timbre category, were noted for containing strong and repetitive beats. For this run, I added the following artists and their contributions to the general list of novel artists:

• Bill Nelson: minimalist house music
• Vangelis: orchestral compositions with electronic notes
• Rick Wakeman: rock compositions with spacy-sounding synths
• Kraftwerk: synth-based pop music

Finally, we look at α = 0.2. With this parameter value, the Dirichlet Process resulted in 22 clusters; 3 of these clusters contained only one song each, and upon listening to each of these songs, I determined they were not particularly unique and discarded them, for a total of 19 remaining clusters. Unlike with the previous two values of α, where the clusters were relatively easy to differentiate subjectively, this run was quite difficult. Slightly more than half of the clusters, 10 out of 19, contained under 1000 songs, and 3 contained under 100. Some of the clusters were easily mapped to clusters in the other two α values, like cluster 17_{0.2}, which contains Roland TR drum machine sounds and is comparable to cluster 28_{0.1}. However, many of the other classifications seemed more dubious. Not only did the songs within each cluster often seem to vary significantly, but the differences between many clusters appeared nearly indistinguishable. The chord change and timbre charts also support the difficulty in distinguishing different clusters. The y-axis values on all of these charts are quite small, implying that many of the timbre values averaged out because the songs in each cluster were quite different. Essentially, the observations are quite noisy and do not have features that stand out as saliently as those of cluster 28_{0.1}, for example. The only exceptions to these numbers were clusters 30 and 34, but there were so few songs in each of these clusters that they represent only a small amount of the dataset. Therefore, I concluded that the Dirichlet Process with α = 0.2 did an insufficient job of adequately clustering the songs. Overall, the clusters formed when α = 0.1 were the most meaningful in terms of picking up nuanced moods and instruments without splitting hairs and producing clusters a minimally trained ear could not differentiate. From this analysis, the most appropriate genre classifications of the electronic music from the MSD are the clusters described in the table where α = 0.1, and the most novel artists, along with their contributions, are summarized in the findings where α = 0.05 and α = 0.1.


Chapter 4

Conclusion

In this chapter, I first address weaknesses in my experiment and strategies to address those weaknesses; I then offer potential paths for researchers to build upon my experiment and close with final words regarding this thesis.

4.1 Design Flaws in Experiment

While I made every effort possible to ensure the integrity of this experiment, there were various limiting factors, some beyond my control and others within my control but unrealistic to address given the time and resources I had. The largest issue was the dataset I was working with. While the MSD contained roughly 23,000 electronic music songs according to my classifications, these songs did not come close to covering all of the electronic music available. From looking through the tracks, I did see many important artists, meaning that there was some credibility to the dataset. However, there were several other artists I was surprised to see missing, and the artists included contained only a limited number of popular songs. Some traditionally defined genres, like dubstep, were missing entirely from the dataset, and the most recent songs came from the year 2010, which meant that the past 5 years of rapid expansion in EM were not accounted for. Building a sufficient corpus of EM data is very difficult, arguably more so than for other genres, because songs may be remixed by multiple artists, further blurring the line between original content and modifications. For this reason, I considered my thesis to be a proof of concept. Although the data I used may not be ideal, I was able to show that the Dirichlet Process can be used with some amount of success to cluster songs based on their metadata.

With respect to how I implemented the Dirichlet Process and constructed the features, my methodology could have been more extensive with additional time and resources. Interpreting the sounds in each song and establishing common threads is a difficult task, and unlike Pandora, which used trained music theory experts to analyze each song, I relied on my own ears and anecdotal knowledge of EM. Given the lack of formal literature quantitatively analyzing EM, and the resources I had, this was my best realistic option, but it was also not ideal. The second notable weakness, which was more controllable, was determining what exactly constitutes an EM song. My criteria involved iterating through every song and selecting those whose artist contained a tag that fell inside a list of predetermined EM genres. However, this strategy is not always effective, since some artists have only a small selection of EM songs and have produced much more music involving rock or other non-EM genres. To prevent these songs from appearing in the dataset, I would need to load another dataset, from a group called Last.fm, which contains user-generated tags at the song level. Another, more addressable weakness in my experiment was graphically analyzing the timbre categories. While the average chord changes were easy to interpret on the graphs for each cluster and had easy semantic interpretations, the timbre categories were never formally defined. That is, while I knew the Bayes Information Criterion was lowest when there were 46 categories, I did not associate each timbre category with a sound. Mauch's study addressed this issue by randomly selecting songs with sounds that fell in each timbre category and asking users to listen to the sounds and classify what they heard. Implementing this system would be an additional way of ensuring that the clusters formed were nontrivial: I could not only eyeball the measurements on each graph for timbre, like I did in this thesis, but also use them to confirm the sounds I observed for each cluster. Finally, while my feature selection involved careful preprocessing, based on other studies, that normalized measurements between all songs, there are additional ways I could have improved the feature set. For example, one study looks at more advanced ways to isolate specific timbre segments in a song, identify repeating patterns, and compare songs to each other in terms of the similarity of their timbres [15]. More advanced methods like these would allow me to analyze more quantitatively how successful the Dirichlet Process is at clustering songs into distinct categories.

4.2 Future Work

Future work in this area, quantitatively analyzing EM metadata to determine what constitutes different genres and novel artists, would involve tighter definitions and procedures, better evaluations of whether clustering was effective, and closer musical scrutiny. All of the weaknesses mentioned in the previous section, barring perhaps the songs available in the Million Song Dataset, can be addressed with extensions and modifications to the code base I created. Addressing the greater issue of building an effective corpus of music data for the MSD, and constantly updating it, might be accomplished by soliciting such data from an organization like Spotify, but such an endeavor is very ambitious and beyond the scope of any individual or small group research project without extensive funding and influence. Once these problems are resolved, and the dataset, the songs accessed from it, and the methods for comparing songs to each other are in place, the next steps would be to further analyze the results. How do the most unique artists for their time compare to the most popular artists? Is there considerable overlap? How long does it take for a style to grow in popularity, if it even does? And lastly, how can these findings be used to compose new genres of music and envision who and what will become popular in the future? All of these questions may require supplementary information sources, with respect to the popularity of songs and artists, for example, and many of these additional pieces of information can be found on the website of the MSD.

4.3 Closing Remarks

While this thesis is an ambitious endeavor and can be improved in many respects, it does show that the methods implemented yield nontrivial results and could serve as a foundation for future quantitative analysis of electronic music. As data analytics grows even more, and groups such as Spotify amass greater amounts of information and deeper insights from that information, this relatively new field of study will hopefully grow. EM is a dynamic, energizing, and incredibly expressive type of music, and understanding it from a quantitative perspective pays respect to what has, up until now, been mostly analyzed from a curious outsider's perspective: qualitatively described, but not examined as thoroughly from a mathematical angle.


Appendix A

Code

A.1 Pulling Data from the Million Song Dataset

from __future__ import division
import os
import re
import sys
import time
import glob
import collections
import numpy as np
import hdf5_getters  # not on adroit

# prevents output from showing ellipses when printed
np.set_printoptions(threshold=np.nan)

'''This code pulls the relevant metadata of every electronic music song
out of the Million Song Dataset and writes it to disk, sorted by year.'''

basedir = '/scratch/network/mssilver/mssilver/msd_data_full/data/' + str(sys.argv[1])
ext = '.h5'

target_genres = ['house', 'techno', 'drum and bass', 'drum n bass', "drum'n'bass",
                 'drumnbass', "drum 'n' bass", 'jungle', 'breakbeat', 'trance',
                 'dubstep', 'trap', 'downtempo', 'industrial', 'synthpop', 'idm',
                 'idm - intelligent dance music', '8-bit', 'ambient',
                 'dance and electronica', 'electronic']

# relevant metadata for all EM songs found in the MSD
all_song_data = {}
count = 0
start_time = time.time()

for root, dirs, files in os.walk(basedir):
    files = glob.glob(os.path.join(root, '*' + ext))
    for f in files:
        h5 = hdf5_getters.open_h5_file_read(f)
        # if year unknown, throw out sample
        if hdf5_getters.get_year(h5) == 0:
            h5.close()
            continue
        if any(tag in str(hdf5_getters.get_artist_mbtags(h5)) for tag in target_genres):
            print 'found electronic music song at {0} seconds'.format(time.time() - start_time)
            count += 1
            print 'song count: {0}'.format(count)
            h5_subdict = dict()
            h5_subdict['title'] = hdf5_getters.get_title(h5).item()
            h5_subdict['artist_name'] = hdf5_getters.get_artist_name(h5).item()
            h5_subdict['year'] = hdf5_getters.get_year(h5).item()
            h5_subdict['duration'] = hdf5_getters.get_duration(h5).item()
            h5_subdict['timbre'] = hdf5_getters.get_segments_timbre(h5).tolist()
            h5_subdict['pitches'] = hdf5_getters.get_segments_pitches(h5).tolist()
            track_id = hdf5_getters.get_track_id(h5).item()
            all_song_data[track_id] = h5_subdict
            print 'Song {0} finished processing. Total time elapsed: {1} seconds'.format(count, str(time.time() - start_time))
        h5.close()

# an OrderedDict preserves the chronological sort on disk (a plain dict,
# as in the original listing, would discard the ordering in Python 2)
all_song_data_sorted = collections.OrderedDict(
    sorted(all_song_data.items(), key=lambda k: k[1]['year']))
# the substitution pattern is a reconstruction; the original was garbled
# in extraction
sortedpitchdata = ('/scratch/network/mssilver/mssilver/msd_data/raw_'
                   + re.sub('/', '', sys.argv[1]) + '.txt')
with open(sortedpitchdata, 'w') as text_file:
    text_file.write(str(all_song_data_sorted))

A.2 Calculating Most Likely Chords and Timbre Categories

from __future__ import division
import os
import sys
import re
import time
import json
import glob
import math
import ast
import collections
import numpy as np
import hdf5_getters  # not on adroit
import sklearn.mixture
import msd_utils  # not on adroit

# prevents output from showing ellipses when printed
np.set_printoptions(threshold=np.nan)

# column-wise mean helper
def mean(a):
    return sum(a) / len(a)

'''This code computes the frequency of chord changes and the timbre
category counts in each electronic song, normalized by song duration.'''

basedir = '/scratch/network/mssilver/mssilver/'
input_file = basedir + 'msd_data/raw_' + str(sys.argv[1]) + '.txt'
output_file = basedir + 'msd_data/preprocessed_' + str(sys.argv[1]) + '.txt'

json_contents = open(input_file, 'r').read()

# NOTE: the original regex was garbled in extraction; this pattern, which
# grabs each song's brace-delimited dict containing a 'title' key, is a
# reconstruction
song_pattern = re.compile(r"\{[^{}]*'title'[^{}]*\}", re.DOTALL)

all_song_data = []
time_start = time.time()
count = 0
for json_object_match in re.finditer(song_pattern, json_contents):
    json_object_str = str(json_object_match.group(0))
    json_object = ast.literal_eval(json_object_str)
    json_object_new = {}

    json_object_new['title'] = json_object['title']
    json_object_new['artist_name'] = json_object['artist_name']
    json_object_new['year'] = json_object['year']
    json_object_new['duration'] = json_object['duration']

    segments_pitches_old = json_object['pitches']
    segments_timbre_old = json_object['timbre']
    segments_pitches_old_smoothed = []
    segments_timbre_old_smoothed = []
    chord_changes = [0 for i in range(0, 192)]
    smoothing_factor = 5
    for i in range(0, int(math.floor(len(segments_pitches_old) / smoothing_factor))):
        segments = segments_pitches_old[(smoothing_factor * i):(smoothing_factor * i + smoothing_factor)]
        # mean strength of each of the 12 notes over a block of 5 time segments
        segments_mean = map(mean, zip(*segments))
        segments_pitches_old_smoothed.append(segments_mean)
    most_likely_chords = [msd_utils.find_most_likely_chord(seg) for seg in segments_pitches_old_smoothed]
    print 'found most likely chords at {0} seconds'.format(time.time() - time_start)
    # tally the chord changes
    for i in range(0, len(most_likely_chords) - 1):
        c1 = most_likely_chords[i]
        c2 = most_likely_chords[i + 1]
        if c1[1] == c2[1]:
            note_shift = 0
        elif c1[1] < c2[1]:
            note_shift = c2[1] - c1[1]
        else:
            note_shift = 12 - c1[1] + c2[1]
        key_shift = 4 * (c1[0] - 1) + c2[0]
        # convert note_shift (0 through 11) and key_shift (1 to 16)
        # to one of 192 categories for a chord shift
        chord_shift = 12 * (key_shift - 1) + note_shift
        chord_changes[chord_shift] += 1
    json_object_new['chord_changes'] = [c / json_object['duration'] for c in chord_changes]
    print 'calculated chord changes at {0} seconds'.format(time.time() - time_start)

    for i in range(0, int(math.floor(len(segments_timbre_old) / smoothing_factor))):
        segments = segments_timbre_old[(smoothing_factor * i):(smoothing_factor * i + smoothing_factor)]
        # mean of each timbre coefficient over a block of 5 time segments
        segments_mean = map(mean, zip(*segments))
        segments_timbre_old_smoothed.append(segments_mean)
    print 'found most likely timbre categories at {0} seconds'.format(time.time() - time_start)
    timbre_cats = [msd_utils.find_most_likely_timbre_category(seg) for seg in segments_timbre_old_smoothed]
    # 46 timbre categories (see Appendix A.4); the original listing counted
    # only 30, which appears to be a leftover from an earlier lexicon size
    timbre_cat_counts = [timbre_cats.count(i) for i in xrange(0, 46)]
    json_object_new['timbre_cat_counts'] = [t / json_object['duration'] for t in timbre_cat_counts]
    all_song_data.append(json_object_new)
    count += 1

print 'preprocessing finished, writing results to file at time {0}'.format(time.time() - time_start)
with open(output_file, 'w') as f:
    f.write(str(all_song_data))

print 'file merging complete at time {0}'.format(time.time() - time_start)

A.3 Code to Compute Timbre Categories

from __future__ import division
import os
import sys
import re
import time
import json
import glob
import math
import ast
import operator
import random
import numpy as np
import hdf5_getters  # not on adroit
import sklearn.mixture
import msd_utils  # not on adroit
import matplotlib.pyplot as plt
from string import ascii_uppercase
from collections import defaultdict

'''This code samples timbre frames from the raw song files; the sampled
frames are later clustered to build the timbre lexicon.'''

timbre_all = []
# number of songs in the dataset from each year, used to weight the sampling
year_counts = {1956: 2, 1965: 4, 1968: 3, 1969: 5, 1970: 23, 1971: 25,
               1972: 26, 1973: 37, 1974: 35, 1975: 29, 1976: 28, 1977: 64,
               1978: 77, 1979: 111, 1980: 131, 1981: 171, 1982: 199,
               1983: 272, 1984: 190, 1985: 189, 1986: 200, 1987: 224,
               1988: 205, 1989: 272, 1990: 358, 1991: 348, 1992: 538,
               1993: 610, 1994: 658, 1995: 764, 1996: 809, 1997: 930,
               1998: 872, 1999: 983, 2000: 1031, 2001: 1230, 2002: 1323,
               2003: 1563, 2004: 1508, 2005: 1995, 2006: 1892, 2007: 2175,
               2008: 1950, 2009: 1782, 2010: 742}

time_start = time.time()
year_count = defaultdict(int)
orig_dir = '/scratch/network/mssilver/mssilver/'
orig_dir = ''  # overridden (as in the original) for local runs

# NOTE: the original regex was garbled in extraction; this pattern is a
# reconstruction matching each song's brace-delimited dict
json_pattern = re.compile(r"\{[^{}]*'title'[^{}]*\}", re.DOTALL)
N = 20  # number of songs to sample from each year
k = 20  # number of frames to select from each song
for l1 in ascii_uppercase:
    for l2 in ascii_uppercase:
        edm_textfile = orig_dir + 'msd_data/raw_' + l1 + l2 + '.txt'
        json_contents = open(edm_textfile, 'r').read()
        for json_object_str in re.findall(json_pattern, json_contents):
            json_object = ast.literal_eval(json_object_str)
            year = int(json_object['year'])
            # keep each song with probability min(1, N / songs in its year)
            prob = 1.0 if 1.0 * N / year_counts[year] > 1.0 else 1.0 * N / year_counts[year]
            if random.random() < prob:
                print 'getting timbre frames for song in directory {0}, {1} seconds after start of program'.format(edm_textfile, time.time() - time_start)
                duration = float(json_object['duration'])
                timbre = [[t / duration for t in l] for l in json_object['timbre']]
                try:
                    indices = random.sample(xrange(0, len(timbre)), k)
                except ValueError:  # fewer than k frames: take them all
                    indices = xrange(0, len(timbre))
                timbre_frames = [timbre[i] for i in indices]
                for l in timbre_frames:
                    timbre_all.append(l)
        print 'finished file {0}, {1} seconds after start of program'.format(edm_textfile, time.time() - time_start)

with open('timbre_frames_all.txt', 'w') as f:
    f.write(str(timbre_all))

A.4 Helper Methods for Calculations

import os
import re
import json
import glob
import time
import hdf5_getters
import numpy as np

''' some static data used in conjunction with the helper methods '''

# each 12-element vector corresponds to the 12 pitches, starting with C
# natural and going up to B natural; rows are the 12 transpositions of
# each chord type
CHORD_TEMPLATE_MAJOR = [[1,0,0,0,1,0,0,1,0,0,0,0], [0,1,0,0,0,1,0,0,1,0,0,0],
                        [0,0,1,0,0,0,1,0,0,1,0,0], [0,0,0,1,0,0,0,1,0,0,1,0],
                        [0,0,0,0,1,0,0,0,1,0,0,1], [1,0,0,0,0,1,0,0,0,1,0,0],
                        [0,1,0,0,0,0,1,0,0,0,1,0], [0,0,1,0,0,0,0,1,0,0,0,1],
                        [1,0,0,1,0,0,0,0,1,0,0,0], [0,1,0,0,1,0,0,0,0,1,0,0],
                        [0,0,1,0,0,1,0,0,0,0,1,0], [0,0,0,1,0,0,1,0,0,0,0,1]]
CHORD_TEMPLATE_MINOR = [[1,0,0,1,0,0,0,1,0,0,0,0], [0,1,0,0,1,0,0,0,1,0,0,0],
                        [0,0,1,0,0,1,0,0,0,1,0,0], [0,0,0,1,0,0,1,0,0,0,1,0],
                        [0,0,0,0,1,0,0,1,0,0,0,1], [1,0,0,0,0,1,0,0,1,0,0,0],
                        [0,1,0,0,0,0,1,0,0,1,0,0], [0,0,1,0,0,0,0,1,0,0,1,0],
                        [0,0,0,1,0,0,0,0,1,0,0,1], [1,0,0,0,1,0,0,0,0,1,0,0],
                        [0,1,0,0,0,1,0,0,0,0,1,0], [0,0,1,0,0,0,1,0,0,0,0,1]]
CHORD_TEMPLATE_DOM7 = [[1,0,0,0,1,0,0,1,0,0,1,0], [0,1,0,0,0,1,0,0,1,0,0,1],
                       [1,0,1,0,0,0,1,0,0,1,0,0], [0,1,0,1,0,0,0,1,0,0,1,0],
                       [0,0,1,0,1,0,0,0,1,0,0,1], [1,0,0,1,0,1,0,0,0,1,0,0],
                       [0,1,0,0,1,0,1,0,0,0,1,0], [0,0,1,0,0,1,0,1,0,0,0,1],
                       [1,0,0,1,0,0,1,0,1,0,0,0], [0,1,0,0,1,0,0,1,0,1,0,0],
                       [0,0,1,0,0,1,0,0,1,0,1,0], [0,0,0,1,0,0,1,0,0,1,0,1]]
CHORD_TEMPLATE_MIN7 = [[1,0,0,1,0,0,0,1,0,0,1,0], [0,1,0,0,1,0,0,0,1,0,0,1],
                       [1,0,1,0,0,1,0,0,0,1,0,0], [0,1,0,1,0,0,1,0,0,0,1,0],
                       [0,0,1,0,1,0,0,1,0,0,0,1], [1,0,0,1,0,1,0,0,1,0,0,0],
                       [0,1,0,0,1,0,1,0,0,1,0,0], [0,0,1,0,0,1,0,1,0,0,1,0],
                       [0,0,0,1,0,0,1,0,1,0,0,1], [1,0,0,0,1,0,0,1,0,1,0,0],
                       [0,1,0,0,0,1,0,0,1,0,1,0], [0,0,1,0,0,0,1,0,0,1,0,1]]

CHORD_TEMPLATE_MAJOR_means = [np.mean(chord) for chord in CHORD_TEMPLATE_MAJOR]
CHORD_TEMPLATE_MINOR_means = [np.mean(chord) for chord in CHORD_TEMPLATE_MINOR]
CHORD_TEMPLATE_DOM7_means = [np.mean(chord) for chord in CHORD_TEMPLATE_DOM7]
CHORD_TEMPLATE_MIN7_means = [np.mean(chord) for chord in CHORD_TEMPLATE_MIN7]

CHORD_TEMPLATE_MAJOR_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_MAJOR]
CHORD_TEMPLATE_MINOR_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_MINOR]
CHORD_TEMPLATE_DOM7_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_DOM7]
CHORD_TEMPLATE_MIN7_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_MIN7]

# the 46 timbre cluster centers (12 coefficients each) selected by BIC
TIMBRE_CLUSTERS = [
    [1.38679881e-01, 3.95702571e-02, 2.65410235e-02, 7.38301998e-03, -1.75014636e-02, -5.51147732e-02,
     8.71851698e-03, -1.17595855e-02, 1.07227900e-02, 8.75951680e-03, 5.40391877e-03, 6.17638908e-03],
    [3.14344510e+00, 1.17405599e-01, 4.08053561e+00, -1.77934450e+00, 2.93367968e+00, -1.35597928e+00,
     -1.55129489e+00, 7.75743158e-01, 6.42796685e-01, 1.40794256e-01, 3.37716831e-01, -3.27103815e-01],
    [3.56548165e-01, 2.73288705e+00, 1.94355982e+00, 1.06892477e+00, 9.89739475e-01, -8.97330631e-02,
     8.73234495e-01, -2.00747009e-03, 3.44488367e-01, 9.93117800e-02, -2.43471766e-01, -1.90521726e-01],
    [4.22442037e-01, 4.14115783e-01, 1.43926557e-01, -1.16143322e-01, -5.95186216e-02, -2.36927188e-01,
     -6.83151409e-02, 9.86816882e-02, 2.43219098e-02, 6.93558977e-02, 6.80121418e-03, 3.97485360e-02],
    [1.94727799e-01, -1.39027782e+00, -2.39875671e-01, -2.84583677e-01, 1.92334219e-01, -2.83421048e-01,
     2.15787541e-01, 1.14840341e-01, -2.15631833e-01, -4.09496877e-02, -6.90838017e-03, -7.24394810e-03],
    [1.96565167e-01, 4.98702717e-02, -3.43697282e-01, 2.54170701e-01, 1.12441266e-02, 1.54740401e-01,
     -4.70447408e-02, 8.10868802e-02, 3.03736697e-03, 1.43974944e-03, -2.75044913e-02, 1.48634678e-02],
    [2.21364497e-01, -2.96205105e-01, 1.57754028e-01, -5.57641279e-02, -9.25625566e-02, -6.15316168e-02,
     -1.38139882e-01, -5.54936599e-02, 1.66886836e-01, 6.46238260e-02, 1.24093863e-02, -2.09274345e-02],
    [2.12823455e-01, -9.32652720e-02, -4.39611467e-01, -2.02814479e-01, 4.98638770e-02, -1.26572488e-01,
     -1.11181799e-01, 3.25075635e-02, 2.01416694e-02, -5.69216463e-02, 2.61922912e-02, 8.30817468e-02],
    [1.62304042e-01, -7.34813956e-03, -2.02552550e-01, 1.80106705e-01, -5.72110826e-02, -9.17148244e-02,
     -6.20429191e-03, -6.08892354e-02, 1.02883628e-02, 3.84878478e-02, -8.72920419e-03, 2.37291230e-02],
    [1.69023095e-01, 6.81311168e-02, -3.71039856e-02, -2.13139780e-02, -4.18752028e-03, 1.36407740e-01,
     2.58515825e-02, -4.10328777e-04, 2.93149920e-02, -1.97874734e-02, 2.01177066e-02, 4.29260690e-03],
    [4.16829358e-01, -1.28384095e+00, 8.86081556e-01, 9.13717416e-02, -3.19420208e-01, -1.82003637e-01,
     -3.19865507e-02, -1.71517045e-02, 3.47472066e-02, -3.53047665e-02, 5.58354602e-02, -5.06222122e-02],
    [3.83948137e-01, 1.06020034e-01, 4.01191058e-01, 1.49470482e-01, -9.58422411e-02, -4.94473336e-02,
     2.27589858e-02, -5.67352733e-02, 3.84666644e-02, -2.15828055e-02, -1.67817151e-02, 1.15426241e-01],
    [9.07946444e-01, 3.26120397e+00, 2.98472002e+00, -1.42615404e-01, 1.29886103e+00, -4.53380431e-01,
     1.54008478e-01, -3.55297093e-02, -2.95809181e-01, 1.57037690e-01, -7.29692046e-02, 1.15180285e-01],
    [1.60870896e+00, -2.32038235e+00, -7.96211044e-01, 1.55058968e+00, -2.19377663e+00, 5.01030526e-01,
     -1.71767279e+00, -1.36642470e+00, -2.42837527e-01, -4.14275615e-01, -7.33148530e-01, -4.56676578e-01],
    [6.42870687e-01, 1.34486839e+00, 2.16026845e-01, -2.13180345e-01, 3.10866747e-01, -3.97754955e-01,
     -3.54439151e-01, -5.95938041e-04, 4.95054274e-03, 4.67013422e-02, -1.80823854e-02, 1.25808320e-01],
    [1.16780496e+00, 2.28141229e+00, -3.29418720e+00, -1.54239912e+00, 2.12372153e-01, 2.51116768e+00,
     1.84273560e+00, -4.06183916e-01, 1.19175125e+00, -9.24407446e-01, 6.85444429e-01, -6.38729005e-01],
    [2.39097414e-01, -1.13382447e-02, 3.06327342e-01, 4.68182987e-03, -1.03107607e-01, -3.17661969e-02,
     3.46533705e-02, 1.46440386e-02, 6.88291154e-02, 1.72580481e-02, -6.23970238e-03, -6.52822380e-03],
    [1.74850329e-01, -1.86077411e-01, 2.69285838e-01, 5.22452803e-02, -3.71708289e-02, -6.42874319e-02,
     -5.01920042e-03, -1.14565540e-02, -2.61300268e-03, -6.94872458e-03, 1.20157063e-02, 2.01341977e-02],
    [1.93220674e-01, 1.62738332e-01, 1.72794061e-02, 7.89933755e-02, 1.58494767e-01, 9.04541006e-04,
     -3.33177052e-02, -1.42411500e-01, -1.90471155e-02, -2.41622739e-02, -2.57382438e-02, 2.84895062e-02],
    [3.31179197e+00, -1.56765268e-01, 4.42446188e+00, 2.05496297e+00, 5.07031622e+00, -3.52663849e-02,
     -5.68337901e+00, -1.17825301e+00, 5.41756637e-01, -3.15541339e-02, -1.58404846e+00, 7.37887234e-01],
    [2.36033237e-01, -5.01380019e-01, -7.01568834e-02, -2.14474169e-01, 5.58739133e-01, -3.45340886e-01,
     2.36469930e-01, -2.51770230e-02, -4.41670143e-01, -1.73364633e-01, 9.92353986e-03, 1.01775476e-01],
    [3.13672832e+00, 1.55128891e+00, 4.60139512e+00, 9.82477544e-01, -3.87108002e-01, -1.34239667e+00,
     -3.00065797e+00, -4.41556909e-01, -7.77546208e-01, -6.59017029e-01, -1.42596356e-01, -9.78935498e-01],
    [8.50714148e-01, 2.28658856e-01, -3.65260753e+00, 2.70626948e+00, -1.90441544e-01, 5.66625676e+00,
     1.77531510e+00, 2.39978921e+00, 1.10965660e+00, 1.58484130e+00, -1.51579214e-02, 8.64324026e-01],
    [1.14302559e+00, 1.18602811e+00, -3.88130412e+00, 8.69833825e-01, -8.23003310e-01, -4.23867795e-01,
     8.56022598e-01, -1.08015106e+00, 1.74840192e-01, -1.35493558e-02, -1.17012561e+00, 1.68572940e-01],
    [3.54117814e+00, 6.12714769e-01, 7.67585243e+00, 2.50391333e+00, 1.81374399e+00, -1.46363231e+00,
     -1.74027236e+00, -5.72924078e-01, -1.20787368e+00, -4.13954661e-01, -4.62561948e-01, 6.78297871e-01],
    [8.31843044e-01, 4.41635485e-01, 7.00724425e-02, -4.72159900e-02, 3.08326493e-01, -4.47009822e-01,
     3.27806057e-01, 6.52370380e-01, 3.28490360e-01, 1.28628172e-01, -7.78065861e-02, 6.91343399e-02],
    [4.90082031e-01, -9.53180204e-01, 1.76970476e-01, 1.57256960e-01, -5.26196238e-02, -3.19264458e-01,
     3.91808304e-01, 2.19368239e-01, -2.06483291e-01, -6.25044005e-02, -1.05547224e-01, 3.18934196e-01],
    [1.49899454e+00, -4.30708817e-01, 2.43770498e+00, 7.03149621e-01, -2.28827845e+00, 2.70195855e+00,
     -4.71484280e+00, -1.18700075e+00, -1.77431396e+00, -2.23190236e+00, 8.20855264e-01, -2.35859902e-01],
    [1.20322544e-01, -3.66300816e-01, -1.25699953e-01, -1.21914056e-01, 6.93277338e-02, -1.31034684e-01,
     -1.54955924e-03, 2.48094288e-02, -3.09576314e-02, -1.66369415e-03, 1.48904987e-04, -1.42151992e-02],
    [6.52394765e-01, -6.81024464e-01, 6.36868117e-01, 3.04950208e-01, 2.62178992e-01, -3.20457080e-01,
     -1.98576098e-01, -3.02173163e-01, 2.04399765e-01, 4.44513847e-02, -9.50111498e-02, -1.14198739e-02],
    [2.06762180e-01, -2.08101829e-01, 2.61977630e-01, -1.71672300e-01, 5.61794250e-02, 2.13660185e-01,
     3.90259585e-02, 4.78176392e-02, 1.72812607e-02, 3.44052067e-02, 6.26899067e-03, 2.48544728e-02],
    [7.39717363e-01, 4.37786285e+00, 2.54995502e+00, 1.13151212e+00, -3.58509503e-01, 2.20806129e-01,
     -2.20500355e-01, -7.22409824e-02, -2.70534083e-01, 1.07942098e-03, 2.70174668e-01, 1.87279353e-01],
    [1.25593809e+00, 6.71054880e-02, 8.70352571e-01, -4.32607959e+00, 2.30652217e+00, 5.47476105e+00,
     -6.11052479e-01, 1.07955720e+00, -2.16225471e+00, -7.95770149e-01, -7.31804973e-01, 9.68935954e-01],
    [1.17233757e-01, -1.23897829e-01, -4.88625265e-01, 1.42036530e-01, -7.23286756e-02, -6.99808763e-02,
     -1.17525019e-02, 5.70221674e-02, -7.67796123e-03, 4.17505873e-02, -2.33375716e-02, 1.94121001e-02],
    [1.67511025e+00, -2.75436700e+00, 1.45345593e+00, 1.32408871e+00, -1.66172505e+00, 1.00560074e+00,
     -8.82308160e-01, -5.95708043e-01, -7.27283590e-01, -1.03975499e+00, -1.86653334e-02, 1.39449745e+00],
    [3.20587677e+00, -2.84451104e+00, 8.54849957e+00, -4.44001235e-01, 1.04202144e+00, 7.35333682e-01,
     -2.48763292e+00, 7.38931361e-01, -1.74185596e+00, -1.07581842e+00, 2.05759299e-01, -8.20483513e-01],
    [3.31279737e+00, -5.08655734e-01, 6.61530870e+00, 1.16518280e+00, 4.74499155e+00, -2.31536191e+00,
     -1.34016130e+00, -7.15381712e-01, 2.78890594e+00, 2.04189275e+00, -3.80003033e-01, 1.16034914e+00],
    [1.79522019e+00, -8.13534697e-03, 4.37167420e-01, 2.26517020e+00, 8.85377295e-01, 1.07481514e+00,
     -7.25322296e-01, -2.19309506e+00, -7.59468916e-01, -1.37191387e+00, 2.60097913e-01, 9.34596450e-01],
    [3.50400906e-01, 8.17891485e-01, -8.63487084e-01, -7.31760701e-01, 9.70320805e-02, -3.60023996e-01,
     -2.91753495e-01, -8.03073817e-02, 6.65930095e-02, 1.60093340e-01, -1.29158086e-01, -5.18806100e-02],
    [2.25922929e-01, 2.78461593e-01, 5.39661393e-02, -2.37662670e-02, -2.70343295e-02, -1.23485570e-01,
     2.31027499e-03, 5.87465112e-05, 1.86127188e-02, 2.83074747e-02, -1.87198676e-04, 1.24761782e-02],
    [4.53615634e-01, 3.18976020e+00, -8.35029351e-01, 7.84124578e+00, -4.43906795e-01, -1.78945492e+00,
     -1.14521031e+00, 1.00044304e+00, -4.04084981e-01, -4.86030348e-01, 1.05412721e-01, 5.63666445e-02],
    [3.93714086e-01, -3.07226477e-01, -4.87366619e-01, -4.57481697e-01, -2.91133171e-04, -2.39881719e-01,
     -2.15591352e-01, -1.21332941e-01, 1.42245002e-01, 5.02984582e-02, -8.05878851e-03, 1.95534173e-01],
    [1.86913010e-01, -1.61000977e-01, 5.95612425e-01, 1.87804293e-01, 2.22064227e-01, -1.09008289e-01,
     7.83845058e-02, 5.15228647e-02, -8.18113578e-02, -2.37860551e-02, 3.41013800e-03, 3.64680417e-02],
    [3.32919314e+00, -2.14341251e+00, 7.20913997e+00, 1.76143734e+00, 1.64091808e+00, -2.66887649e+00,
     -9.26748006e-01, -2.78599285e-01, -7.39434005e-01, -3.87363085e-01, 8.00557250e-01, 1.15628886e+00],
    [4.76496444e-01, -1.19334793e-01, 3.09037235e-01, -3.45545294e-01, 1.30114716e-01, 5.06895559e-01,
     2.12176840e-01, -4.14296750e-03, 4.52439064e-02, -1.62163990e-02, 6.93683152e-02, -5.77607592e-03],
    [3.00019324e-01, 5.43432074e-02, -7.72732930e-01, 1.47263806e+00, -2.79012581e-02, -2.47864869e-01,
     -2.10011388e-01, 2.78202425e-01, 6.16957205e-02, -1.66924986e-01, -1.80102286e-01, -3.78872162e-03]]

TIMBRE_MEANS = [np.mean(t) for t in TIMBRE_CLUSTERS]
TIMBRE_STDEVS = [np.std(t) for t in TIMBRE_CLUSTERS]

'''helper methods to process raw msd data'''

def normalize_pitches(h5):
    key = int(hdf5_getters.get_key(h5))
    segments_pitches = hdf5_getters.get_segments_pitches(h5)
    segments_pitches_new = [transpose_by_key(pitch_seg, key) for pitch_seg in segments_pitches]
    return segments_pitches_new

def transpose_by_key(pitch_seg, key):
    pitch_seg_new = []
    for i in range(0, 12):
        idx = (i + key) % 12
        pitch_seg_new.append(pitch_seg[idx])
    return pitch_seg_new

''' given a time segment with distributions of the 12 pitches, find the
most likely chord played '''
def find_most_likely_chord(pitch_vector):
    rho_max = 0.0
    # chords are indexed as (type, transposition)
    most_likely_chord = (1, 1)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_MAJOR, CHORD_TEMPLATE_MAJOR_means, CHORD_TEMPLATE_MAJOR_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean) * (pitch_vector[i] - np.mean(pitch_vector)) / ((stdev + 0.01) * (np.std(pitch_vector) + 0.01))
        if abs(rho) > abs(rho_max):
            rho_max = rho
            most_likely_chord = (1, idx)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_MINOR, CHORD_TEMPLATE_MINOR_means, CHORD_TEMPLATE_MINOR_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean) * (pitch_vector[i] - np.mean(pitch_vector)) / ((stdev + 0.01) * (np.std(pitch_vector) + 0.01))
        if abs(rho) > abs(rho_max):
            rho_max = rho
            most_likely_chord = (2, idx)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_DOM7, CHORD_TEMPLATE_DOM7_means, CHORD_TEMPLATE_DOM7_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean) * (pitch_vector[i] - np.mean(pitch_vector)) / ((stdev + 0.01) * (np.std(pitch_vector) + 0.01))
        if abs(rho) > abs(rho_max):
            rho_max = rho
            most_likely_chord = (3, idx)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_MIN7, CHORD_TEMPLATE_MIN7_means, CHORD_TEMPLATE_MIN7_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean) * (pitch_vector[i] - np.mean(pitch_vector)) / ((stdev + 0.01) * (np.std(pitch_vector) + 0.01))
        if abs(rho) > abs(rho_max):
            rho_max = rho
            most_likely_chord = (4, idx)
    return most_likely_chord

def find_most_likely_timbre_category(timbre_vector):
    most_likely_timbre_cat = 0
    rho_max = 0.0
    for idx, (seg, mean, stdev) in enumerate(zip(TIMBRE_CLUSTERS, TIMBRE_MEANS, TIMBRE_STDEVS)):
        rho = 0.0
        for i in range(0, 12):
            # the original listing centered timbre_vector on np.mean(seg),
            # which appears to be a typo; the chord version centers the input
            # vector on its own mean, so the same is done here
            rho += (seg[i] - mean) * (timbre_vector[i] - np.mean(timbre_vector)) / ((stdev + 0.01) * (np.std(timbre_vector) + 0.01))
        if abs(rho) > abs(rho_max):
            rho_max = rho
            most_likely_timbre_cat = idx
    return most_likely_timbre_cat

Bibliography

[1] Marck Bailey. Mark J. Butler publishes scholarly work on dance music. http://www.music.northwestern.edu/about/news/2012/mark-j-butler-publishes-scholarly-work-on-dance-music.html, March 2012.

[2] Kenneth Taylor. Ishkur's guide to EDM. http://techno.org/electronic-music-guide/.

[3] Deal further strengthens Spotify's music discovery expertise. http://the.echonest.com/pressreleases/spotify-acquires-echo-nest/, March 2014.

[4] Josh Constine. Inside the Spotify–Echo Nest skunkworks. http://techcrunch.com/2014/10/19/the-sonic-mad-scientists/, October 2014.

[5] The future of music genres is here. http://blog.echonest.com/post/73516217273/the-future-of-music-genres-is-here, January 2014.

[6] About the Music Genome Project. http://www.pandora.com/about/mgp.

[7] Joan Serrà, Álvaro Corral, Marián Boguñá, Martín Haro, and Josep Ll. Arcos. Measuring the evolution of contemporary western popular music. Scientific Reports, 2, July 2012.

[8] Matthias Mauch, Robert M. MacCallum, Mark Levy, and Armand M. Leroi. The evolution of popular music: USA 1960–2010. Royal Society Open Science, 2(5), 2015.

[9] Thierry Bertin-Mahieux, Daniel P. W. Ellis, Brian Whitman, and Paul Lamere. The Million Song Dataset. In Proceedings of the 12th International Conference on Music Information Retrieval (ISMIR 2011), 2011.

[10] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay. Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12:2825–2830, 2011.

[11] Lars Buitinck, Gilles Louppe, Mathieu Blondel, Fabian Pedregosa, Andreas Mueller, Olivier Grisel, Vlad Niculae, Peter Prettenhofer, Alexandre Gramfort, Jaques Grobler, Robert Layton, Jake VanderPlas, Arnaud Joly, Brian Holt, and Gaël Varoquaux. API design for machine learning software: experiences from the scikit-learn project. In ECML PKDD Workshop: Languages for Data Mining and Machine Learning, pages 108–122, 2013.

[12] Edwin Chen. Infinite mixture models with nonparametric Bayes and the Dirichlet Process. http://blog.echen.me/2012/03/20/infinite-mixture-models-with-nonparametric-bayes-and-the-dirichlet-process/, March 2012.

[13] Graham Massey. Roland TR-808: The drum machine that changed music. http://www.bbc.com/news/entertainment-arts-26682781, March 2014.

[14] Jean-Michel Jarre and Agencja Artystyczna MTJ. Jean Michel Jarre. Disques Dreyfus, 1999.

[15] François Pachet, Jean-Julien Aucouturier, and Mark Sandler. "The way it sounds": Timbre models for analysis and retrieval of music signals. IEEE Transactions on Multimedia, 7(6):1028–1035, December 2005.


Abstract
Acknowledgements
Contents
List of Tables
List of Figures
1 Introduction
    1.1 Background Information
    1.2 Literature Review
    1.3 The Dataset
2 Mathematical Modeling
    2.1 Determining Novelty of Songs
    2.2 Feature Selection
    2.3 Collecting Data and Preprocessing Selected Features
        2.3.1 Collecting the Data
        2.3.2 Pitch Preprocessing
        2.3.3 Timbre Preprocessing
3 Results
    3.1 Methodology
    3.2 Findings
        3.2.1 α = 0.05
        3.2.2 α = 0.1
        3.2.3 α = 0.2
    3.3 Analysis
4 Conclusion
    4.1 Design Flaws in Experiment
    4.2 Future Work
    4.3 Closing Remarks
A Code
    A.1 Pulling Data from the Million Song Dataset
    A.2 Calculating Most Likely Chords and Timbre Categories
    A.3 Code to Compute Timbre Categories
    A.4 Helper Methods for Calculations
Bibliography

playlists generated based on the music the user and the user's friends have listened to. The weekly playlist, called Discover Weekly, is a relatively new feature in Spotify and is driven by music analysis algorithms created from Echo Nest. Using the Echo Nest code interface, Spotify creates a "taste profile" for each user, which assesses attributes such as how often a user branches out to new styles of music, how closely the user's streamed music follows popular Billboard music charts, and so on. Spotify also looks at the artists and songs the user streamed and creates clusters of different genres that the user likes (see Figure 1.1). The taste profile and music clusters can then be used to generate playlists geared to a specific user. The genres in the clusters come from a list of nearly 800 names, which are derived by scraping the Internet for trending terms in music, as well as by training various algorithms on a regular basis by "listening" to new songs [4][5].

Figure 1.1: A user's taste profile generated by Spotify


Although Spotify and Echo Nest's algorithms are very useful for mapping the landscape of established and emerging genres of music, the methodology is limited to pre-defined genres of music. This may serve as a good point of comparison for my final results, but my study aims to be as context-free as possible, attaching no preconceived notions of music styles or genres and instead looking at features that can be measured in every song.

While Spotify's approach to mapping music is very high-tech and based on existing genres, Pandora takes a very low-tech and context-free approach to music clustering. Pandora created the Music Genome Project, a multi-year undertaking where skilled music theorists listened to a large number of songs and analyzed up to 450 characteristics in each song [6]. Pandora's approach is appealing to the aim of my study, since it does not take any preconceived notions of what a genre of music is, instead comparing songs on common characteristics such as pitch, rhythm, and instrument patterns. Unfortunately, I do not have a cadre of skilled music theorists at my disposal, nor do I have 10 years to perform such calculations like the dedicated workers at Pandora (tips the indestructible fedora). Additionally, Pandora's Music Genome Project is intellectual property, so at best I can only rely on the abstract concepts of the Music Genome Project to drive my study.

In the academic realm, there are no existing studies analyzing quantifiable changes in EM specifically, but there exist a few studies that perform such analysis on popular Western music in general. One such study is Measuring the Evolution of Contemporary Western Popular Music, which analyzes music from 1955–2010, spanning all common genres. Using the Million Song Dataset, a free public database of songs, each containing metadata (see Section 1.3), the study focuses on the attributes pitch, timbre, and loudness. Pitch is defined as the standard musical notes, or the frequency of the sound waves. Timbre is formally defined by the Mel frequency cepstral coefficients (MFCCs) of a transformed sound signal. More informally, it refers to the sound color, texture, or tone quality, and is associated with instrument types, recording resources, and production techniques. In other words, two sounds that have the same pitch but different tones (for example, a bell and a voice) are differentiated by their timbres. There are 12 MFCCs that define the timbre of a given sound. Finally, loudness refers to how intrinsically loud the music sounds, not loudness that a listener can manipulate while listening to the music. Loudness is the first MFCC of the timbre of a sound [7]. The study concluded that over time, music has been becoming louder and less diverse:

The restriction of pitch sequences (with metrics showing less variety in pitch progressions), the homogenization of the timbral palette (with frequent timbres becoming more frequent), and growing average loudness levels (threatening a dynamic richness that has been conserved until today). This suggests that our perception of the new would be essentially rooted on identifying simpler pitch sequences, fashionable timbral mixtures, and louder volumes. Hence, an old tune with slightly simpler chord progressions, new instrument sonorities that were in agreement with current tendencies, and recorded with modern techniques that allowed for increased loudness levels could be easily perceived as novel, fashionable, and groundbreaking.

This study serves as a good starting point for mathematically analyzing music in a few ways. First, it utilizes the Million Song Dataset, which addresses the issue of legally obtaining music metadata. As mentioned in Section 1.3, the only legal way to obtain playable music for this study would have been to purchase every song I would include, which is infeasible. While the Million Song Dataset does not contain the audio files in playable format, it does contain audio features and metadata that allow for in-depth analysis. In addition, working with the dataset takes out the work of extracting features from raw audio files, saving an extensive amount of time and energy. Second, the study establishes specifics for what constitutes a trend in music. Pitch, timbre, and loudness are core features of music, and examining the distributions of each among songs over time reveals a lot of information about how the music industry and consumers' tastes have evolved. While these are not all of the features contained in a song, they serve as a good starting point. Third, the study defines mathematical ways to capture music attributes and measure their change over time. For example, pitches are transposed into the same tonal context, with binary discretized pitch descriptions based on a threshold, so that each song can be represented with vectors of pitches that are normalized and compared to other songs.

While this study lays some solid groundwork for capturing and analyzing numeric qualities of music, it falls short of addressing my goals in a couple of ways. First, it does not perform any analysis with respect to music genre. While the analysis performed in this paper could easily be applied to a list of songs in a specific genre, certain genres might have unique sounds and rhythms relative to other genres that would be worth studying in greater detail. Second, the study only measures general trends in music over time. The models used to describe changes are simple regressions that don't look at more nuanced changes. For example, what styles of music developed over certain periods of time? How rapid were those changes? Which styles of music developed from which other styles?

A more promising study, led by music researcher Matthias Mauch [8], analyzes contemporary popular Western music from the 1960s to 2010s by comparing numerical data on the pitches and timbre of a corpus of 17,000 songs that appeared on the Billboard Hot 100. Like the previously mentioned paper, Measuring the Evolution of Contemporary Western Popular Music, Mauch's study also creates abstractions of pitch and timbre in order to provide a consistent and meaningful semantic interpretation of musical data (see Figure 1.2). However, Mauch's study takes this idea a step further by using genre tags from Last.fm, a music website, and constructing a hierarchy of music genres using hierarchical clustering. Additionally, the study takes a crack at determining whether a particular band, the Beatles, was musically groundbreaking for its time or merely playing off sounds that other bands had already used.

Figure 12 Data processing pipeline for Mauchrsquos study illustrated with a segment ofQueenrsquos Bohemian Rhapsody 1975

While both "Measuring the Evolution of Contemporary Western Popular Music" and Mauch's study created abstractions of pitch and timbre, Mauch's study is more appealing with respect to my goal because its end results align more closely with mine. Additionally, the data processing pipeline offers several layers of abstraction, and depending on my progress, I would be able to achieve at least one of the levels of abstraction. As shown in Figure 1.2, each segment of a raw audio file is first broken down into its 12 timbre MFCCs and pitch components. Next, the study constructs "lexicons," or a dictionary of pitch and timbre terms that all songs can be compared to. For pitch, the original data is in an N-by-12 matrix, where N is the number of time segments in the song and 12 the number of notes found in an octave of pitches. Each time segment contains the relative strengths of each of the 12 pitches. However, musical sounds are not merely a collection of pitches but, more precisely, chords. Furthermore, the similarity of two songs is not determined by the absolute pitches of their chords but rather by the progression of chords in the song, all relative to each other. For example, if all the notes in a song are transposed by one step, the song will sound different in terms of absolute pitch, but it will still be recognized as the original because all of the relative movements from each chord to the next are the same. This phenomenon is captured in the pitch data by finding the most likely chord played at each time segment, then counting the change to the next chord at each time step and generating a table of chord change frequencies for each song. Constructing the timbre lexicon is more complicated, since there is no easy analogue like chords by which to compare songs. Mauch's study utilizes a Gaussian Mixture Model (GMM), iterating over k = 1 to k = N clusters (where N is a large number), running the GMM with each prior assumption of k clusters, and computing the Bayesian Information Criterion (BIC) for each model. The value of k with the lowest of the N BIC values is selected. That model contains k different timbre clusters, and each cluster contains the mean timbre value for each of the 12 timbre components.

For my research, I decided that the pitch and timbre lexicons would be the most realistic level of abstraction I could obtain. Mauch's study adds an additional layer to pitch and timbre by identifying the most common patterns of chord changes and the most common timbre rhythms, and creating more general tags from these combined terms, such as "stepwise changes indicating modal harmony" for a pitch topic and "oh, rounded, mellow" for a timbral topic. There were two problems with using this final layer of abstraction for my study. First, attaching semantic interpretations to the pitch and timbral lexicons is a difficult task. For timbre, I would need to listen to sound samples containing all of the different timbral categories I identified and attach user interpretations to them. For the chords, not only would I have to perform the same analysis as on timbre, but I would also have to take careful attention to identify which chords correspond to common sound progressions in popular music, a task that I am not qualified for and did not have the resources for this thesis to seek out. Second, this final layer of abstraction was not necessary for the end goal of my paper. In fact, consolidating my pitch and timbre lexicons into simpler phrases would run the risk of pigeonholing my analysis and preventing me from discovering more nuanced patterns in my final results. Therefore, I decided to focus on pitch and timbral lexicon construction as the furthest levels of abstraction when processing songs for my thesis. Mathematical details on how I constructed the pitch and timbre lexicons can be found in the Mathematical Modeling chapter of this paper.

1.3 The Dataset

In order to successfully execute my thesis, I need access to an extensive database of music. Until recently, acquiring a substantial corpus of music data was a difficult and costly task. It is illegal to download music audio files from video and music-sharing sites such as YouTube, Spotify, and Pandora. Some platforms, such as iTunes, offer 90-second previews of songs, but using only segments of songs, and usually segments that showcase the chorus of the song, is not a reliable way to capture the entire essence of a song. Even if I were to legally download entire audio files for free, I would run into additional issues. Obtaining a high-quality corpus of song data would be challenging: writing scripts that crawl music-sharing platforms may not capture all of the music I am looking for. And once I had the audio files, I would have to perform audio processing techniques to extract the relevant information from the songs.

Fortunately, there is an easy solution to the music data acquisition problem. The Million Song Dataset (MSD) is a collection of metadata for one million music tracks dating up to 2011. Various organizations, such as The Echo Nest, MusicBrainz, 7digital, and Last.fm, have contributed different pieces of metadata. Each song is represented as a Hierarchical Data Format (HDF5) file, which can be loaded and accessed much like a JSON object. The fields encompass topical features, such as the song title, artist, and release date, as well as lower-level features, such as the loudness, starting beat time, pitches, and timbre of several segments of the song [9]. While the MSD is the largest free and open-source music metadata dataset I could find, there is no guarantee that it adequately covers the entire spectrum of EM artists and songs. This quality limitation is important to consider throughout the study. A quick look through the songs, including the subset of data I worked with for this report, showed that there were several well-known artists and songs from the EM scene. Therefore, while the MSD may not contain all desired songs for this project, it contains an adequate number of relevant songs to produce some meaningful results. Additionally, the groundwork laid for modeling the similarities between songs and identifying groundbreaking ones is the same regardless of the songs included, and the following methodologies can be implemented on any similarly formatted dataset, including one with songs that might currently be missing from the MSD.
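As an illustration of how these fields are accessed, here is a minimal sketch using the dataset's own hdf5_getters module (the file name is a placeholder):

import hdf5_getters

h5 = hdf5_getters.open_h5_file_read('TRAXLZU12903D05F94.h5')
try:
    title = hdf5_getters.get_title(h5)
    year = hdf5_getters.get_year(h5)
    pitches = hdf5_getters.get_segments_pitches(h5)  # N x 12 chroma matrix
    timbre = hdf5_getters.get_segments_timbre(h5)    # N x 12 MFCC-like matrix
    print(title, year, pitches.shape, timbre.shape)
finally:
    h5.close()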


Chapter 2

Mathematical Modeling

2.1 Determining Novelty of Songs

Finding a logical and implementable mathematical model was, and continues to be, an important aspect of my research. My problem, how to mathematically determine which songs were unique for their time, requires an algorithm in which each song is introduced in chronological order, either joining an existing category or starting a new category based on its musical similarity to songs already introduced. Clustering algorithms like k-means or Gaussian Mixture Models (GMMs), which optimize the partitioning of a dataset into a predetermined number of clusters, assume a fixed number of clusters. While this process would work if we knew exactly how many genres of EM existed, if we guess wrong, our end results may contain clusters that are wrongly grouped together or separated. It is much better to apply a clustering algorithm that does not make any assumptions about this number.

One particularly promising process that addresses the issue of the number of clusters is a family of algorithms known as Dirichlet Processes (DPs). DPs are useful for this particular application because (1) they assign clusters to a dataset with only an upper bound on the number of clusters, and (2) by sorting the songs in chronological order before running the algorithm and keeping track of which songs are categorized under each cluster, we can observe the earliest songs in each cluster and consequently infer which songs were responsible for creating new clusters. The DP is controlled by a concentration parameter α. The expected number of clusters formed is directly proportional to the value of α, so the higher the value of α, the more likely new clusters will be formed [10]. Regardless of the value of α, as the number of data points introduced increases, the probability of a new group being formed decreases. That is, a "rich get richer" policy is in place, and existing clusters tend to grow in size. Tweaking the value of the tunable parameter α is an important part of the study, since it determines the flexibility given to forming a new cluster. If the value of α is too small, then the criteria for forming clusters will be too strict, and data that should be in different clusters will be assigned to the same cluster. On the other hand, if α is too large, the algorithm will be too sensitive and assign similar songs to different clusters.
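A small simulation makes the role of α tangible. The sketch below (my own illustration, not code from the clustering pipeline) draws cluster weights with the stick-breaking construction and shows that larger α spreads probability mass over more clusters:

import numpy as np

def stick_breaking_weights(alpha, n_max, rng):
    # Break off Beta(1, alpha) fractions of the remaining stick
    betas = rng.beta(1.0, alpha, size=n_max)
    remaining = np.concatenate(([1.0], np.cumprod(1.0 - betas[:-1])))
    return betas * remaining

rng = np.random.default_rng(0)
for alpha in (0.1, 1.0, 10.0):
    w = stick_breaking_weights(alpha, 50, rng)
    print(alpha, 'clusters with weight > 1%:', int((w > 0.01).sum()))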

The implementation of the DP was achieved using scikit-learn's library and API for the Dirichlet Process Gaussian Mixture Model (DPGMM). The DPGMM is the formal name of the Dirichlet Process model used to cluster the data. More specifically, scikit-learn's implementation of the DPGMM uses the stick-breaking method, one of several equally valid methods to assign songs to clusters [11]. While the mathematical details of this algorithm can be found at the following citation [12], the most important aspects of the DPGMM are the arguments that the user can specify and tune. The first of these tunable parameters is the value α, which is the same parameter as the α discussed in the previous paragraph. As seen on the right side of Figure 2.1, properly tuning α is key to obtaining meaningful clusters. The center image has α set to 0.01, which is too small and results in all of the data being placed under one cluster. On the other hand, the bottom-right image has the same data set and α set to 100, which does a better job of clustering. On a related note, the figure also demonstrates the effectiveness of the DPGMM over the GMM. On the left side, the dataset clearly contains 2 clusters, but the GMM in the top-left image assumes 5 clusters as a prior and consequently clusters the data incorrectly, while the DPGMM manages to limit the data to 2 clusters.

The second argument that the user inputs to the DPGMM is the data that will be clustered. The scikit-learn implementation takes the data in the format of a nested list (N lists, each of length m), where N is the number of data points and m the number of features. While the format of the data structure is relatively straightforward, choosing which numbers should go into the data was a challenge I faced. Selecting the relevant features of each song to be used in the algorithm will be expounded upon in the next section, "Feature Selection."

The last argument that a user inputs to the scikit-learn DPGMM implementation is an upper bound on the number of clusters. The Dirichlet Process then determines the best number of clusters for the data between 1 and the upper bound. Since the DPGMM is flexible enough to find the best value, I set an arbitrary upper bound of 50 clusters and focused more on the tuning of α to modify the number of clusters formed.
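Here is a minimal sketch of that clustering step. The DPGMM class used in this thesis has since been replaced in scikit-learn by BayesianGaussianMixture with a Dirichlet-process prior, so the sketch uses the modern equivalent, and random data stands in for the real song feature matrix:

import numpy as np
from sklearn.mixture import BayesianGaussianMixture

X = np.random.rand(500, 238)  # placeholder for the per-song feature vectors

dpgmm = BayesianGaussianMixture(
    n_components=50,  # upper bound on the number of clusters
    weight_concentration_prior_type='dirichlet_process',
    weight_concentration_prior=0.1,  # the concentration parameter alpha
    max_iter=500,
)
labels = dpgmm.fit_predict(X)
print('clusters actually used:', len(np.unique(labels)))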

2.2 Feature Selection

Figure 2.1: scikit-learn example of GMM vs. DPGMM and tuning of α

One of the most difficult aspects of the Dirichlet method is choosing the features to be used for clustering. In other words, when we organize the songs into clusters, we need to ensure that each cluster is distinct in a way that is statistically and intuitively logical. In the Million Song Dataset [9], each song is represented as a JSON-like object containing several fields. These fields are candidate features to be used in the Dirichlet algorithm. Below is an example song, "Never Gonna Give You Up" by Rick Astley, and the corresponding features:

artist_mbid: db92a151-1ac2-438b-bc43-b82e149ddd50 (the musicbrainz.org ID for this artist)
artist_mbtags: shape = (4,) (this artist received 4 tags on musicbrainz.org)
artist_mbtags_count: shape = (4,) (raw tag count of the 4 tags this artist received on musicbrainz.org)
artist_name: Rick Astley (artist name)
artist_playmeid: 1338 (the ID of that artist on the service playme.com)
artist_terms: shape = (12,) (this artist has 12 terms (tags) from The Echo Nest)
artist_terms_freq: shape = (12,) (frequency of the 12 terms from The Echo Nest (number between 0 and 1))
artist_terms_weight: shape = (12,) (weight of the 12 terms from The Echo Nest (number between 0 and 1))
audio_md5: bf53f8113508a466cd2d3fda18b06368 (hash code of the audio used for the analysis by The Echo Nest)
bars_confidence: shape = (99,) (confidence value (between 0 and 1) associated with each bar by The Echo Nest)
bars_start: shape = (99,) (start time of each bar according to The Echo Nest; this song has 99 bars)
beats_confidence: shape = (397,) (confidence value (between 0 and 1) associated with each beat by The Echo Nest)
beats_start: shape = (397,) (start time of each beat according to The Echo Nest; this song has 397 beats)
danceability: 0.0 (danceability measure of this song according to The Echo Nest (between 0 and 1; 0 => not analyzed))
duration: 211.69587 (duration of the track in seconds)
end_of_fade_in: 0.139 (time of the end of the fade-in at the beginning of the song, according to The Echo Nest)
energy: 0.0 (energy measure (not in the signal processing sense) according to The Echo Nest (between 0 and 1; 0 => not analyzed))
key: 1 (estimation of the key the song is in by The Echo Nest)
key_confidence: 0.324 (confidence of the key estimation)
loudness: -7.75 (general loudness of the track)
mode: 1 (estimation of the mode the song is in by The Echo Nest)
mode_confidence: 0.434 (confidence of the mode estimation)
release: Big Tunes - Back 2 The 80s (album name from which the track was taken; some songs/tracks can come from many albums, we give only one)
release_7digitalid: 786795 (the ID of the release (album) on the service 7digital.com)
sections_confidence: shape = (10,) (confidence value (between 0 and 1) associated with each section by The Echo Nest)
sections_start: shape = (10,) (start time of each section according to The Echo Nest; this song has 10 sections)
segments_confidence: shape = (935,) (confidence value (between 0 and 1) associated with each segment by The Echo Nest)
segments_loudness_max: shape = (935,) (max loudness during each segment)
segments_loudness_max_time: shape = (935,) (time of the max loudness during each segment)
segments_loudness_start: shape = (935,) (loudness at the beginning of each segment)
segments_pitches: shape = (935, 12) (chroma features for each segment (normalized so max is 1))
segments_start: shape = (935,) (start time of each segment (musical event or onset) according to The Echo Nest; this song has 935 segments)
segments_timbre: shape = (935, 12) (MFCC-like features for each segment)
similar_artists: shape = (100,) (a list of 100 artists (their Echo Nest IDs) similar to Rick Astley, according to The Echo Nest)
song_hotttnesss: 0.864248830588 (according to The Echo Nest, when downloaded (in December 2010), this song had a 'hotttnesss' of 0.8 (on a scale of 0 to 1))
song_id: SOCWJDB12A58A776AF (The Echo Nest song ID; note that a song can be associated with many tracks (with very slight audio differences))
start_of_fade_out: 198.536 (start time of the fade-out, in seconds, at the end of the song, according to The Echo Nest)
tatums_confidence: shape = (794,) (confidence value (between 0 and 1) associated with each tatum by The Echo Nest)
tatums_start: shape = (794,) (start time of each tatum according to The Echo Nest; this song has 794 tatums)
tempo: 113.359 (tempo in BPM according to The Echo Nest)
time_signature: 4 (time signature of the song according to The Echo Nest, i.e., usual number of beats per bar)
time_signature_confidence: 0.634 (confidence of the time signature estimation)
title: Never Gonna Give You Up (song title)
track_7digitalid: 8707738 (the ID of this song on the service 7digital.com)
track_id: TRAXLZU12903D05F94 (The Echo Nest ID of this particular track, on which the analysis was done)
year: 1987 (year when this song was released, according to musicbrainz.org)

When choosing features, my main goal was to use features that would most likely yield meaningful results yet also be simple and make sense to the average person. The definition of "meaningful" results is arbitrary, as every music listener will have his or her own opinions about what constitutes different types of music, but some common features most people tend to differentiate songs by are pitch, rhythm, and the types of instruments used. The following specific fields provided in each song object fall under these three terms:

Pitch

• segments_pitches: a matrix of values indicating the strength of each pitch (or note) at each discernible time interval

Rhythm

• beats_start: a vector of values indicating the start time of each beat
• time_signature: the time signature of the song
• tempo: the speed of the song in Beats Per Minute (BPM)

Instruments

• segments_timbre: a matrix of values indicating the distribution of MFCC-like features (different types of tones) for each segment

The segments_pitches feature is a clear candidate as a differentiating factor for songs, since it reveals patterns of notes that occur. Additionally, other research papers that quantitatively examine songs, like Mauch's, look at pitch and employ a procedure that allows all songs to be compared with the same metric. Likewise, timbre is intuitively a reliable differentiating feature, since it captures tones: sounds that sound different despite having the same pitch. Therefore, segments_timbre is another feature that is considered in each song. Finally, we look at the candidate features for rhythm. At first glance, all of these features appear to be useful, as they indicate the rhythm of a song in one way or another. However, none of these features are as useful as the pitch and timbre features. While tempo is one factor in differentiating genres of EDM, and music in general, tempo alone is not a driving force of musical innovation. Certain genres of EDM, like drum 'n' bass and happycore, stand out for having very fast tempos, but the tempo is supplemented with a sound unique to the genre. Conceiving new arrangements of pitches, combining instruments in new ways, and inventing new types of sounds are novel; speeding up or slowing down existing sounds is not. Including tempo as a feature could actually add noise to the model, since many genres overlap in their tempos. And finally, tempo is measured indirectly when the pitch and timbre features are normalized for each song: everything is measured in units of "per second," so faster songs will have higher quantities of pitch and timbre features each second. Time signature can be dismissed from the candidate features for the same reason as tempo: many genres share the same time signature, and including it in the feature set would only add more noise. beats_start looks like a more promising feature since, like segments_pitches and segments_timbre, it consists of a vector of values. However, difficulties arise when we begin to think about how exactly we can utilize this information. Since each song varies in length, we need a way to compare songs of different durations on the same level. One approach could be to perform basic statistics on the distance between each beat, for example calculating the mean and standard deviation of this distance. However, the normalized pitch and timbre information already captures this data. Another possibility is detecting certain patterns of beats, which could differentiate the syncopated dubstep or glitch beats from the steady pulse of electro-house. But once again, every beat is accompanied by a sound with a specific timbre and pitch, so this feature would not add any significantly new information.

2.3 Collecting Data and Preprocessing Selected Features

2.3.1 Collecting the Data

Upon deciding the features I wanted to use in my research, I first needed to collect all of the electronic songs in the Million Song Dataset. The easiest reliable way to achieve this was to iterate through each song in the database and save the information for the songs where any of the artist genre tags in artist_mbtags matched an electronic music genre. While this measure was not fully accurate, because it looks at the genre of the artist, not the song, specific genre information for each song was not as easily accessible, so this indicator was nearly as good a substitute. To generate a list of the genres that electronic songs would fall under, I manually searched through a subset of the MSD to find all genres that seemed to be related to electronic music. In the case of genres that were sometimes, but not always, electronic in nature, such as disco or pop, I erred on the side of caution and did not include them in the list of electronic genres. In these cases, false positives, such as primarily rock songs that happen to have the disco label attached to the artist, could inadvertently be included in the dataset. The final list of genres is as follows:

target_genres = ['house', 'techno', 'drum and bass', 'drum n bass',
    "drum'n'bass", 'drumnbass', "drum 'n' bass", 'jungle', 'breakbeat',
    'trance', 'dubstep', 'trap', 'downtempo', 'industrial', 'synthpop',
    'idm', 'idm - intelligent dance music', '8-bit', 'ambient',
    'dance and electronica', 'electronic']
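Here is a sketch of this filtering step, assuming the target_genres list above (the nested directory pattern follows the MSD's standard layout, and the helper name and root path are my own placeholders):

import glob
import os
import hdf5_getters

basedir = 'data'  # root of the MSD file tree (placeholder)

def artist_is_electronic(h5):
    tags = hdf5_getters.get_artist_mbtags(h5)
    # tags are stored as byte strings in the HDF5 files
    return any(t.decode('utf-8').lower() in target_genres for t in tags)

electronic_tracks = []
for path in glob.glob(os.path.join(basedir, '*', '*', '*', '*.h5')):
    h5 = hdf5_getters.open_h5_file_read(path)
    try:
        if artist_is_electronic(h5):
            electronic_tracks.append(path)
    finally:
        h5.close()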

2.3.2 Pitch Preprocessing

A study conducted by music researcher Matthias Mauch [8] analyzes pitch in a musically informed manner. The study first takes the raw sound data and converts it into a distribution over each pitch, where 0 is no detection of the pitch and 1 the strongest amount. Then it computes the most likely chord by comparing the 4 most common types of chords in popular music (major, minor, dominant 7th, and minor 7th) to the observed chord. The most common chords are represented as "template chords" and contain 0's and 1's, where the 1's represent the notes played in the chord. For example, using the note C as the first index, the C major chord is represented as

CT_{CM} = (1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0).

For a given chroma frame c observed in the song, Spearman's rho coefficient is computed against every template chord:

\rho_{CT,c} = \frac{\sum_{i=1}^{12} (CT_i - \overline{CT})(c_i - \bar{c})}{\sigma_{CT}\,\sigma_c}

where \overline{CT} is the mean of the values in the template chord, \sigma_{CT} is the standard deviation of the values in the chord, and the operations on c are analogous. Note that the summation is over each individual pitch in the 12 pitch classes. The chord template with the highest value of ρ is selected as the chord for the time frame.
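The following sketch (my reconstruction, not Mauch's code) shows this template-matching step for a single chroma frame, using the four chord types and SciPy's Spearman correlation:

import numpy as np
from scipy.stats import spearmanr

BASE_TEMPLATES = {
    'maj':  [1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0],
    'min':  [1, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0],
    'dom7': [1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0],
    'min7': [1, 0, 0, 1, 0, 0, 0, 1, 0, 0, 1, 0],
}

def best_chord(chroma_frame):
    # Return ((root, chord type), rho) for the template maximizing Spearman's rho
    best, best_rho = None, -np.inf
    for name, base in BASE_TEMPLATES.items():
        for root in range(12):
            rho, _ = spearmanr(np.roll(base, root), chroma_frame)
            if rho > best_rho:
                best, best_rho = (root, name), rho
    return best, best_rho

frame = [1.0, 0.1, 0.0, 0.1, 0.9, 0.0, 0.1, 0.8, 0.0, 0.1, 0.0, 0.1]
print(best_chord(frame))  # picks root 0 (C), 'maj'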

After this is performed for each time frame, the values are smoothed, and then the change between adjacent chords is observed. The reasoning behind this step is that by measuring the relative distance between chords, rather than the chords themselves, all songs can be compared in the same manner, even though they may have different key signatures. Finally, the study takes the types of chord changes and classifies them under 8 possible categories called "H-topics." These topics are more abstracted versions of the chord changes that make more sense to a human, such as "changes involving dominant 7th chords."

In my preliminary implementation of this method on an electronic dance music corpus, I made a few modifications to Mauch's study. First, I smoothed out time frames before computing the most probable chords, rather than smoothing the most probable chords. I did this to save time and to reduce volatility in the chord measurements. Using Rick Astley's "Never Gonna Give You Up" as a reference, which contains 935 time frames and lasts 212 seconds, 5 time frames is roughly 1 second, and for preliminary testing this appeared to be a good interval for each time block. Second, as mentioned in the literature section, I did not abstract the chord changes into H-topics. This decision also stemmed from time constraints, since deriving semantic chord meaning from EDM songs would require careful research into the types of harmonies and sounds common in that genre of music. Below is a high-level visualization of the pitch metadata found in a sample song, "Firestarter" by The Prodigy, and how I converted the metadata into a chord change vector that I could then feed into the Dirichlet Process algorithm.

[Figure: Pitch-processing pipeline, illustrated on "Firestarter" by The Prodigy. The raw pitch data is an N×12 matrix, where N is the number of time frames and 12 the number of pitch classes (the figure shows the first 5 time frames). The distribution of pitches is averaged over every 5 time frames, and the most likely chord for each block is calculated using Spearman's rho (here F♯ major, template (0,1,0,0,0,0,1,0,0,0,1,0)). For each pair of adjacent chords (here F♯ major to G♯ major, a major-to-major change with step size 2, chord shift code 6), the corresponding count in a table of the 192 possible chord changes is incremented. The end product is a 192-element vector in which chord_changes[i] is the number of times the chord change with code i occurred in the song.]

A final step I took to normalize the chord change data was to divide the counts by the length of the song, so that each song's number of chord changes was measured per second.
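A sketch of the full encoding follows. The 192 categories arise from 4 chord types x 4 chord types x 12 relative root movements; the specific code assignment below is my own illustrative choice (arranged so that codes 0, 60, 120, and 180 correspond to same-chord repetitions, as referenced in the Analysis chapter), and the thesis's actual numbering may differ:

import numpy as np

TYPES = ['maj', 'min', 'dom7', 'min7']

def chord_change_code(prev, curr):
    # prev and curr are (root, type) pairs, e.g. (0, 'maj') for C major
    step = (curr[0] - prev[0]) % 12  # relative root movement in semitones
    return (TYPES.index(prev[1]) * 4 + TYPES.index(curr[1])) * 12 + step

def chord_change_vector(chords, duration):
    # Count changes between adjacent chord blocks, normalized per second
    counts = np.zeros(192)
    for prev, curr in zip(chords[:-1], chords[1:]):
        counts[chord_change_code(prev, curr)] += 1
    return counts / duration

chords = [(0, 'maj'), (2, 'maj'), (9, 'min')]  # C maj -> D maj -> A min
print(chord_change_vector(chords, duration=212.0).nonzero()[0])  # codes 2 and 19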

2.3.3 Timbre Preprocessing

For timbre, I also used Mauch's model to find a meaningful way to compare timbre uniformly across all songs [8]. After collecting all song metadata, I took a random sample of 20 songs from each year, starting at 1970. The reason I forced the sampling to 20 randomly sampled songs from each year, and did not take a random sample of songs from all years at once, was to prevent bias toward any type of sound. As seen in Figure 2.2, there are significantly more songs from 2000-2011 than from before 2000.

Figure 2.2: Number of electronic music songs in the Million Song Dataset from each year

The mean year is x̄ = 2001.052, the median year is 2003, and the standard deviation of the years is σ = 7.060. A "random sample" over all songs would almost certainly include a disproportionate amount of more recent songs. In order to not miss out on sounds that may be more prevalent in older songs, I required a set number of songs from each year. Next, from each randomly selected song, I selected 20 random timbre frames, in order to prevent any biases in data collection within each song. In total, there were 42 × 20 × 20 = 16,800 timbre frames collected. Next, I clustered the timbre frames using a Gaussian Mixture Model (GMM), varying the number of clusters from 10 to 100 and selecting the number of clusters with the lowest Bayesian Information Criterion (BIC), a statistical measure commonly used to select the best-fitting model. The BIC was minimized at 46 timbre clusters. I then re-ran the GMM with 46 clusters and saved the mean values of each of the 12 timbre components for each cluster formed. In the same way that every song had the same 192 chord changes whose frequencies could be compared between songs, each song now had the same 46 timbre clusters but different frequencies in each song. When reading in the metadata from each song, I calculated the most likely timbre cluster each timbre frame belonged to and kept a frequency count of all of the possible timbre clusters observed in the song. Finally, as with the pitch data, I divided all observed counts by the duration of the song, in order to normalize each song's timbre counts.
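The sketch below illustrates this cluster-count selection (GaussianMixture is the current scikit-learn class; the step size of 10 and the random placeholder data are my assumptions):

import numpy as np
from sklearn.mixture import GaussianMixture

frames = np.random.rand(16800, 12)  # placeholder for the sampled timbre frames

best_k, best_bic, best_model = None, np.inf, None
for k in range(10, 101, 10):  # the study varied the cluster count from 10 to 100
    gmm = GaussianMixture(n_components=k, random_state=0).fit(frames)
    bic = gmm.bic(frames)
    if bic < best_bic:
        best_k, best_bic, best_model = k, bic, gmm

timbre_lexicon = best_model.means_  # mean 12-dimensional timbre of each cluster
cluster_of_frame = best_model.predict(frames[:1])  # assign a frame to its cluster
print(best_k, timbre_lexicon.shape, cluster_of_frame)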


Chapter 3

Results

3.1 Methodology

After the pitch and timbre data was processed, I ran the Dirichlet Process on the data. For each song, I concatenated the 192-element chord change frequency list and the 46-element timbre category frequency list, giving each song a total of 238 features. However, there is a problem with this setup: the pitch data will inherently dominate the clustering process, since it contains more than four times as many features as timbre. While there is no built-in function in scikit-learn's DPGMM process to give different weights to each feature, I considered another possibility to remedy this discrepancy: duplicating the timbre vector a certain number of times and concatenating that to the feature set of each song. While this strategy runs the risk of corrupting the feature set and turning it into something that does not accurately represent each song, it is important to keep in mind that, even without duplicating the timbre vector, the feature set consists of two separate feature sets concatenated to each other. Therefore, timbre duplication appears to be a reasonable strategy to weigh pitch and timbre more evenly.
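Here is a sketch of the resulting feature construction (the duplication factor is illustrative, since the exact number of copies is a tuning choice; the ×10 rescaling anticipates the adjustment described below):

import numpy as np

def song_features(chord_changes, timbre_counts, duration,
                  n_timbre_copies=4, scale=10.0):
    # chord_changes: length-192 counts; timbre_counts: length-46 counts
    pitch = np.asarray(chord_changes, dtype=float) / duration  # per-second rates
    timbre = np.asarray(timbre_counts, dtype=float) / duration
    features = np.concatenate([pitch] + [timbre] * n_timbre_copies)
    return features * scale  # rescale so alpha can stay in a sensible range

x = song_features(np.ones(192), np.ones(46), duration=212.0)
print(x.shape)  # (376,) with 4 timbre copies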


After this modification, I tweaked a few more parameters before obtaining my final results. Dividing the pitch and timbre frequencies by the duration of the song normalized every song to frequency per second, but it also had the undesired effect of making the data too small. Timbre and pitch frequencies per second were almost always less than 10 and many times hovered as low as 0.002 for nonzero values. Because all of the values were very close to each other, using common values of α in the range of 0.1 to 1000-2000 was insufficient to push the songs into different clusters. As a result, every song fell into the same cluster. Increasing the value of α by several orders of magnitude, to well over 10 million, fixed the problem, but this solution presented two issues. First, tuning α to experiment with different ways to cluster the music would be problematic, since I would have to work with an enormous range of possible values for α. Second, pushing α to such high values is not appropriate for the Dirichlet Process. Extremely high values of α indicate a Dirichlet Process that will try to disperse the data into different clusters, but a value of α that high is, in principle, always assigning each new song to a new cluster. On the other hand, varying α between 0.1 and 1000, for example, presents a much wider range of flexibility when assigning clusters. While clustering may be possible by varying α an extreme amount with the data as it currently is, we would be using the Dirichlet Process in a way it mathematically should not be used. Therefore, multiplying all of the data by a constant value, so that we can work in the appropriate range of α, is the ideal approach. After some experimentation, I found that k = 10 was an appropriate scaling factor.

After initial runs of the Dirichlet Process, I found that there was a slight issue with some of the earlier songs. Since I had only artist genre tags, not specific song tags, I chose songs based on whether any of the tags associated with the artist fell under any electronic music genre, including the generic term 'electronic'. There were some bands, mostly older ones from the 1960s and 1970s like Electric Light Orchestra, which had some electronic music but mostly featured rock, funk, disco, or another genre. Given that these artists featured mostly non-electronic songs, I decided to exclude them from my study and generated a blacklist of these music artists. While it was infeasible to look through every single song and determine whether it was electronic or not, I was able to look over the earliest songs in each cluster. These songs were the most important to verify as electronic, because early non-electronic songs could end up forming new clusters and inadvertently create clusters with non-electronic sounds that I was not looking for.

The goal of this thesis is to identify different groups in which EM songs are clustered and to identify the most unique artists and genres. While the second task is fairly simple, because it requires looking at the earliest songs in each cluster, the effectiveness of the first is difficult to gauge. While I can look at the average chord change and timbre category frequencies in each cluster, as well as other metadata, attaching semantic interpretations to what the music actually sounds like and determining whether the music is clustered properly is a very subjective process. For this reason, I ran the Dirichlet Process on the feature set with values of α = (0.05, 0.1, 0.2) and compared the clustering for each value, examining similarities and differences in the clusters formed in each scenario in the Discussion section. For each value of α, I set the upper limit of components, or clusters, allowed to 50. The values of α I used resulted in 9, 14, and 19 clusters formed, respectively.

3.2 Findings

3.2.1 α = 0.05

When I set α to 0.05, the Dirichlet Process split the songs into 9 clusters. Below are the distributions of years of the songs in each cluster (note that the Dirichlet Process does not number the clusters exactly sequentially, so cluster numbers 5, 7, and 10 are skipped).


Figure 3.1: Song year distributions for α = 0.05

For each value of α, I also calculated the average frequency of each chord change category and timbre category for each cluster and plotted the results. The green lines correspond to timbre and the blue lines to pitch.


Figure 3.2: Timbre and pitch distributions for α = 0.05

A table of each cluster formed, the number of songs in that cluster, and descriptions of the pitch, timbre, and rhythmic qualities characteristic of songs in that cluster is shown below.


Cluster | Song Count | Characteristic Sounds
0  | 6481 | Minimalist; industrial space sounds; dissonant chords
1  | 5482 | Soft; New Age; ethereal
2  | 2405 | Defined sounds; electronic and non-electronic instruments played in standard rock rhythms
3  | 360  | Very dense and complex synths; slightly darker tone
4  | 4550 | Heavily distorted rock and synthesizer
6  | 2854 | Faster-paced 80s synth rock; acid house
8  | 798  | Aggressive beats; dense house music
9  | 1464 | Ambient house; trancelike; strong beats; mysterious tone
11 | 1597 | Melancholy tones; new wave rock in the 80s, then, starting in the 90s, downtempo, trip-hop, and nu-metal

Table 3.1: Song cluster descriptions for α = 0.05

3.2.2 α = 0.1

A total of 14 clusters were formed (16 were formed, but 2 clusters contained only one song each; I listened to both of these songs, and since they did not sound unique, I discarded them from the clusters). Again, the song year distributions, timbre and pitch distributions, and cluster descriptions are shown below.

Figure 3.3: Song year distributions for α = 0.1

Figure 3.4: Timbre and pitch distributions for α = 0.1

Cluster | Song Count | Characteristic Sounds
0  | 1339 | Instrumental and disco with 80s synth
1  | 2109 | Simultaneous quarter-note and sixteenth-note rhythms
2  | 4048 | Upbeat, chill; simultaneous quarter-note and eighth-note rhythms
3  | 1353 | Strong repetitive beats; ambient
4  | 2446 | Strong simultaneous beat and synths; synths defined but echoing
5  | 2672 | Calm; New Age
6  | 542  | Hi-hat cymbals; dissonant chord progressions
7  | 2725 | Aggressive punk and alternative rock
9  | 1647 | Latin; rhythmic emphasis on first and third beats
11 | 835  | Standard medium-fast rock instruments/chords
16 | 1152 | Orchestral, especially violins
18 | 40   | "Martian alien" sounds; no vocals
20 | 1590 | Alternating strong kick and strong high-pitched clap
28 | 528  | Roland TR-like beats; kick and clap stand out but fuzzy

Table 3.2: Song cluster descriptions for α = 0.1

3.2.3 α = 0.2

With α set to 0.2, there were a total of 22 clusters formed. 3 of the clusters consisted of 1 song each, none of which were particularly unique-sounding, so I discarded them, for a total of 19 significant clusters. Again, the song year distributions, timbre and pitch distributions, and cluster descriptions are shown below.

Figure 3.5: Song year distributions for α = 0.2

Figure 3.6: Timbre and pitch distributions for α = 0.2

Cluster | Song Count | Characteristic Sounds
0  | 4075 | Nostalgic and sad-sounding synths and string instruments
1  | 2068 | Intense, sad, cavernous (mix of industrial metal and ambient)
2  | 1546 | Jazz/funk tones
3  | 1691 | Orchestral with heavy 80s synths; atmospheric
4  | 343  | Arpeggios
5  | 304  | Electro; ambient
6  | 2405 | Alien synths; eerie
7  | 1264 | Punchy kicks and claps; 80s/90s tilt
8  | 1561 | Medium tempo; 4/4 time signature; synths with intense guitar
9  | 1796 | Disco rhythms and instruments
10 | 2158 | Standard rock with few (if any) synths added on
12 | 791  | Cavernous, minimalist ambient (non-electronic instruments)
14 | 765  | Downtempo; classic guitar riffs; fewer synths
16 | 865  | Classic acid house sounds and beats
17 | 682  | Heavy Roland TR sounds
22 | 14   | Fast ambient; classic orchestral
23 | 578  | Acid house with funk tones
30 | 31   | Very repetitive rhythms; one or two tones
34 | 88   | Very dense sound (strong vocals and synths)

Table 3.3: Song cluster descriptions for α = 0.2


3.3 Analysis

Each of the different values of α revealed different insights about the Dirichlet Process applied to the Million Song Dataset: mainly, which artists and songs were unique, and how traditional groupings of EM genres could be thought of differently. Not surprisingly, the distributions of the years of songs in most of the clusters were skewed to the left, because the distribution of all of the EM songs in the Million Song Dataset is left-skewed (see Figure 2.2). However, some of the distributions vary significantly for individual clusters, and these differences provide important insights into which types of music styles were popular at certain points in time and how unique the earliest artists and songs in the clusters were. For example, for α = 0.1, cluster 28's musical style (with sounds characteristic of the Roland TR-808 and TR-909, two programmable drum machines and synthesizers that became extremely popular in 1980s and 90s dance tracks [13]) coincides with when the instruments were first manufactured, in 1980. Not surprisingly, this cluster contained mostly songs from the 80s and 90s and declined slightly in the 2000s. However, there were a few songs in that cluster that came out before 1980. While these songs did not clearly use the Roland TR machines, they may have contained similar sounds that predated the machines and were truly novel.

First, looking at α = 0.05, we see that all of the clusters contain a significant number of songs, although clusters 3 and 8 are notably smaller. Cluster 3 contains a heavier left tail, indicating a larger number of songs from the 70s, 80s, and 90s. Inside the cluster, the genres of music varied significantly from a traditional music lens. That is, the cluster contained some songs with nearly all traditional rock instruments, others with purely synths, and others somewhere in between, all of which would normally be classified as different EM genres. However, under the Dirichlet Process, these songs were lumped together with the common theme of dense, melodic sounds (as opposed to minimalistic, repetitive, or dissonant sounds). The most prominent artists from the earlier songs are Ashra and Jon Hassell, who composed several melodic songs combining traditional instruments with synthesizers for a modern feel. The other small cluster, number 8, contains a more normal year distribution relative to the entire MSD distribution and also consists of denser beats. Another artist, Cabaret Voltaire, leads this cluster. Cluster 9 sticks out significantly because it contains virtually no songs before 1990 but then increases rapidly in popularity. This cluster contains songs with hypnotically repetitive rhythm, strong and ethereal synths, and an equally strong drum-like beat. Given the emergence of trance in the 1990s, and the fact that house music in the 1980s contained more minimalistic synths than house music in the 1990s, this distribution of years makes sense. Looking at the earliest artists in this cluster, one that accurately predates the later music in the cluster is Jean-Michel Jarre. A French composer pioneering in ambient and electronic music [14], one of his songs, "Les Chants Magnétiques IV," contains very sharp and modulated synths, along with a repetitive hi-hat cymbal rhythm and more drawn-out, ethereal synths. While the song sounds ambient at its normal speed, playing the song at 1.5 times the normal speed resulted in a thumping, fast-paced 16th-note rhythm that, combined with the ethereal synths and their chord progressions, sounded very similar to trance music. In fact, I found that, stylistically, trance music was comparable to house and ambient music increased in speed. Trance music was a term not used extensively until the early 1990s, but ambient and house music were already mainstream by the 1980s, so it would make sense that trance evolved in this manner. However, this insight could serve as an argument that trance is not an innovative genre in and of itself but is rather a clever combination of two older genres.

Lastly, we look at the timbre category and chord change distributions for each cluster. In theory, these clusters should have significantly different peaks of chord changes and timbre categories, reflecting different pitch arrangements and instruments in each cluster. The type 0 chord change corresponds to major → major with no note change, type 60 to minor → minor with no note change, type 120 to dominant 7th major → dominant 7th major with no note change, and type 180 to dominant 7th minor → dominant 7th minor with no note change. It makes sense that type 0, 60, 120, and 180 chord changes are frequently observed, because it implies that chords occurring next to each other remain in the same key for the majority of the song. The timbre categories, on the other hand, are more difficult to intuitively interpret. Mauch's study addresses this issue by sampling songs and sounds that are the closest to each timbre category and then playing the sounds and attaching user-based interpretations based on several listeners [8]. While this strategy worked in Mauch's study, given the time and resources at my disposal, it was not practical in mine. I ended up writing subjective summaries of each cluster and comparing them against the charts to see whether certain peaks in the timbre categories corresponded to specific tones. Strangely, for α = 0.05, the timbre and chord change data is very similar for each cluster. This problem does not occur when α = 0.1 or 0.2, where the graphs vary significantly and correspond to some of the observed differences in the music. In summary, below are the most influential artists I found in the clusters formed and the types of music they created that were novel for their time:

• Jean-Michel Jarre: ambient and house music; complicated synthesizer arrangements
• Cabaret Voltaire: orchestral electronic music
• Paul Horn: new age
• Brian Eno: ambient music
• Manuel Göttsching (Ashra): synth-heavy ambient music
• Killing Joke: industrial metal
• John Foxx: minimalist and dark electronic music
• Fad Gadget: house and industrial music

While these conclusions were formed mainly from the data used in the MSD and the resulting clusters, I checked outside sources and biographies of these artists to see whether they were groundbreaking contributors to electronic music. Some research revealed that these artists were indeed groundbreaking for their time, so my findings are consistent with existing literature. The difference, however, between existing accounts and mine is that, from a quantitatively computed perspective, I found some new connections (like observing that one of Jarre's works, when sped up, sounded very similar to trance music).

For larger values of α, it is worth not only looking at interesting phenomena in the clusters formed for that specific value, but also comparing the clusters formed to those for other values of α. Since we are increasing the value of α, more clusters will be formed, and the distinctions between each cluster will be more nuanced. With α = 0.1, the Dirichlet Process formed 16 clusters; 2 of these clusters consisted of only one song each, and upon listening, neither of these songs sounded particularly unique, so I threw those two clusters out and analyzed the remaining 14. Comparing these clusters to the ones formed with α = 0.05, I found that some of the clusters mapped over nicely, while others were more difficult to interpret. For example, cluster 3_0.1 (cluster 3 when α = 0.1) contained a similar number of songs and a similar distribution of release years to cluster 9_0.05. Both contain virtually no songs before the 1990s and then steadily rise in popularity through the 2000s. Both clusters also contain similar types of music: house beats and ethereal synths reminiscent of ambient or trance music. However, when I looked at the earliest artists in cluster 3_0.1, they were different from the earliest artists in cluster 9_0.05.

artists in cluster 301 they were different from the earliest artists in cluster 9005

One particular artist Bill Nelson stood out for having a particularly novel song

ldquoBirds of Tinrdquo for the year it was released (1980) This song features a sharp and

twangy synth beat that when sped up sounded like minimalist acid house music

While the α = 005 group differentiated mostly on general moods and classes of

instruments (like rock vs non-electronic vs electronic) the α = 01 group picked

up more nuanced instrumentation and mood differences For example cluster 1601

contained songs that featured orchestral string instruments especially violin The

songs themselves varied significantly according to traditional genres from Brian Eno

arrangements with classical orchestra to a remix of a song by Linkin Park a nu-metal

band which contained violin interludes This clustering raises an interesting point

that music that sounds very different based on traditional genres could be grouped

together on certain instruments or sounds Another cluster 2801 features 90s sounds

characteristic of the Roland TR-909 drum machine (which would explain why the

clusterrsquos songs increase dratistically starting in the 1990s and steadily decline through

the 2000s) Yet another cluster 601 contains a particularly heavy left tail indicating

a style more popular in the 1980s and the characteristic sound high-hat cymbals

is also a specialized instrument This specialization does not match up particularly

strongly with the clusters when α = 005 That is a single cluster with α = 005

does not easily map to one or more clusters in the α = 01 run although many of

the clusters appear to share characteristics based on the qualitative descriptions in

the tables The timbrechord change charts for each cluster appeared to at least

somewhat corroborate the general characteristics I attached to each cluster For

example the last timbre category is significantly pronounced for clusters 5 and 18

and especially so for 18 Cluster 18 was vocal-free ethereal space-synth sounds

so it would make sense that cluster 5 which was mainly calm New World also

contained vocal-free ethereal and space-y sounds It was also interesting to note

50

that certain clusters like 28 contained one timbre category that completely dominated

all the others Many songs in this cluster were marked by strong and repetitive

beats reminiscent of the Roland TR synth drum machine which matches the graph

Likewise clusters 3 7 9 and 20 which appear to contain the same peak timbre

category were noted for containing strong and repetitive beats For this cluster I

added the following artists and their contributions to the general list of novel artists

bull Bill Nelson Minimalist house music

bull Vangelis Orchestral compositions with electronic notes

bull Rick Wakeman Rock compositions with spacy-sounding synths

bull Kraftwerk synth-based pop music

Finally we look at α = 02 With this parameter value the Dirichlet Process resulted

in 22 clusters formed 3 of these clusters contained only one song each and upon

listening to each of these songs I determined they were not particularly unique and

discarded them for a total of 19 remaining clusters Unlike the previous two values of

α where the clusters were relatively easy to subjectively differentiate this one was

quite difficult Slightly more than half of the clusters 10 out of 19 contained under

1000 songs and 3 contained under 100 Some of the clusters were easily mapped

to clusters in the other two α values like cluster 1702 which contains Roland TR

drum machine sounds and is comparable to cluster 2801 However many of the other

classifications seemed more dubious Not only did the songs within each cluster seem

to often vary significantly but the differences between many clusters appeared nearly

indistinguishable The chord change and timbre charts also support the difficulty

in distinguishing different clusters The y-axes for all of the years are quite small

implying that many of the timbre values averaged out because the songs were quite

different in each cluster Essentially the observations are quite noisy and do not have

51

features that stand out as saliently as cluster 2801 for example The only exceptions

to these numbers were clusters 30 and 34 but there were so few songs in each of

these clusters that they represent only a small amount of the dataset Therefore I

concluded that the Dirichlet Process with α = 02 performed an insufficient job of

adequately clustering the songs Overall the clusters formed when α = 01 were

the most meaningful in terms of picking up nuanced moods and instruments without

splitting hairs and resulting in clusters a minimally trained ear could not differentiate

From this analysis the most appropriate genre classifications of the electronic music

from the MSD are the clusters described in the table where α = 01 and the most

novel artists along with their contributions are summarized in the finidings where

α = 005 and α = 01


Chapter 4

Conclusion

In this chapter, I first address weaknesses in my experiment and strategies to address those weaknesses; then I offer potential paths for researchers to build upon my experiment, and I offer closing words regarding this thesis.

4.1 Design Flaws in Experiment

While I made every effort possible to ensure the integrity of this experiment, there were various limiting factors, some beyond my control and others within my control but unrealistic to address given the time and resources I had. The largest issue was the dataset I was working with. While the MSD contained roughly 23,000 electronic music songs according to my classifications, these songs did not come close to covering all of the electronic music that is available. From looking through the tracks, I did see many important artists, meaning that there was some credibility to the dataset. However, there were several other artists I was surprised to see missing, and the artists included were represented by only a limited number of popular songs. Some traditionally defined genres, like dubstep, were missing entirely from the dataset, and the most recent songs came from the year 2010, which meant that the past 5 years of rapid expansion in EM were not accounted for. Building a sufficient corpus of EM data is very difficult, arguably more so than for other genres, because songs may be remixed by multiple artists, further blurring the line between original content and modifications. For this reason, I considered my thesis to be a proof of concept. Although the data I used may not be ideal, I was able to show that the Dirichlet Process could be used with some amount of success to cluster songs based on their metadata.

With respect to how I implemented the Dirichlet Process and constructed the

features my methodology could have been more extensive with additional time and

resources Interpreting the sounds in each song and establishing common threads is a

difficult task and unlike Pandora which used trained music theory experts to analyze

each song I relied on my own ears and anecdotal knowledge of EM Given the lack of

formal literature quantitatively analyzing EM and the resources I had this was my

best realistic option but was also not ideal The second notable weakness which was

more controllable was determining what exactly constitutes an EM song My criteria

involved iterating through every song and selecting those whose artist contained a

tag that fell inside a list of predetermined EM genres However this strategy is not

always effective since some artists contain only a small selection of EM songs and

have produced much more music involving rock or other non-EM genres To prevent

these songs from appearing in the dataset I would need to load another dataset

from a group called Lastfm which contains user-generated tags at the song level

Another, more addressable weakness in my experiment was graphically analyzing the timbre categories. While the average chord changes were easy to interpret on the graphs for each cluster and had clear semantic interpretations, the timbre categories were never formally defined. That is, while I knew the Bayes Information Criterion was lowest when there were 46 categories (a sketch of such a BIC sweep appears at the end of this section), I did not associate each timbre category with a sound. Mauch's study addressed this issue by randomly selecting songs with sounds that fell in each timbre category and asking users to listen to the sounds and classify what they heard. Implementing this system would be an additional way of ensuring that the clusters formed for each song were nontrivial: I could not only eyeball the timbre measurements on each graph, as I did in this thesis, but also use listener judgments to confirm the sounds I observed for each cluster. Finally, while my feature selection involved careful preprocessing, based on other studies, that normalized measurements between all songs, there are additional ways I could have improved the feature set. For example, one study looks at more advanced ways to isolate specific timbre segments in a song, identify repeating patterns, and compare songs to each other in terms of the similarity of their timbres [15]. More advanced methods like these would allow me to analyze more quantitatively how successfully the Dirichlet Process clusters songs into distinct categories.
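As a reference for reproducing the timbre lexicon, here is a minimal sketch of the kind of BIC sweep described above, written against the current scikit-learn API (sklearn.mixture.GaussianMixture); the thesis code itself was built on an older release, so treat this as illustrative rather than as the original implementation.

    import numpy as np
    from sklearn.mixture import GaussianMixture

    def pick_k_by_bic(X, k_max=60, seed=0):
        # Fit a GMM for each candidate number of clusters and return the
        # k whose Bayes Information Criterion is lowest.
        best_k, best_bic = None, np.inf
        for k in range(1, k_max + 1):
            gmm = GaussianMixture(n_components=k, covariance_type='full',
                                  random_state=seed).fit(X)
            bic = gmm.bic(X)
            if bic < best_bic:
                best_k, best_bic = k, bic
        return best_k

    # X would be the matrix of sampled 12-dimensional timbre frames;
    # on the thesis data this procedure bottomed out at k = 46.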

4.2 Future Work

Future work in this area, quantitatively analyzing EM metadata to determine what constitutes different genres and novel artists, would involve tighter definitions and procedures, evaluations of whether the clustering was effective, and closer musical scrutiny. All of the weaknesses mentioned in the previous section, barring perhaps the songs available in the Million Song Dataset, can be addressed with extensions and modifications to the code base I created. The greater issue of building an effective corpus of music data for the MSD, and constantly updating it, might be addressed by soliciting such data from an organization like Spotify, but such an endeavor is very ambitious and beyond the scope of any individual or small-group research project without extensive funding and influence. Once these problems are resolved, and once the dataset, the songs accessed from it, and the methods for comparing songs to each other are settled, the next steps would be to analyze the results further. How do the most unique artists for their time compare to the most popular artists? Is there considerable overlap? How long does it take for a style to grow in popularity, if it does at all? And lastly, how can these findings be used to compose new genres of music and to envision who and what will become popular in the future? All of these questions may require supplementary information sources, with respect to the popularity of songs and artists for example, and many of these additional pieces of information can be found on the website of the MSD; a sketch of one such popularity comparison follows.
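As a starting point for the first of these questions, here is a minimal sketch, not part of the thesis code, that compares each cluster's earliest member (its presumptive trendsetter) against the rest of the cluster, using the MSD's song_hotttnesss field as a rough popularity proxy; the input format is a hypothetical summary of the clustering output.

    def trendsetters_vs_popularity(clustered_songs):
        # clustered_songs: dict mapping cluster_id -> list of
        # (year, track_id, hotttnesss) tuples, assumed to be assembled
        # from the Dirichlet Process output and the MSD metadata.
        report = {}
        for cluster_id, songs in clustered_songs.items():
            songs = sorted(songs)                     # chronological order
            first_year, first_id, first_hot = songs[0]
            rest = [h for (_, _, h) in songs[1:] if h is not None]
            avg_hot = sum(rest) / len(rest) if rest else None
            report[cluster_id] = {'trendsetter': first_id,
                                  'year': first_year,
                                  'trendsetter_hotttnesss': first_hot,
                                  'cluster_avg_hotttnesss': avg_hot}
        return report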

4.3 Closing Remarks

While this thesis is an ambitious endeavor and can be improved in many respects, it does show that the methods implemented yield nontrivial results and could serve as a foundation for future quantitative analysis of electronic music. As data analytics continues to grow, and as groups such as Spotify amass greater amounts of information and deeper insights into that information, this relatively new field of study will hopefully grow with it. EM is a dynamic, energizing, and incredibly expressive type of music, and understanding it from a quantitative perspective pays respect to what has, up until now, mostly been analyzed from a curious outsider's perspective: qualitatively described, but not examined as thoroughly from a mathematical angle.

Appendix A

Code

A.1 Pulling Data from the Million Song Dataset

from __future__ import division
import os
import re
import sys
import time
import glob
import numpy as np
from collections import OrderedDict
import hdf5_getters  # not on adroit

# prevents output from showing ellipses when printed
np.set_printoptions(threshold=np.nan)

'''This code pulls the pitch, timbre, and topical metadata of every
electronic music song out of one shard of the Million Song Dataset and
writes it to a text file, sorted by year.'''

basedir = '/scratch/network/mssilver/mssilver/msd_data_full/data/' + str(sys.argv[1])
ext = '.h5'

target_genres = ['house', 'techno', 'drum and bass', 'drum n bass', "drum'n'bass",
                 'drumnbass', "drum 'n' bass", 'jungle', 'breakbeat', 'trance',
                 'dubstep', 'trap', 'downtempo', 'industrial', 'synthpop', 'idm',
                 'idm - intelligent dance music', '8-bit', 'ambient',
                 'dance and electronica', 'electronic']

# relevant metadata for all EM songs found in the MSD
all_song_data = {}
count = 0
start_time = time.time()

for root, dirs, files in os.walk(basedir):
    files = glob.glob(os.path.join(root, '*' + ext))
    for f in files:
        h5 = hdf5_getters.open_h5_file_read(f)
        # if year unknown, throw out the sample
        if hdf5_getters.get_year(h5) == 0:
            h5.close()
            continue
        if any(tag in str(hdf5_getters.get_artist_mbtags(h5)) for tag in target_genres):
            print 'found electronic music song at {0} seconds'.format(time.time() - start_time)
            count += 1
            print 'song count: {0}'.format(count)
            h5_subdict = dict()
            h5_subdict['title'] = hdf5_getters.get_title(h5).item()
            h5_subdict['artist_name'] = hdf5_getters.get_artist_name(h5).item()
            h5_subdict['year'] = hdf5_getters.get_year(h5).item()
            h5_subdict['duration'] = hdf5_getters.get_duration(h5).item()
            h5_subdict['timbre'] = hdf5_getters.get_segments_timbre(h5).tolist()
            h5_subdict['pitches'] = hdf5_getters.get_segments_pitches(h5).tolist()
            track_id = hdf5_getters.get_track_id(h5).item()
            all_song_data[track_id] = h5_subdict
            print 'Song {0} finished processing. Total time elapsed: {1} seconds'.format(count, str(time.time() - start_time))
        h5.close()

# OrderedDict (rather than a plain dict) so the chronological sort survives
all_song_data_sorted = OrderedDict(sorted(all_song_data.items(), key=lambda k: k[1]['year']))
sortedpitchdata = '/scratch/network/mssilver/mssilver/msd_data/raw_' + re.sub(r'/', '', sys.argv[1]) + '.txt'
with open(sortedpitchdata, 'w') as text_file:
    text_file.write(str(all_song_data_sorted))
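As I read the path handling above, each run of this script processes one two-letter shard of the MSD directory tree, passed in as sys.argv[1] (for example A/B), and writes its output to a file such as msd_data/raw_AB.txt; Appendix A.3 later iterates over raw_AA.txt through raw_ZZ.txt under the same convention.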

A.2 Calculating Most Likely Chords and Timbre Categories

from __future__ import division
import os
import sys
import re
import time
import json
import glob
import math
import ast
import collections
import numpy as np
import hdf5_getters     # not on adroit
import sklearn.mixture  # not on adroit
import msd_utils        # not on adroit (helper methods; see Appendix A.4)

# prevents output from showing ellipses when printed
np.set_printoptions(threshold=np.nan)

# column-wise mean of a list of lists (used with zip(*...) below)
def mean(a):
    return sum(a) / len(a)

'''This code computes the frequency of chord changes and the timbre category
counts for each electronic song, as inputs to the Dirichlet Process.'''

basedir = '/scratch/network/mssilver/mssilver/'
input_file = basedir + 'msd_data/raw_' + str(sys.argv[1]) + '.txt'
output_file = basedir + 'msd_data/preprocessed_' + str(sys.argv[1]) + '.txt'

json_contents = open(input_file, 'r').read()

all_song_data = []
time_start = time.time()
count = 0
# each song record was written out as a dict literal beginning with 'title',
# so match one record at a time (non-greedy)
for match in re.finditer(r"\{'title'.*?\}", json_contents, re.DOTALL):
    json_object = ast.literal_eval(match.group(0))
    json_object_new = {}

    json_object_new['title'] = json_object['title']
    json_object_new['artist_name'] = json_object['artist_name']
    json_object_new['year'] = json_object['year']
    json_object_new['duration'] = json_object['duration']

    segments_pitches_old = json_object['pitches']
    segments_timbre_old = json_object['timbre']
    segments_pitches_old_smoothed = []
    segments_timbre_old_smoothed = []
    chord_changes = [0 for i in range(0, 192)]
    smoothing_factor = 5
    for i in range(0, int(math.floor(len(segments_pitches_old) / smoothing_factor))):
        segments = segments_pitches_old[(smoothing_factor*i):(smoothing_factor*i + smoothing_factor)]
        # mean strength of each of the 12 pitches over a block of 5 time segments
        segments_mean = map(mean, zip(*segments))
        segments_pitches_old_smoothed.append(segments_mean)
    most_likely_chords = [msd_utils.find_most_likely_chord(seg)
                          for seg in segments_pitches_old_smoothed]
    print 'found most likely chords at {0} seconds'.format(time.time() - time_start)
    # calculate chord changes
    for i in range(0, len(most_likely_chords) - 1):
        c1 = most_likely_chords[i]
        c2 = most_likely_chords[i+1]
        if c1[1] == c2[1]:
            note_shift = 0
        elif c1[1] < c2[1]:
            note_shift = c2[1] - c1[1]
        else:
            note_shift = 12 - c1[1] + c2[1]
        key_shift = 4*(c1[0] - 1) + c2[0]
        # convert note_shift (0 through 11) and key_shift (1 to 16)
        # to one of 192 categories for a chord shift
        chord_shift = 12*(key_shift - 1) + note_shift
        chord_changes[chord_shift] += 1
    json_object_new['chord_changes'] = [c / json_object['duration'] for c in chord_changes]
    print 'calculated chord changes at {0} seconds'.format(time.time() - time_start)

    for i in range(0, int(math.floor(len(segments_timbre_old) / smoothing_factor))):
        segments = segments_timbre_old[(smoothing_factor*i):(smoothing_factor*i + smoothing_factor)]
        # mean of each of the 12 timbre components over a block of 5 time segments
        segments_mean = map(mean, zip(*segments))
        segments_timbre_old_smoothed.append(segments_mean)
    timbre_cats = [msd_utils.find_most_likely_timbre_category(seg)
                   for seg in segments_timbre_old_smoothed]
    print 'found most likely timbre categories at {0} seconds'.format(time.time() - time_start)
    # note: the thesis reports 46 timbre categories, so this bound should
    # match len(msd_utils.TIMBRE_CLUSTERS)
    timbre_cat_counts = [timbre_cats.count(i) for i in xrange(0, 30)]
    json_object_new['timbre_cat_counts'] = [t / json_object['duration'] for t in timbre_cat_counts]
    all_song_data.append(json_object_new)
    count += 1

print 'preprocessing finished, writing results to file at time {0}'.format(time.time() - time_start)
with open(output_file, 'w') as f:
    f.write(str(all_song_data))

print 'file merging complete at time {0}'.format(time.time() - time_start)
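To make the chord-change encoding concrete: if one smoothed frame is labeled C major, c1 = (1, 0), and the next A minor, c2 = (2, 9), then note_shift = 9, key_shift = 4·(1−1) + 2 = 2, and the transition is tallied in bin chord_shift = 12·(2−1) + 9 = 21 of the 192-entry chord-change vector, which is finally divided by the song's duration.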

A.3 Code to Compute Timbre Categories

from __future__ import division
import os
import sys
import re
import time
import json
import glob
import math
import ast
import operator
import random
import collections
from collections import defaultdict
from string import ascii_uppercase
import numpy as np
import matplotlib.pyplot as plt
import hdf5_getters     # not on adroit
import sklearn.mixture  # not on adroit
import msd_utils        # not on adroit

timbre_all = []

# number of EM songs found per year, used to set per-year sampling rates
year_counts = {1956: 2, 1965: 4, 1968: 3, 1969: 5, 1970: 23, 1971: 25,
               1972: 26, 1973: 37, 1974: 35, 1975: 29, 1976: 28, 1977: 64,
               1978: 77, 1979: 111, 1980: 131, 1981: 171, 1982: 199,
               1983: 272, 1984: 190, 1985: 189, 1986: 200, 1987: 224,
               1988: 205, 1989: 272, 1990: 358, 1991: 348, 1992: 538,
               1993: 610, 1994: 658, 1995: 764, 1996: 809, 1997: 930,
               1998: 872, 1999: 983, 2000: 1031, 2001: 1230, 2002: 1323,
               2003: 1563, 2004: 1508, 2005: 1995, 2006: 1892, 2007: 2175,
               2008: 1950, 2009: 1782, 2010: 742}

time_start = time.time()
year_count = defaultdict(int)
orig_dir = ''  # run from the msd_data parent directory
json_pattern = re.compile(r"\{'title'.*?\}", re.DOTALL)
N = 20  # number of songs to sample from each year
k = 20  # number of timbre frames to select from each song
for l1 in ascii_uppercase:
    for l2 in ascii_uppercase:
        edm_textfile = orig_dir + 'msd_data/raw_' + l1 + l2 + '.txt'
        json_contents = open(edm_textfile, 'r').read()
        for json_object_str in re.findall(json_pattern, json_contents):
            json_object = ast.literal_eval(json_object_str)
            year = int(json_object['year'])
            # accept each song with probability N / year_counts[year],
            # capped at 1, so each year contributes about N songs
            prob = 1.0 if 1.0*N/year_counts[year] > 1.0 else 1.0*N/year_counts[year]
            if random.random() < prob:
                print 'getting timbre frames for song in directory {0}, {1} seconds after start of program'.format(edm_textfile, time.time() - time_start)
                duration = float(json_object['duration'])
                timbre = [[t / duration for t in l] for l in json_object['timbre']]
                try:
                    indices = random.sample(xrange(0, len(timbre)), k)
                except ValueError:  # fewer than k frames: take them all
                    indices = xrange(0, len(timbre))
                timbre_frames = [timbre[i] for i in indices]
                for l in timbre_frames:
                    timbre_all.append(l)
        print 'finished file {0}, {1} seconds after start of program'.format(edm_textfile, time.time() - time_start)

with open('timbre_frames_all.txt', 'w') as f:
    f.write(str(timbre_all))
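The acceptance probability min(1, N/year_counts[year]) in the loop above means that each release year contributes roughly N = 20 songs, and each sampled song contributes at most k = 20 duration-normalized timbre frames, so the pool of frames used to fit the timbre Gaussian Mixture Model is not dominated by the heavily represented late-2000s years.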

A.4 Helper Methods for Calculations

import os
import re
import json
import glob
import time
import numpy as np
import hdf5_getters

''' some static data used in conjunction with the helper methods '''

# each 12-element vector corresponds to the 12 pitches, starting with C
# natural and going up to B natural

CHORD_TEMPLATE_MAJOR = [[1,0,0,0,1,0,0,1,0,0,0,0], [0,1,0,0,0,1,0,0,1,0,0,0],
                        [0,0,1,0,0,0,1,0,0,1,0,0], [0,0,0,1,0,0,0,1,0,0,1,0],
                        [0,0,0,0,1,0,0,0,1,0,0,1], [1,0,0,0,0,1,0,0,0,1,0,0],
                        [0,1,0,0,0,0,1,0,0,0,1,0], [0,0,1,0,0,0,0,1,0,0,0,1],
                        [1,0,0,1,0,0,0,0,1,0,0,0], [0,1,0,0,1,0,0,0,0,1,0,0],
                        [0,0,1,0,0,1,0,0,0,0,1,0], [0,0,0,1,0,0,1,0,0,0,0,1]]
CHORD_TEMPLATE_MINOR = [[1,0,0,1,0,0,0,1,0,0,0,0], [0,1,0,0,1,0,0,0,1,0,0,0],
                        [0,0,1,0,0,1,0,0,0,1,0,0], [0,0,0,1,0,0,1,0,0,0,1,0],
                        [0,0,0,0,1,0,0,1,0,0,0,1], [1,0,0,0,0,1,0,0,1,0,0,0],
                        [0,1,0,0,0,0,1,0,0,1,0,0], [0,0,1,0,0,0,0,1,0,0,1,0],
                        [0,0,0,1,0,0,0,0,1,0,0,1], [1,0,0,0,1,0,0,0,0,1,0,0],
                        [0,1,0,0,0,1,0,0,0,0,1,0], [0,0,1,0,0,0,1,0,0,0,0,1]]
CHORD_TEMPLATE_DOM7 = [[1,0,0,0,1,0,0,1,0,0,1,0], [0,1,0,0,0,1,0,0,1,0,0,1],
                       [1,0,1,0,0,0,1,0,0,1,0,0], [0,1,0,1,0,0,0,1,0,0,1,0],
                       [0,0,1,0,1,0,0,0,1,0,0,1], [1,0,0,1,0,1,0,0,0,1,0,0],
                       [0,1,0,0,1,0,1,0,0,0,1,0], [0,0,1,0,0,1,0,1,0,0,0,1],
                       [1,0,0,1,0,0,1,0,1,0,0,0], [0,1,0,0,1,0,0,1,0,1,0,0],
                       [0,0,1,0,0,1,0,0,1,0,1,0], [0,0,0,1,0,0,1,0,0,1,0,1]]
CHORD_TEMPLATE_MIN7 = [[1,0,0,1,0,0,0,1,0,0,1,0], [0,1,0,0,1,0,0,0,1,0,0,1],
                       [1,0,1,0,0,1,0,0,0,1,0,0], [0,1,0,1,0,0,1,0,0,0,1,0],
                       [0,0,1,0,1,0,0,1,0,0,0,1], [1,0,0,1,0,1,0,0,1,0,0,0],
                       [0,1,0,0,1,0,1,0,0,1,0,0], [0,0,1,0,0,1,0,1,0,0,1,0],
                       [0,0,0,1,0,0,1,0,1,0,0,1], [1,0,0,0,1,0,0,1,0,1,0,0],
                       [0,1,0,0,0,1,0,0,1,0,1,0], [0,0,1,0,0,0,1,0,0,1,0,1]]

CHORD_TEMPLATE_MAJOR_means = [np.mean(chord) for chord in CHORD_TEMPLATE_MAJOR]
CHORD_TEMPLATE_MINOR_means = [np.mean(chord) for chord in CHORD_TEMPLATE_MINOR]
CHORD_TEMPLATE_DOM7_means = [np.mean(chord) for chord in CHORD_TEMPLATE_DOM7]
CHORD_TEMPLATE_MIN7_means = [np.mean(chord) for chord in CHORD_TEMPLATE_MIN7]

CHORD_TEMPLATE_MAJOR_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_MAJOR]
CHORD_TEMPLATE_MINOR_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_MINOR]
CHORD_TEMPLATE_DOM7_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_DOM7]
CHORD_TEMPLATE_MIN7_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_MIN7]

# the 46 mean timbre vectors (12 components each) of the Gaussian Mixture
# Model whose Bayes Information Criterion was lowest
TIMBRE_CLUSTERS = [
    [1.38679881e-01, 3.95702571e-02, 2.65410235e-02, 7.38301998e-03, -1.75014636e-02, -5.51147732e-02,
     8.71851698e-03, -1.17595855e-02, 1.07227900e-02, 8.75951680e-03, 5.40391877e-03, 6.17638908e-03],
    [3.14344510e+00, 1.17405599e-01, 4.08053561e+00, -1.77934450e+00, 2.93367968e+00, -1.35597928e+00,
     -1.55129489e+00, 7.75743158e-01, 6.42796685e-01, 1.40794256e-01, 3.37716831e-01, -3.27103815e-01],
    [3.56548165e-01, 2.73288705e+00, 1.94355982e+00, 1.06892477e+00, 9.89739475e-01, -8.97330631e-02,
     8.73234495e-01, -2.00747009e-03, 3.44488367e-01, 9.93117800e-02, -2.43471766e-01, -1.90521726e-01],
    [4.22442037e-01, 4.14115783e-01, 1.43926557e-01, -1.16143322e-01, -5.95186216e-02, -2.36927188e-01,
     -6.83151409e-02, 9.86816882e-02, 2.43219098e-02, 6.93558977e-02, 6.80121418e-03, 3.97485360e-02],
    [1.94727799e-01, -1.39027782e+00, -2.39875671e-01, -2.84583677e-01, 1.92334219e-01, -2.83421048e-01,
     2.15787541e-01, 1.14840341e-01, -2.15631833e-01, -4.09496877e-02, -6.90838017e-03, -7.24394810e-03],
    [1.96565167e-01, 4.98702717e-02, -3.43697282e-01, 2.54170701e-01, 1.12441266e-02, 1.54740401e-01,
     -4.70447408e-02, 8.10868802e-02, 3.03736697e-03, 1.43974944e-03, -2.75044913e-02, 1.48634678e-02],
    [2.21364497e-01, -2.96205105e-01, 1.57754028e-01, -5.57641279e-02, -9.25625566e-02, -6.15316168e-02,
     -1.38139882e-01, -5.54936599e-02, 1.66886836e-01, 6.46238260e-02, 1.24093863e-02, -2.09274345e-02],
    [2.12823455e-01, -9.32652720e-02, -4.39611467e-01, -2.02814479e-01, 4.98638770e-02, -1.26572488e-01,
     -1.11181799e-01, 3.25075635e-02, 2.01416694e-02, -5.69216463e-02, 2.61922912e-02, 8.30817468e-02],
    [1.62304042e-01, -7.34813956e-03, -2.02552550e-01, 1.80106705e-01, -5.72110826e-02, -9.17148244e-02,
     -6.20429191e-03, -6.08892354e-02, 1.02883628e-02, 3.84878478e-02, -8.72920419e-03, 2.37291230e-02],
    [1.69023095e-01, 6.81311168e-02, -3.71039856e-02, -2.13139780e-02, -4.18752028e-03, 1.36407740e-01,
     2.58515825e-02, -4.10328777e-04, 2.93149920e-02, -1.97874734e-02, 2.01177066e-02, 4.29260690e-03],
    [4.16829358e-01, -1.28384095e+00, 8.86081556e-01, 9.13717416e-02, -3.19420208e-01, -1.82003637e-01,
     -3.19865507e-02, -1.71517045e-02, 3.47472066e-02, -3.53047665e-02, 5.58354602e-02, -5.06222122e-02],
    [3.83948137e-01, 1.06020034e-01, 4.01191058e-01, 1.49470482e-01, -9.58422411e-02, -4.94473336e-02,
     2.27589858e-02, -5.67352733e-02, 3.84666644e-02, -2.15828055e-02, -1.67817151e-02, 1.15426241e-01],
    [9.07946444e-01, 3.26120397e+00, 2.98472002e+00, -1.42615404e-01, 1.29886103e+00, -4.53380431e-01,
     1.54008478e-01, -3.55297093e-02, -2.95809181e-01, 1.57037690e-01, -7.29692046e-02, 1.15180285e-01],
    [1.60870896e+00, -2.32038235e+00, -7.96211044e-01, 1.55058968e+00, -2.19377663e+00, 5.01030526e-01,
     -1.71767279e+00, -1.36642470e+00, -2.42837527e-01, -4.14275615e-01, -7.33148530e-01, -4.56676578e-01],
    [6.42870687e-01, 1.34486839e+00, 2.16026845e-01, -2.13180345e-01, 3.10866747e-01, -3.97754955e-01,
     -3.54439151e-01, -5.95938041e-04, 4.95054274e-03, 4.67013422e-02, -1.80823854e-02, 1.25808320e-01],
    [1.16780496e+00, 2.28141229e+00, -3.29418720e+00, -1.54239912e+00, 2.12372153e-01, 2.51116768e+00,
     1.84273560e+00, -4.06183916e-01, 1.19175125e+00, -9.24407446e-01, 6.85444429e-01, -6.38729005e-01],
    [2.39097414e-01, -1.13382447e-02, 3.06327342e-01, 4.68182987e-03, -1.03107607e-01, -3.17661969e-02,
     3.46533705e-02, 1.46440386e-02, 6.88291154e-02, 1.72580481e-02, -6.23970238e-03, -6.52822380e-03],
    [1.74850329e-01, -1.86077411e-01, 2.69285838e-01, 5.22452803e-02, -3.71708289e-02, -6.42874319e-02,
     -5.01920042e-03, -1.14565540e-02, -2.61300268e-03, -6.94872458e-03, 1.20157063e-02, 2.01341977e-02],
    [1.93220674e-01, 1.62738332e-01, 1.72794061e-02, 7.89933755e-02, 1.58494767e-01, 9.04541006e-04,
     -3.33177052e-02, -1.42411500e-01, -1.90471155e-02, -2.41622739e-02, -2.57382438e-02, 2.84895062e-02],
    [3.31179197e+00, -1.56765268e-01, 4.42446188e+00, 2.05496297e+00, 5.07031622e+00, -3.52663849e-02,
     -5.68337901e+00, -1.17825301e+00, 5.41756637e-01, -3.15541339e-02, -1.58404846e+00, 7.37887234e-01],
    [2.36033237e-01, -5.01380019e-01, -7.01568834e-02, -2.14474169e-01, 5.58739133e-01, -3.45340886e-01,
     2.36469930e-01, -2.51770230e-02, -4.41670143e-01, -1.73364633e-01, 9.92353986e-03, 1.01775476e-01],
    [3.13672832e+00, 1.55128891e+00, 4.60139512e+00, 9.82477544e-01, -3.87108002e-01, -1.34239667e+00,
     -3.00065797e+00, -4.41556909e-01, -7.77546208e-01, -6.59017029e-01, -1.42596356e-01, -9.78935498e-01],
    [8.50714148e-01, 2.28658856e-01, -3.65260753e+00, 2.70626948e+00, -1.90441544e-01, 5.66625676e+00,
     1.77531510e+00, 2.39978921e+00, 1.10965660e+00, 1.58484130e+00, -1.51579214e-02, 8.64324026e-01],
    [1.14302559e+00, 1.18602811e+00, -3.88130412e+00, 8.69833825e-01, -8.23003310e-01, -4.23867795e-01,
     8.56022598e-01, -1.08015106e+00, 1.74840192e-01, -1.35493558e-02, -1.17012561e+00, 1.68572940e-01],
    [3.54117814e+00, 6.12714769e-01, 7.67585243e+00, 2.50391333e+00, 1.81374399e+00, -1.46363231e+00,
     -1.74027236e+00, -5.72924078e-01, -1.20787368e+00, -4.13954661e-01, -4.62561948e-01, 6.78297871e-01],
    [8.31843044e-01, 4.41635485e-01, 7.00724425e-02, -4.72159900e-02, 3.08326493e-01, -4.47009822e-01,
     3.27806057e-01, 6.52370380e-01, 3.28490360e-01, 1.28628172e-01, -7.78065861e-02, 6.91343399e-02],
    [4.90082031e-01, -9.53180204e-01, 1.76970476e-01, 1.57256960e-01, -5.26196238e-02, -3.19264458e-01,
     3.91808304e-01, 2.19368239e-01, -2.06483291e-01, -6.25044005e-02, -1.05547224e-01, 3.18934196e-01],
    [1.49899454e+00, -4.30708817e-01, 2.43770498e+00, 7.03149621e-01, -2.28827845e+00, 2.70195855e+00,
     -4.71484280e+00, -1.18700075e+00, -1.77431396e+00, -2.23190236e+00, 8.20855264e-01, -2.35859902e-01],
    [1.20322544e-01, -3.66300816e-01, -1.25699953e-01, -1.21914056e-01, 6.93277338e-02, -1.31034684e-01,
     -1.54955924e-03, 2.48094288e-02, -3.09576314e-02, -1.66369415e-03, 1.48904987e-04, -1.42151992e-02],
    [6.52394765e-01, -6.81024464e-01, 6.36868117e-01, 3.04950208e-01, 2.62178992e-01, -3.20457080e-01,
     -1.98576098e-01, -3.02173163e-01, 2.04399765e-01, 4.44513847e-02, -9.50111498e-03, -1.14198739e-02],
    [2.06762180e-01, -2.08101829e-01, 2.61977630e-01, -1.71672300e-01, 5.61794250e-02, 2.13660185e-01,
     3.90259585e-02, 4.78176392e-02, 1.72812607e-02, 3.44052067e-02, 6.26899067e-03, 2.48544728e-02],
    [7.39717363e-01, 4.37786285e+00, 2.54995502e+00, 1.13151212e+00, -3.58509503e-01, 2.20806129e-01,
     -2.20500355e-01, -7.22409824e-02, -2.70534083e-01, 1.07942098e-03, 2.70174668e-01, 1.87279353e-01],
    [1.25593809e+00, 6.71054880e-02, 8.70352571e-01, -4.32607959e+00, 2.30652217e+00, 5.47476105e+00,
     -6.11052479e-01, 1.07955720e+00, -2.16225471e+00, -7.95770149e-01, -7.31804973e-01, 9.68935954e-01],
    [1.17233757e-01, -1.23897829e-01, -4.88625265e-01, 1.42036530e-01, -7.23286756e-02, -6.99808763e-02,
     -1.17525019e-02, 5.70221674e-02, -7.67796123e-03, 4.17505873e-02, -2.33375716e-02, 1.94121001e-02],
    [1.67511025e+00, -2.75436700e+00, 1.45345593e+00, 1.32408871e+00, -1.66172505e+00, 1.00560074e+00,
     -8.82308160e-01, -5.95708043e-01, -7.27283590e-01, -1.03975499e+00, -1.86653334e-02, 1.39449745e+00],
    [3.20587677e+00, -2.84451104e+00, 8.54849957e+00, -4.44001235e-01, 1.04202144e+00, 7.35333682e-01,
     -2.48763292e+00, 7.38931361e-01, -1.74185596e+00, -1.07581842e+00, 2.05759299e-01, -8.20483513e-01],
    [3.31279737e+00, -5.08655734e-01, 6.61530870e+00, 1.16518280e+00, 4.74499155e+00, -2.31536191e+00,
     -1.34016130e+00, -7.15381712e-01, 2.78890594e+00, 2.04189275e+00, -3.80003033e-01, 1.16034914e+00],
    [1.79522019e+00, -8.13534697e-02, 4.37167420e-01, 2.26517020e+00, 8.85377295e-01, 1.07481514e+00,
     -7.25322296e-01, -2.19309506e+00, -7.59468916e-01, -1.37191387e+00, 2.60097913e-01, 9.34596450e-01],
    [3.50400906e-01, 8.17891485e-01, -8.63487084e-01, -7.31760701e-01, 9.70320805e-02, -3.60023996e-01,
     -2.91753495e-01, -8.03073817e-02, 6.65930095e-02, 1.60093340e-01, -1.29158086e-01, -5.18806100e-02],
    [2.25922929e-01, 2.78461593e-01, 5.39661393e-02, -2.37662670e-02, -2.70343295e-02, -1.23485570e-01,
     2.31027499e-03, 5.87465112e-05, 1.86127188e-02, 2.83074747e-02, -1.87198676e-04, 1.24761782e-02],
    [4.53615634e-01, 3.18976020e+00, -8.35029351e-01, 7.84124578e+00, -4.43906795e-01, -1.78945492e+00,
     -1.14521031e+00, 1.00044304e+00, -4.04084981e-01, -4.86030348e-01, 1.05412721e-01, 5.63666445e-02],
    [3.93714086e-01, -3.07226477e-01, -4.87366619e-01, -4.57481697e-01, -2.91133171e-04, -2.39881719e-01,
     -2.15591352e-01, -1.21332941e-01, 1.42245002e-01, 5.02984582e-02, -8.05878851e-03, 1.95534173e-01],
    [1.86913010e-01, -1.61000977e-01, 5.95612425e-01, 1.87804293e-01, 2.22064227e-01, -1.09008289e-01,
     7.83845058e-02, 5.15228647e-02, -8.18113578e-02, -2.37860551e-02, 3.41013800e-03, 3.64680417e-02],
    [3.32919314e+00, -2.14341251e+00, 7.20913997e+00, 1.76143734e+00, 1.64091808e+00, -2.66887649e+00,
     -9.26748006e-01, -2.78599285e-01, -7.39434005e-01, -3.87363085e-01, 8.00557250e-01, 1.15628886e+00],
    [4.76496444e-01, -1.19334793e-01, 3.09037235e-01, -3.45545294e-01, 1.30114716e-01, 5.06895559e-01,
     2.12176840e-01, -4.14296750e-03, 4.52439064e-02, -1.62163990e-02, 6.93683152e-03, -5.77607592e-03],
    [3.00019324e-01, 5.43432074e-02, -7.72732930e-01, 1.47263806e+00, -2.79012581e-02, -2.47864869e-01,
     -2.10011388e-01, 2.78202425e-01, 6.16957205e-02, -1.66924986e-01, -1.80102286e-01, -3.78872162e-03]]

TIMBRE_MEANS = [np.mean(t) for t in TIMBRE_CLUSTERS]
TIMBRE_STDEVS = [np.std(t) for t in TIMBRE_CLUSTERS]

'''helper methods to process raw msd data'''

def normalize_pitches(h5):
    key = int(hdf5_getters.get_key(h5))
    segments_pitches = hdf5_getters.get_segments_pitches(h5)
    segments_pitches_new = [transpose_by_key(pitch_seg, key)
                            for pitch_seg in segments_pitches]
    return segments_pitches_new

def transpose_by_key(pitch_seg, key):
    # rotate the 12-pitch vector so every song is in the same tonal context
    pitch_seg_new = []
    for i in range(0, 12):
        idx = (i + key) % 12
        pitch_seg_new.append(pitch_seg[idx])
    return pitch_seg_new

''' given a time segment with distributions of the 12 pitches, find the most
likely chord played '''
def find_most_likely_chord(pitch_vector):
    # the four template families are indexed 1 (major) through 4 (minor 7)
    chord_families = [
        (1, CHORD_TEMPLATE_MAJOR, CHORD_TEMPLATE_MAJOR_means, CHORD_TEMPLATE_MAJOR_stdevs),
        (2, CHORD_TEMPLATE_MINOR, CHORD_TEMPLATE_MINOR_means, CHORD_TEMPLATE_MINOR_stdevs),
        (3, CHORD_TEMPLATE_DOM7, CHORD_TEMPLATE_DOM7_means, CHORD_TEMPLATE_DOM7_stdevs),
        (4, CHORD_TEMPLATE_MIN7, CHORD_TEMPLATE_MIN7_means, CHORD_TEMPLATE_MIN7_stdevs)]
    rho_max = 0.0
    most_likely_chord = (1, 1)
    for family, templates, means, stdevs in chord_families:
        for idx, (chord, mean, stdev) in enumerate(zip(templates, means, stdevs)):
            # correlation between the template chord and the observed pitches,
            # with 0.01 added to each standard deviation to avoid division by zero
            rho = 0.0
            for i in range(0, 12):
                rho += (chord[i] - mean) * (pitch_vector[i] - np.mean(pitch_vector)) \
                       / ((stdev + 0.01) * (np.std(pitch_vector) + 0.01))
            if abs(rho) > abs(rho_max):
                rho_max = rho
                most_likely_chord = (family, idx)
    return most_likely_chord

def find_most_likely_timbre_category(timbre_vector):
    most_likely_timbre_cat = 0
    rho_max = 0.0
    for idx, (seg, mean, stdev) in enumerate(zip(TIMBRE_CLUSTERS, TIMBRE_MEANS, TIMBRE_STDEVS)):
        rho = 0.0
        for i in range(0, 12):
            # as in the original listing, the observed vector is centered on the
            # cluster mean np.mean(seg) rather than on np.mean(timbre_vector)
            rho += (seg[i] - mean) * (timbre_vector[i] - np.mean(seg)) \
                   / ((stdev + 0.01) * (np.std(timbre_vector) + 0.01))
        if abs(rho) > abs(rho_max):
            rho_max = rho
            most_likely_timbre_cat = idx
    return most_likely_timbre_cat
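For reference, find_most_likely_chord implements the Spearman's Rho template matching of section 2.3.2: for a template chord CT and a smoothed chroma frame c, ρ_{CT,c} = Σ_{i=1}^{12} (CT_i − mean(CT))(c_i − mean(c)) / (σ_CT σ_c), with 0.01 added to each standard deviation in the code to guard against division by zero, and the template with the largest |ρ| is selected. find_most_likely_timbre_category applies the same score against the 46 timbre cluster means.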

Bibliography

[1] Marck Bailey. Mark J. Butler publishes scholarly work on dance music. http://www.music.northwestern.edu/about/news/2012/mark-j-butler-publishes-scholarly-work-on-dance-music.html, Mar. 2012.

[2] Kenneth Taylor. Ishkur's guide to EDM. http://techno.org/electronic-music-guide/.

[3] Deal further strengthens Spotify's music discovery expertise. http://the.echonest.com/pressreleases/spotify-acquires-echo-nest/, Mar. 2014.

[4] Josh Constine. Inside the Spotify-Echo Nest skunkworks. http://techcrunch.com/2014/10/19/the-sonic-mad-scientists/, Oct. 2014.

[5] The future of music genres is here. http://blog.echonest.com/post/73516217273/the-future-of-music-genres-is-here, Jan. 2014.

[6] About the Music Genome Project. http://www.pandora.com/about/mgp.

[7] Joan Serrà, Álvaro Corral, Marián Boguñá, Martín Haro, and Josep Ll. Arcos. Measuring the evolution of contemporary western popular music. Scientific Reports, 2, Jul. 2012.

[8] Matthias Mauch, Robert M. MacCallum, Mark Levy, and Armand M. Leroi. The evolution of popular music: USA 1960-2010. Royal Society Open Science, 2(5), 2015.

[9] Thierry Bertin-Mahieux, Daniel P. W. Ellis, Brian Whitman, and Paul Lamere. The Million Song Dataset. In Proceedings of the 12th International Conference on Music Information Retrieval (ISMIR 2011), 2011.

[10] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay. Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12:2825-2830, 2011.

[11] Lars Buitinck, Gilles Louppe, Mathieu Blondel, Fabian Pedregosa, Andreas Mueller, Olivier Grisel, Vlad Niculae, Peter Prettenhofer, Alexandre Gramfort, Jaques Grobler, Robert Layton, Jake VanderPlas, Arnaud Joly, Brian Holt, and Gaël Varoquaux. API design for machine learning software: experiences from the scikit-learn project. In ECML PKDD Workshop: Languages for Data Mining and Machine Learning, pages 108-122, 2013.

[12] Edwin Chen. Infinite mixture models with nonparametric Bayes and the Dirichlet process. http://blog.echen.me/2012/03/20/infinite-mixture-models-with-nonparametric-bayes-and-the-dirichlet-process/, Mar. 2012.

[13] Graham Massey. Roland TR-808: The drum machine that changed music. http://www.bbc.com/news/entertainment-arts-26682781, Mar. 2014.

[14] Jean-Michel Jarre and Agencja Artystyczna MTJ. Jean Michel Jarre. Disques Dreyfus, 1999.

[15] François Pachet, Jean-Julien Aucouturier, and Mark Sandler. The way it sounds: Timbre models for analysis and retrieval of music signals. IEEE Transactions on Multimedia, 7(6):1028-1035, Dec. 2005.

Page 14: Silver,Matthew final thesis

Although Spotify and Echo Nestrsquos algorithms are very useful for mapping the land-

scape of established and emerging genres of music the methodology is limited to

pre-defined genres of music This may serve as a good starting point to compare my

final results to but my study aims to be as context-free as possible by attaching no

preconceived notions of music styles or genres instead looking at features that could

be measured in every song

While Spotifyrsquos approach to mapping music is very high-tech and based on ex-

isting genres Pandora takes a very low-tech and context-free approach to music

clustering Pandora created the Music Genome Project a multi-year undertaking

where skilled music theorists listened to a large number of songs and analyzed up to

450 characteristics in each song [6] Pandorarsquos approach is appealing to the aim of

my study since it does not take any preconceived notions of what a genre of music

is instead comparing songs on common characteristics such as pitch rhythm and

instrument patterns Unfortunately I do not have a cadre of skilled music theorists

at my disposal nor do I have 10 years to perform such calculations like the dedicated

workers at Pandora (tips the indestructible fedora) Additionally Pandorarsquos Music

Genome Project is intellectual property so at best I can only rely on the abstract

concepts of the Music Genome Project to drive my study

In the academic realm there are no existing studies analyzing quantifiable changes in

EM specifically but there exist a few studies that perform such analysis on popular

Western music in general One such study is Measuring the Evolution of Contem-

porary Western Popular Music which analyzes music from 1955-2010 spanning all

common genres Using the Million Song Dataset a free public database of songs

each containing metadata (see section 13) the study focuses on the attributes pitch

timbre and loudness Pitch is defined as the standard musical notes or frequency of

5

the sound waves Timbre is formally defined as the Mel frequency cepstral coefficients

(MFCC) of a transformed sound signal More informally it refers to the sound color

texture or tone quality and is associated with instrument types recording resources

and production techniques In other words two sounds that have the same pitch

but different tones (for example a bell and voice) are differentiated by their timbres

There are 12 MFCCs that define the timbre of a given sound Finally loudness

refers to intrinsically how loud the music sounds not loudness that a listener can

manipulate while listening to the music Loudness is the first MFCC of the timbre

of a sound [7] The study concluded that over time music has been becoming louder

and less diverse

The restriction of pitch sequences (with metrics showing less variety inpitch progressions) the homogenization of the timbral palette (with fre-quent timbres becoming more frequent) and growing average loudnesslevels (threatening a dynamic richness that has been conserved until to-day) This suggests that our perception of the new would be essentiallyrooted on identifying simpler pitch sequences fashionable timbral mix-tures and louder volumes Hence an old tune with slightly simpler chordprogressions new instrument sonorities that were in agreement with cur-rent tendencies and recorded with modern techniques that allowed forincreased loudness levels could be easily perceived as novel fashionableand groundbreaking

This study serves as a good starting point for mathematically analyzing music in

a few ways First it utilizes the Million Song Dataset which addresses the issue

of legally obtaining music metadata As mentioned in section 13 the only legal

way to obtain playable music for this study would have been to purchase all songs I

would include which is infeasible While the Million Song Dataset does not contain

the audio files in playable format it does contain audio features and metadata that

allow for in-depth analysis In addition working with the dataset takes out the

work of extracting features from raw audio files saving an extensive amount of time

and energy Second the study establishes specifics for what constitutes a trend

in music Pitch timbre and loudness are core features of music and examining the6

distributions of each among songs over time reveals a lot of information about how

the music industry and consumersrsquo tastes have evolved While these are not all of the

features contained in a song they serve as a good starting point Third the study

defines mathematical ways to capture music attributes and measure their change

over time For example pitches are transposed into the same tonal context with

binary discretized pitch descriptions based on a threshold so that each song can be

represented with vectors of pitches that are normalized and compared to other songs

While this study lays some solid groundwork for capturing and analyzing nu-

meric qualities of music it falls short of addressing my goals in a couple of ways

First it does not perform any analysis with respect to music genre While the

analysis performed in this paper could easily be applied to a list of songs in a specific

genre certain genres might have unique sounds and rhythms relative to other genres

that would be worth studying in greater detail Second the study only measures

general trends in music over time The models used to describe changes are simple

regressions that donrsquot look at more nuanced changes For example what styles of

music developed over certain periods of time How rapid were those changes Which

styles of music developed from which other styles

A more promising study led by music researcher Matthias Mauch [8] analyzes

contemporary popular Western Music from the 1960s to 2010s by comparing numer-

ical data on the pitches and timbre of a corpus of 17000 songs that appeared on the

Billboard Hot 100 Like the previously mentioned paper Measuring the Evolution

of Contemporary Western Popular Music Mauchrsquos study also creates abstractions

of pitch and timbre in order to provide a consistent and meaningful semantic inter-

pretation of musical data (see figure 12) However Mauchrsquos study takes this idea a

step further by using genre tags from Lastfm a music website and constructing a

7

hierarchy of music genres using hierarchical clustering Additionally the study takes

a crack at determining whether a particular band the Beatles was musically ground-

breaking for its time or merely playing off sounds that other bands had already used

Figure 12 Data processing pipeline for Mauchrsquos study illustrated with a segment ofQueenrsquos Bohemian Rhapsody 1975

While both Measuring the Evolution of Contemporary Western Popular Music

and Mauchrsquos study created abstractions of pitch and timbre Mauchrsquos study is more

appealing with respect to my goal because its end results align more closely with

mine Additionally the data processing pipeline offers several layers of abstraction

8

and depending on my progress I would be able to achieve at least one of the levels of

abstraction As shown in figure 12 each segment of a raw audio file is first broken

down into its 12 timbre MFCCs and pitch components Next the study constructs

ldquolexiconsrdquo or a dictionary of pitch and timbre terms that all songs can be compared

to For pitch the original data is in a N-by-12 matrix where N is the number of time

segments in the song and 12 the number of each of the notes found in an octave of

pitches Each time segment contains the relative strengths of each of the 12 pitches

However music sounds are not merely a collection of pitches but more precisely

chords Furthermore the similarity of two songs is not determined by the absolute

pitches of their chords but rather the progression of chords in the song all relative to

each other For example if all the notes in a song are transposed by one step the song

will sound different in terms of absolute pitch but the song will still be recognized

as the original because all of the relative movements from each chord to the next

are the same This phenomenon is captured in the pitch data by finding the most

likely chord played at each time segment then counting the change to the next chord

at each time step and generating a table of chord change frequencies for each song

Constructing the timbre lexcion is more complicated since there is no easy analogue

like chords for pitches to compare songs Mauchrsquos study utilizes a Gaussian Mixture

Model (GMM) by iterating over k=1 to k=N clusters where N is a large number

running the GMM on each prior assumption of k clusters and computing the Bayes

Information Criterion (BIC) for each model The lowest of the N BIC values is found

and that value of k is selected That model contains k different timbre clusters

and each cluster contains the mean timbre value for each of the 12 timbre components

For my research I decided that the pitch and timbre lexicons would be the most

realistic level of abstraction I could obtain Mauchrsquos study adds an addtional layer

to pitch and timbre by identifying the most common patterns of chord changes and

9

most common timbre rhythms and creating more general tags from these combined

terms such as ldquo stepwise changes indicating modal harmonyrdquo for a pitch topic and

ldquooh rounded mellowrdquo for a timbral topic There were two problems with using this

final layer of abstraction for my study First attaching semantic interpretations to

the pitch and timbral lexicons is a difficult task For timbre I would need to listen

to sound samples containing all of the different timbral categories I identified and

attaching user interpretations to them For the chords not only would I have to

perform the same analysis as on timbre but take careful attention to identify which

chords correspond to common sound progressions in popular music a task that I am

not qualified for an did not have the resources for this thesis to seek out Second

this final layer of abstraction was not necessary for the end goal of my paper In

fact consolidating my pitch and timbre lexicons into simpler phrases would run the

risk of pigeonholing my analysis and preventing me from discovering more nuanced

patterns in my final results Therefore I decided to focus on pitch and timbral

lexicon construction as the furthest levels of abstraction when processing songs for

my thesis Mathematical details on how I constructed the lexical and timbral lexicons

can be found in the Mathematical Modeling section of this paper

13 The Dataset

In order to successfully execute my thesis I need access to an extensive database of

music Until recently acquiring a substantial corpus of music data was a difficult and

costly task It is illegal to download music audio files from video and music-sharing

sites such as YouTube Spotify and Pandora Some platforms such as iTunes offer

90-second previews of songs but using only segments of songs and usually segments

that showcase the chorus of the song are not reliable measures to capture the entire

essence of a song Even if I were to legally download entire audio files for free I would

10

run into additional issues Obtaining a high-quality corpora of song data would be

challenging writing scripts that crawl music sharing platforms may not capture all of

the music I am looking for And once I have the audio files I would have to perform

audio processing techniques to extract the relevant information from the songs

Fortunately there is an easy solution to the music data acquisition problem

The Million Song Dataset (MSD) is a collection of metadata for one million music

tracks dating up to 2011 Various organizations such as The Echo Nest Musicbrainz

7digital and Lastfm have contributed different pieces of metadata Each song is

represented as a Hierarchical Data Format file (HDF5) which can be loaded as a

JSON object The fields encompass topical features such as the song title artist

and release date as well as lower-level features such as the loudness starting beat

time pitches and timbre of several segments of the song [9] While the MSD is

the largest free and open source music metadata dataset I could find there is no

guarantee that it adequately covers the entire spectrum of EM artists and songs

This quality limitation is important to consider throughout the study A quick look

through the songs including the subset of data I worked with for this report showed

that there were several well-known artists and songs in the EM scene Therefore

while the MSD may not contain all desired songs for this project it contains an

adequate number of relevant songs to produce some meaningful results Additionally

laying the groundwork for modeling the similarities between songs and identifying

groundbreaking ones is the same regardless of the songs included and the following

methodologies can be implemented on any similarly-formatted dataset including one

with songs that might currently be missing in the MSD

11

Chapter 2

Mathematical Modeling

21 Determining Novelty of Songs

Finding an logical and implementable mathematical model was and continues to be

an important aspect of my research My problem how to mathematically determine

which songs were unique for their time requires an algorithm in which each song is

introduced in chronological order either joining an existing category or starting a

new category based on its musical similarity to songs already introduced Clustering

algorithms like k-means or Gaussian Mixture Models (GMM) which have a prede-

termined number of clusters and optimize the partitioning of a dataset into those

cluster assume a fixed number of clusters While this process would work if we knew

exactly how many genres of EM existed if we guess wrong our end results may end

up with clusters that are wrongly grouped together or separated It is much better to

apply a clustering algorithm that does not make any assumptions about this number

One particularly promising process that addresses the issue of tthe number of

clusters is a family of algorithms known as Dirichlet Proccesses (DPs) DPs are

useful for this particular application because (1) they assign clusters to a dataset

12

with only an upper bound on the number of clusters and (2) by sorting the songs

in chronological order before running the algorithm and keeping track of which

songs are categorized under each cluster we can observe the earliest songs in each

cluster and consequentially infer which songs were responsible for creating new

clusters The arguments for the DP The DP is controlled by a parameter α which

is the concentration parameter The expected number of clusters formed is directly

proportional to the value of α so the higher the value of α the more likely new

clusters will be formed [10] Regardless of the value of α as the number of data

points introduced increases the probability of a new group being formed decreases

That is a ldquorich get richerrdquo policy is in place and existing clusters tend to grow in

size Tweaking the value of the tunable parameter α is an important part of the

study since it determines the flexibility given to forming a new cluster If the value

of α is too small then the criteria for forming clusters will be too strict and data

that should be in different clusters will be assigned to the same cluster On the other

hand if α is too large the algorithm will be too sensitive and assign similar songs to

different clusters

The implementation of the DP was achieved using scikit-learnrsquos library and API for

Dirichlet Process Gaussian Mixture Model (DPGMM) The DPGMM is the formal

name of the Dirichlet Process model used to cluster the data More specifically

scikit-learnrsquos implementation of the DPGMM uses the Stick Breaking method

one of several equally valid methods to assign songs to clusters [11] While the

mathematical details for this algorithm can be found at the following citation [12]

the most important aspects of the DPGMM are the arguments that the user can

specify and tune The first of these tunable parameters is the value α which is the

same parameter as the α discussed in the previous paragraph As seen in Figure 21

on the right side properly tuning α is key to obtaining meaningful clusters The

13

center image has α set to 001 which is too small and results in all of the data being

formed under one cluster On the other hand the bottom-right image has the same

data set and α set to 100 which does a better job of clustering On a related note

the figure also demonstrates the effectiveness of the DPGMM over the GMM On the

left side clearly the dataset contains 2 clusters but the GMM on the top-left image

assumes 5 clusters as a prior and consequentially clusters the data incorrectly while

the DPGMM manages to limit the data to 2 clusters

The second argument that the user inputs for the DPGMM is the data that

will be clustered The scikit-learn implementation takes the data in the format

of a nested list (N lists each of length m) where N is the number of data points

and m the number of features While the format of the data structure is relatively

straightforward choosing which numbers should be in the data was a challenge I

faced Selecting the relevant features of each song to be used in the algorithm will

be expounded upon in the next section ldquoFeature Selectionrdquo

The last argument that a user inputs for the scikit-learn DGPMM implementa-

tion is an argument indicating the upper bound for the number of clusters The

Dirichlet Process then determines the best number of clusters for the data between

1 and the upper bound Since the DPGMM is flexible enough to find the best value

I set an arbitrary upper bound of 50 clusters and focused more on the tuning of α to

modify the number of clusters formed

22 Feature Selection

One of the most difficult aspects of the Dirichlet method is choosing the features

to be used for clustering In other words when we organize the songs into clusters

14

Figure 21 scikit-learn example of GMM vs DPGMM and tuning of α

we need to ensure that each cluster is distinct in a way that is statistically and

intuitively logical In the Million Song Dataset [9] each song is represented as a

JSON object containing several fields These fields are candidate features to be used

in the Dirichlet algorithm Below is an example song ldquoNever Gonna Give You Uprdquo

by Rick Astley and the corresponding features

artist_mbid db92a151-1ac2-438b-bc43-b82e149ddd50 (the musicbrainzorg ID

for this artists is db9)

artist_mbtags shape = (4) (this artist received 4 tags on musicbrainzorg)

artist_mbtags_count shape = (4)

(raw tag count of the 4 tags this artist received on musicbrainzorg)

artist_name Rick Astley (artist name)

artist_playmeid 1338 (the ID of that artist on the service playmecom)

artist_terms shape = (12) (this artist has 12 terms (tags) from The Echo Nest)

artist_terms_freq shape = (12) (frequency of the 12 terms from The Echo Nest

(number between 0 and 1))

artist_terms_weight shape = (12) (weight of the 12 terms from The Echo Nest

(number between 0 and 1))

15

audio_md5 bf53f8113508a466cd2d3fda18b06368 (hash code of the audio used for

the analysis by The Echo Nest)

bars_confidence shape = (99) (confidence value (between 0 and 1) associated

with each bar by The Echo Nest)

bars_start shape = (99) (start time of each bar according to The Echo Nest this

song has 99 bars)

beats_confidence shape = (397))

confidence value (between 0 and 1) associated with each beat by The Echo Nest

beats_start shape = (397) (start time of each beat according to The Echo Nest

this song has 397 beats)

danceability 00 (danceability measure of this song according to The Echo Nest

(between 0 and 1 0 =gt not analyzed))

duration 21169587 (duration of the track in seconds)

end_of_fade_in 0139 (time of the end of the fade in at the beginning of the

song according to The Echo Nest)

energy 00 (energy measure (not in the signal processing sense) according to The

Echo Nest (between 0 and 1 0 = not analyzed))

key 1 (estimation of the key the song is in by The Echo Nest)

key_confidence 0324 (confidence of the key estimation)

loudness -775 (general loudness of the track)

mode 1 (estimation of the mode the song is in by The Echo Nest)

mode_confidence 0434 (confidence of the mode estimation)

release Big Tunes - Back 2 The 80s (album name from which the track was taken

some songs tracks can come from many albums we give only one)

release_7digitalid 786795 (the ID of the release (album) on the service 7digi-

talcom)

sections_confidence shape = (10) (confidence value (between 0 and 1) associated

16

with each section by The Echo Nest)

sections_start shape = (10) (start time of each section according to The Echo

Nest this song has 10 sections)

segments_confidence shape = (935) (confidence value (between 0 and 1) asso-

ciated with each segment by The Echo Nest)

segments_loudness_max shape = (935) (max loudness during each segment)

segments_loudness_max_time shape = (935) (time of the max loudness

during each segment)

segments_loudness_start shape = (935) (loudness at the beginning of each

segment)

segments_pitches shape = (935 12) (chroma features for each segment (normal-

ized so max is 1))

segments_start shape = (935) (start time of each segment ( musical event or

onset) according to The Echo Nest this song has 935 segments)

segments_timbre shape = (935 12) (MFCC-like features for each segment)

similar_artists shape = (100) (a list of 100 artists (their Echo Nest ID) similar

to Rick Astley according to The Echo Nest)

song_hotttnesss 0864248830588 (according to The Echo Nest when downloaded

(in December 2010) this song had a rsquohotttnesssrsquo of 08 (on a scale of 0 and 1))

song_id SOCWJDB12A58A776AF (The Echo Nest song ID note that a song can

be associated with many tracks (with very slight audio differences))

start_of _fade _out 198536 (start time of the fade out in seconds at the end

of the song according to The Echo Nest)

tatums_confidence shape = (794) (confidence value (between 0 and 1) associated

with each tatum by The Echo Nest)

tatums_start shape = (794) (start time of each tatum according to The Echo

Nest this song has 794 tatums)

17

tempo 113359 (tempo in BPM according to The Echo Nest)

time_signature 4 (time signature of the song according to The Echo Nest ie

usual number of beats per bar)

time_signature_confidence 0634 (confidence of the time signature estimation)

title Never Gonna Give You Up (song title)

track_7digitalid 8707738 (the ID of this song on the service 7digitalcom)

track_id TRAXLZU12903D05F94 (The Echo Nest ID of this particular track

on which the analysis was done) year 1987 (year when this song was released

according to musicbrainzorg)

When choosing features my main goal was to use features that would most

likely yield meaningful results yet also be simple and make sense to the average

person The definition of ldquomeaningfulrdquo results is arbitrary as every music listener

will have his or her opinions to what constitutes different types of music but some

common features most people tend to differentiate songs by are pitch rhythm and

the types of instruments used The following specific fields provided in each song

object fall under these three terms

Pitch

bull segments_pitches a matrix of values indicating the strength of each pitch (or

note) at each discernible time interval

Rhythm

bull beats_start a vector of values indicating the start time of each beat

bull time_signature the time signature of the song

bull tempo the speed of the song in Beats Per Minute (BPM)

Instruments

18

bull segments_timbre a matrix of values indicating the distribution of MFCC-like

features (different types of tones) for each segments

The segments_pitches feature is a clear candidate for a differentiating factor for

songs since it reveals patterns of notes that occur Additionally other research

papers that quantitatively examine songs like Mauchrsquos look at pitch and employ a

procedure that allows all songs to be compared with the same metric Likewise

timbre is intuitively a reliable differentiating feature since it reveals the amount

that different tones or sounds that sound different despite having the same pitch

Therefore segments_timbre is another feature that is considered in each song

Finally we look at the candidate features for rhythm At first glance all of these

features appear to be useful as they indicate the rhythm of a song in one way or

another However none of these features are as useful as the pitch and timbre

features While tempo is one factor in differentiating genres of EDM and music in

general tempo alone is not a driving force of musical innovation Certain genres

of EDM like drum nrsquo bass and happycore stand out for having very fast tempos

but the tempo is supplemented with a sound unique to the genre Conceiving new

arrangements of pitches combining instruments in new ways and inventing new

types of sounds are novel but speeding up or slowing down existing sounds is not

Including tempo as a feature could actually add noise to the model since many genres

overlap in their tempos And finally tempo is measured indirectly when the pitch

and timbre features are normalized for each song everything is measured in units of

ldquoper secondrdquo so faster songs will have higher quantities of pitch and timbre features

each second Time signature can be dismissed from the candidates features for the

same reason as tempo many genres contain the same time signature and including

it in the feature set would only add more noise beats_start looks like a more

promising feature since like segments_pitches and segments_timbre it consists of

a vector of values However difficulties arise when we begin to think how exactly

19

we can utilize this information Since each song varies in length we need a way to

compare songs of different durations on the same level One approach could be to

perform basic statistics on the distance between each beat for example calculating

the mean and standard deviation of this distance However the normalized pitch

and timbre information already capture this data Another possibility is detecting

certain patterns of beats which could differentiate the syncopated dubstep or glitch

music beats from the steady pulse of electro-house But once again every beat is

accompanied by a sound with a specific timbre and pitch so this feature would not

add any significantly new information

23 Collecting Data and Preprocessing Selected Fea-

tures

231 Collecting the Data

Upon deciding the features I wanted to use in my research I first needed to collect

all of the electronic songs in the Million Song Dataset The easiest reliable way to

achieve this was to iterate through each song in the database and save the information

for the songs where any of the artist genre tags in artist_mbtags matched with an

electronic music genre While this measure was not fully accurate because it looks at

the genre of the artist not the song specific genre information for each song was not

as easily accessible so this indicator was nearly as good a substitute To generate a

list of the genres that electronic songs would fall under I manually searched through

a subset of the MSD to find all genres that seemed to be releated to electronic music

In the case of genres that were sometimes but not always electronic in nature such

20

as disco or pop I erred on the side of caution and did not include them in the list

of electronic genres In these cases false positives such as primarily rock songs that

happen to have the disco label attached to the artist could inadvertantly be included

in the dataset The final list of genres is as follows

target_genres = ['house', 'techno', 'drum and bass', 'drum n bass',
                 "drum'n'bass", 'drumnbass', "drum 'n' bass", 'jungle',
                 'breakbeat', 'trance', 'dubstep', 'trap', 'downtempo',
                 'industrial', 'synthpop', 'idm',
                 'idm - intelligent dance music', '8-bit', 'ambient',
                 'dance and electronica', 'electronic']
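To make this selection step concrete, below is a minimal sketch of the tag filter, condensed from the full listing in Appendix A.1; it assumes the hdf5_getters module distributed with the MSD code, and the data directory path is only illustrative.

import glob
import os
import hdf5_getters  # distributed with the Million Song Dataset

def is_electronic(h5_path, target_genres):
    # a song is kept if any artist-level MusicBrainz tag matches an EM genre
    h5 = hdf5_getters.open_h5_file_read(h5_path)
    try:
        tags = str(hdf5_getters.get_artist_mbtags(h5))
        return any(tag in tags for tag in target_genres)
    finally:
        h5.close()

# 'data' is a placeholder for the MSD root directory
em_files = [f for f in glob.glob(os.path.join('data', '*', '*.h5'))
            if is_electronic(f, target_genres)]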

2.3.2 Pitch Preprocessing

A study conducted by music researcher Matthias Mauch [8] analyzes pitch in a musically informed manner. The study first takes the raw sound data and converts it into a distribution of each pitch, where 0 is no detection of the pitch and 1 the strongest amount. Then it computes the most likely chord by comparing the 4 most common types of chords in popular music (major, minor, dominant 7th, and minor 7th) to the observed chord. The most common chords are represented as "template chords" and contain 0's and 1's, where the 1's represent the notes played in the chord. For example, using the note C as the first index, the C major chord is represented as

CT_{CM} = (1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0)

For a given chroma frame c observed in the song, the Spearman's rho coefficient is computed against every template chord:

\rho_{CT,c} = \sum_{i=1}^{12} \frac{(CT_i - \overline{CT})(c_i - \bar{c})}{\sigma_{CT}\,\sigma_c}

where \overline{CT} is the mean of the values in the template chord, \sigma_{CT} is the standard deviation of the values in the chord, and the operations on c are analogous. Note that the summation is over each individual pitch in the 12 pitch classes. The chord template with the highest value of \rho is selected as the chord for the time frame.

After this is performed for each time frame, the values are smoothed, and then the change between adjacent chords is observed. The reasoning behind this step is that by measuring the relative distance between chords, rather than the chords themselves, all songs can be compared in the same manner even though they may have different key signatures. Finally, the study takes the types of chord changes and classifies them under 8 possible categories called "H-topics." These topics are more abstracted versions of the chord changes that make more sense to a human, such as "changes involving dominant 7th chords."
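As an illustration of the template-matching step, the sketch below correlates one made-up chroma frame against the 12 major templates and keeps the best match; the minor, dominant 7th, and minor 7th templates are handled the same way, and the 0.01 added to each standard deviation mirrors the smoothing constant used in the helper code of Appendix A.4.

import numpy as np

# binary pitch-class templates, index 0 = C; each transposition is a rotation
C_MAJOR = np.array([1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0], dtype=float)
MAJOR_TEMPLATES = [np.roll(C_MAJOR, k) for k in range(12)]

def rho(template, chroma):
    # correlation between a template chord and an observed chroma frame
    t = (template - template.mean()) / (template.std() + 0.01)
    c = (chroma - chroma.mean()) / (chroma.std() + 0.01)
    return float(np.dot(t, c))

chroma = np.array([0.9, 0.1, 0.2, 0.1, 0.8, 0.2,
                   0.1, 0.7, 0.2, 0.1, 0.2, 0.1])  # made-up frame
best = max(range(12), key=lambda k: abs(rho(MAJOR_TEMPLATES[k], chroma)))
print(best)  # 0: the C major template wins for this frame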

In my preliminary implementation of this method on an electronic dance music corpus, I made a few modifications to Mauch's study. First, I smoothed out time frames before computing the most probable chords, rather than smoothing the most probable chords. I did this to save time and to reduce volatility in the chord measurements. Using Rick Astley's "Never Gonna Give You Up" as a reference, which contains 935 time frames and lasts 212 seconds, 5 time frames is slightly under 1 second, and for preliminary testing this appeared to be a good interval for each time block. Second, as mentioned in the literature section, I did not abstract the chord changes into H-topics. This decision also stemmed from time constraints, since deriving semantic chord meaning from EDM songs would require careful research into the types of harmonies and sounds common in that genre of music. Below I have included a high-level visualization of the pitch metadata found in a sample song, "Firestarter" by The Prodigy, and how I converted the metadata into a chord change vector that I could then feed into the Dirichlet Process algorithm.


[Figure: pipeline for converting pitch metadata into a chord change vector, illustrated on "Firestarter" by The Prodigy. (1) Start with the raw pitch data, an N x 12 matrix, where N is the number of time frames in the song and 12 the number of pitch classes; the figure shows the pitch-class strengths for the first 5 time frames. (2) Average the distribution of pitches over every block of 5 time frames. (3) Calculate the most likely chord for each block using Spearman's rho; for the first block this is F# major, with template (0,1,0,0,0,0,1,0,0,0,1,0). (4) For two adjacent chords, calculate the change between them and increment the corresponding count in a table of chord change frequencies, with 192 possible chord changes (e.g., the transition between the first two chords maps to chord shift code 6, so chord_changes[6] += 1); the encoding is sketched below. (5) The result is a final 192-element vector in which chord_changes[i] is the number of times the chord change with code i occurred in the song.]
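To make the 192-category encoding concrete, the sketch below reproduces the encoding logic from the preprocessing code in Appendix A.2; a chord is represented as a (type, root) pair, with types 1 through 4 for major, minor, dominant 7th, and minor 7th, and the assertions check the "no change" codes that reappear in the Analysis chapter.

def chord_change_code(c1, c2):
    # c1, c2: (chord_type, root) with chord_type in 1..4 and root in 0..11;
    # 16 ordered type pairs x 12 root shifts = 192 possible codes
    type1, root1 = c1
    type2, root2 = c2
    note_shift = (root2 - root1) % 12
    key_shift = 4 * (type1 - 1) + type2   # 1..16
    return 12 * (key_shift - 1) + note_shift

# the "no change" codes discussed in Section 3.3:
assert chord_change_code((1, 0), (1, 0)) == 0     # major -> major
assert chord_change_code((2, 0), (2, 0)) == 60    # minor -> minor
assert chord_change_code((3, 0), (3, 0)) == 120   # dom 7th -> dom 7th
assert chord_change_code((4, 0), (4, 0)) == 180   # min 7th -> min 7th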

A final step I took to normalize the chord change data was to divide the numbers by the length of the song, so that each song's number of chord changes was measured per second.

2.3.3 Timbre Preprocessing

For timbre, I also used Mauch's model to find a meaningful way to compare timbre uniformly across all songs [8]. After collecting all song metadata, I took a random sample of 20 songs from each year, starting at 1970. The reason I forced the sampling to 20 randomly sampled songs from each year, and did not take a random sample of songs from all years at once, was to prevent bias towards any type of sounds. As seen in Figure 2.2, there are significantly more songs from 2000-2011 than before 2000. The mean year is x̄ = 2001.052, the median year is 2003, and the standard deviation of the years is σ = 7.060. A "random sample" over all songs would almost certainly include a disproportionate amount of more recent songs. In order not to miss out on sounds that may be more prevalent in older songs, I required a set number of songs from each year. Next, from each randomly selected song, I selected 20 random timbre frames in order to prevent any biases in data collection within each song. In total there were 42 x 20 x 20 = 16,800 timbre frames collected. Next, I clustered the timbre frames using a Gaussian Mixture Model (GMM), varying the number of clusters from 10 to 100 and selecting the number of clusters with the lowest Bayesian Information Criterion (BIC), a statistical measure commonly used to select the best-fitting model. The BIC was minimized at 46 timbre clusters. I then re-ran the GMM with 46 clusters and saved the mean values of each of the 12 timbre segments for each cluster formed. In the same way that every song had the same 192 chord changes whose frequencies could be compared between songs, each song now had the same 46 timbre clusters but different frequencies in each song. When reading in the metadata from each song, I calculated the most likely timbre cluster each timbre frame belonged to and kept a frequency count of all of the possible timbre clusters observed in the song. Finally, as with the pitch data, I divided all observed counts by the duration of the song in order to normalize each song's timbre counts.

Figure 2.2: Number of Electronic Music Songs in the Million Song Dataset from Each Year
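A minimal sketch of this model-selection step is shown below, using the GaussianMixture class from current scikit-learn releases (the thesis-era API differed slightly); the frames array is a stand-in for the 16,800 sampled timbre frames, and the step size of the search over cluster counts is illustrative.

import numpy as np
from sklearn.mixture import GaussianMixture

frames = np.random.rand(16800, 12)  # placeholder for the sampled timbre frames

# fit GMMs over a range of component counts and keep the lowest-BIC model
best_k, best_bic = None, np.inf
for k in range(10, 101, 10):
    gmm = GaussianMixture(n_components=k, random_state=0).fit(frames)
    bic = gmm.bic(frames)
    if bic < best_bic:
        best_k, best_bic = k, bic

final_gmm = GaussianMixture(n_components=best_k, random_state=0).fit(frames)
timbre_cluster_means = final_gmm.means_  # one 12-dimensional mean per cluster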


Chapter 3

Results

3.1 Methodology

After the pitch and timbre data were processed, I ran the Dirichlet Process on the data. For each song, I concatenated the 192-element chord change frequency list and the 46-element timbre category frequency list, giving each song a total of 238 features. However, there is a problem with this setup: the pitch data will inherently dominate the clustering process, since it contains almost 3 times as many features as timbre. While there is no built-in function in scikit-learn's DPGMM process to give different weights to each feature, I considered another possibility to remedy this discrepancy: duplicating the timbre vector a certain number of times and concatenating that to the feature set of each song. While this strategy runs the risk of corrupting the feature set and turning it into something that does not accurately represent each song, it is important to keep in mind that even without duplicating the timbre vector, the feature set consists of two separate feature sets concatenated to each other. Therefore, timbre duplication appears to be a reasonable strategy to weigh pitch and timbre more evenly.
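The sketch below illustrates the per-song feature assembly; the duplication count n_dup = 3 is only an assumption for illustration, since the passage above leaves the exact number of copies open, and the scaling factor k = 10 is motivated in the next paragraph.

import numpy as np

def song_features(chord_changes, timbre_counts, n_dup=3, k=10.0):
    # chord_changes: 192 chord-change frequencies per second
    # timbre_counts: 46 timbre-cluster frequencies per second
    # n_dup copies of the timbre vector rebalance pitch vs. timbre (assumed 3)
    feats = list(chord_changes) + list(timbre_counts) * n_dup
    return k * np.asarray(feats)  # k rescales the data (see below)

# all_song_data as produced by the preprocessing code in Appendix A.2
X = np.vstack([song_features(s['chord_changes'], s['timbre_cat_counts'])
               for s in all_song_data])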


After this modification, I tweaked a few more parameters before obtaining my final results. Dividing the pitch and timbre frequencies by the duration of the song normalized every song to frequency per second, but it also had the undesired effect of making the data too small. Timbre and pitch frequencies per second were almost always less than 10, and many times hovered as low as 0.002 for nonzero values. Because all of the values were very close to each other, using common values of α in the range of 0.1 to 1000-2000 was insufficient to push the songs into different clusters; as a result, every song fell into the same cluster. Increasing the value of α by several orders of magnitude, to well over 10 million, fixed the problem, but this solution presented two issues. First, tuning α to experiment with different ways to cluster the music would be problematic, since I would have to work with an enormous range of possible values for α. Second, pushing α to such high values is not appropriate for the Dirichlet Process. Extremely high values of α indicate a Dirichlet Process that will try to disperse the data into different clusters, but a value of α that high is in principle always assigning each new song to a new cluster. On the other hand, varying α between 0.1 and 1000, for example, presents a much wider range of flexibility when assigning clusters. While it may be possible to cluster the data as it currently is by varying α an extreme amount, we would be using the Dirichlet Process in a way it should mathematically not be used. Therefore, multiplying all of the data by a constant value, so that we can work in the appropriate range of α, is the ideal approach. After some experimentation, I found that k = 10 was an appropriate scaling factor.

After initial runs of the Dirichlet Process, I found that there was a slight issue with some of the earlier songs. Since I had only artist genre tags, not specific song tags, I chose songs based on whether any of the tags associated with the artist fell under any electronic music genre, including the generic term 'electronic'. There were some bands, mostly older ones from the 1960s and 1970s like Electric Light Orchestra, which had some electronic music but mostly featured rock, funk, disco, or another genre. Given that these artists featured mostly non-electronic songs, I decided to exclude them from my study and generated a blacklist of these music artists. While it was infeasible to look through every single song and determine whether it was electronic or not, I was able to look over the earliest songs in each cluster. These songs were the most important to verify as electronic, because early non-electronic songs could end up forming new clusters and inadvertently create clusters with non-electronic sounds that I was not looking for.

The goal of this thesis is to identify different groups in which EM songs are clustered and to identify the most unique artists and genres. While the second task is very simple, because it requires looking at the earliest songs in each cluster, the first is difficult to gauge the effectiveness of. While I can look at the average chord change and timbre category frequencies in each cluster, as well as other metadata, attaching semantic interpretations to what the music actually sounds like and determining whether the music is clustered properly is a very subjective process. For this reason, I ran the Dirichlet Process on the feature set with values of α = (0.05, 0.1, 0.2) and compared the clustering for each value, examining similarities and differences in the clusters formed in each scenario in the Discussion section. For each value of α, I set the upper limit of components, or clusters, allowed to 50. The values of α I used resulted in 9, 14, and 19 clusters formed, respectively.
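A sketch of the resulting runs is shown below, using the sklearn.mixture.DPGMM class available in scikit-learn releases of the time (later versions replace it with BayesianGaussianMixture and its weight_concentration_prior parameter); X is the scaled feature matrix described above.

from sklearn.mixture import DPGMM  # scikit-learn <= 0.19

for alpha in (0.05, 0.1, 0.2):
    dpgmm = DPGMM(n_components=50, alpha=alpha, n_iter=100)
    dpgmm.fit(X)
    labels = dpgmm.predict(X)
    print('alpha = %s: %d clusters used' % (alpha, len(set(labels))))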

3.2 Findings

3.2.1 α = 0.05

When I set α to 0.05, the Dirichlet Process split the songs into 9 clusters. Below are the distributions of years of the songs in each cluster (note that the Dirichlet Process does not number the clusters exactly sequentially, so cluster numbers 5, 7, and 10 are skipped).


Figure 3.1: Song year distributions for α = 0.05

For each value of α, I also calculated the average frequency of each chord change category and timbre category for each cluster and plotted the results. The green lines correspond to timbre and the blue lines to pitch.


Figure 3.2: Timbre and pitch distributions for α = 0.05

A table of each cluster formed, the number of songs in that cluster, and descriptions of the pitch, timbre, and rhythmic qualities characteristic of songs in that cluster is shown below.


Cluster   Song Count   Characteristic Sounds
0         6481         Minimalist, industrial space sounds, dissonant chords
1         5482         Soft, New Age, ethereal
2         2405         Defined sounds; electronic and non-electronic instruments played in standard rock rhythms
3         360          Very dense and complex synths, slightly darker tone
4         4550         Heavily distorted rock and synthesizer
6         2854         Faster-paced 80s synth rock, acid house
8         798          Aggressive beats, dense house music
9         1464         Ambient house, trancelike, strong beats, mysterious tone
11        1597         Melancholy tones; new wave rock in the 80s, then starting in the 90s downtempo, trip-hop, nu-metal

Table 3.1: Song cluster descriptions for α = 0.05

3.2.2 α = 0.1

A total of 14 clusters were formed (16 were formed, but 2 clusters contained only one song each; I listened to both of these songs, and since they did not sound unique, I discarded them from the clusters). Again, the song year distributions, timbre and pitch distributions, and cluster descriptions are shown below.


Figure 3.3: Song year distributions for α = 0.1


Figure 3.4: Timbre and pitch distributions for α = 0.1


Cluster   Song Count   Characteristic Sounds
0         1339         Instrumental and disco with 80s synth
1         2109         Simultaneous quarter-note and sixteenth-note rhythms
2         4048         Upbeat, chill; simultaneous quarter-note and eighth-note rhythms
3         1353         Strong repetitive beats, ambient
4         2446         Strong simultaneous beat and synths; synths defined but echoing
5         2672         Calm, New Age
6         542          Hi-hat cymbals, dissonant chord progressions
7         2725         Aggressive punk and alternative rock
9         1647         Latin; rhythmic emphasis on first and third beats
11        835          Standard medium-fast rock instruments/chords
16        1152         Orchestral, especially violins
18        40           "Martian alien" sounds, no vocals
20        1590         Alternating strong kick and strong high-pitched clap
28        528          Roland TR-like beats; kick and clap stand out but fuzzy

Table 3.2: Song cluster descriptions for α = 0.1

3.2.3 α = 0.2

With α set to 0.2, there were a total of 22 clusters formed. 3 of the clusters consisted of 1 song each, none of which were particularly unique-sounding, so I discarded them, for a total of 19 significant clusters. Again, the song year distributions, timbre and pitch distributions, and cluster descriptions are shown below.


Figure 3.5: Song year distributions for α = 0.2


Figure 3.6: Timbre and pitch distributions for α = 0.2


Cluster   Song Count   Characteristic Sounds
0         4075         Nostalgic and sad-sounding synths and string instruments
1         2068         Intense, sad, cavernous (mix of industrial metal and ambient)
2         1546         Jazz/funk tones
3         1691         Orchestral with heavy 80s synths, atmospheric
4         343          Arpeggios
5         304          Electro, ambient
6         2405         Alien synths, eerie
7         1264         Punchy kicks and claps, 80s/90s tilt
8         1561         Medium tempo, 4/4 time signature, synths with intense guitar
9         1796         Disco rhythms and instruments
10        2158         Standard rock with few (if any) synths added on
12        791          Cavernous, minimalist ambient (non-electronic instruments)
14        765          Downtempo, classic guitar riffs, fewer synths
16        865          Classic acid house sounds and beats
17        682          Heavy Roland TR sounds
22        14           Fast ambient, classic orchestral
23        578          Acid house with funk tones
30        31           Very repetitive rhythms, one or two tones
34        88           Very dense sound (strong vocals and synths)

Table 3.3: Song cluster descriptions for α = 0.2


3.3 Analysis

Each of the different values of α revealed different insights about the Dirichlet Process applied to the Million Song Dataset, mainly which artists and songs were unique and how traditional groupings of EM genres could be thought of differently. Not surprisingly, the distributions of the years of songs in most of the clusters were skewed to the left, because the distribution of all of the EM songs in the Million Song Dataset is left-skewed (see Figure 2.2). However, some of the distributions vary significantly for individual clusters, and these differences provide important insights into which types of music styles were popular at certain points in time and how unique the earliest artists and songs in the clusters were. For example, for α = 0.1, cluster 28's musical style (with sounds characteristic of the Roland TR-808 and TR-909, two programmable drum machines and synthesizers that became extremely popular in 1980s and 90s dance tracks [13]) coincides with when the instruments were first manufactured, in 1980. Not surprisingly, this cluster contained mostly songs from the 80s and 90s and declined slightly in the 2000s. However, there were a few songs in that cluster that came out before 1980. While these songs did not clearly use the Roland TR machines, they may have contained similar sounds that predated the machines and were truly novel.

First, looking at α = 0.05, we see that all of the clusters contain a significant number of songs, although clusters 3 and 8 are notably smaller. Cluster 3 contains a heavier left tail, indicating a larger number of songs from the 70s, 80s, and 90s. Inside the cluster, the genres of music varied significantly from a traditional music lens. That is, the cluster contained some songs with nearly all traditional rock instruments, others with purely synths, and others somewhere in between, all of which would normally be classified as different EM genres. However, under the Dirichlet Process, these songs were lumped together with the common theme of dense, melodic sounds (as opposed to minimalistic, repetitive, or dissonant sounds). The most prominent artists from the earlier songs are Ashra and Jon Hassell, who composed several melodic songs combining traditional instruments with synthesizers for a modern feel. The other small cluster, number 8, contains a more normal year distribution relative to the entire MSD distribution and also consists of denser beats. Another artist, Cabaret Voltaire, leads this cluster. Cluster 9 sticks out significantly because it contains virtually no songs before 1990 but then increases rapidly in popularity. This cluster contains songs with hypnotically repetitive rhythm, strong and ethereal synths, and an equally strong drum-like beat. Given the emergence of trance in the 1990s, and the fact that house music in the 1980s contained more minimalistic synths than house music in the 1990s, this distribution of years makes sense. Looking at the earliest artists in this cluster, one that accurately predates the later music in the cluster is Jean-Michel Jarre, a French composer and pioneer of ambient and electronic music [14]. One of his songs, "Les Chants Magnétiques IV," contains very sharp and modulated synths along with a repetitive hi-hat cymbal rhythm and more drawn-out, ethereal synths. While the song sounds ambient at its normal speed, playing the song at 1.5 times the normal speed resulted in a thumping, fast-paced 16th-note rhythm that, combined with the ethereal synths and their chord progressions, sounded very similar to trance music. In fact, I found that, stylistically, trance music was comparable to house and ambient music increased in speed. Trance music was a term not used extensively until the early 1990s, but ambient and house music were already mainstream by the 1980s, so it would make sense that trance evolved in this manner. However, this insight could serve as an argument that trance is not an innovative genre in and of itself but is rather a clever combination of two older genres.

Lastly, we look at the timbre category and chord change distributions for each cluster. In theory, these clusters should have significantly different peaks of chord changes and timbre categories, reflecting different pitch arrangements and instruments in each cluster. The type 0 chord change corresponds to major → major with no note change, type 60 to minor → minor with no note change, type 120 to dominant 7th major → dominant 7th major with no note change, and type 180 to dominant 7th minor → dominant 7th minor with no note change. It makes sense that type 0, 60, 120, and 180 chord changes are frequently observed, because it implies that adjacent chords in a song remain in the same key for the majority of the song. The timbre categories, on the other hand, are more difficult to intuitively interpret. Mauch's study addresses this issue by sampling songs and sounds that are the closest to each timbre category, then playing the sounds and attaching user-based interpretations based on several listeners [8]. While this strategy worked in Mauch's study, given the time and resources at my disposal, it was not practical in mine. I ended up writing subjective summaries of each cluster and comparing the charts to see whether certain peaks in the timbre categories corresponded to specific tones. Strangely, for α = 0.05, the timbre and chord change data is very similar for each cluster. This problem does not occur when α = 0.1 or 0.2, where the graphs vary significantly and correspond to some of the observed differences in the music. In summary, below are the most influential artists I found in the clusters formed, and the types of music they created that were novel for their time:

• Jean-Michel Jarre: ambient and house music, complicated synthesizer arrangements

• Cabaret Voltaire: orchestral electronic music

• Paul Horn: new age

• Brian Eno: ambient music

• Manuel Göttsching (Ashra): synth-heavy ambient music

• Killing Joke: industrial metal

• John Foxx: minimalist and dark electronic music

• Fad Gadget: house and industrial music

While these conclusions were formed mainly from the data used in the MSD and the resulting clusters, I checked outside sources and biographies of these artists to see whether they were groundbreaking contributors to electronic music. Some research revealed that these artists were indeed groundbreaking for their time, so my findings are consistent with existing literature. The difference, however, between existing accounts and mine is that, coming from a quantitatively computed perspective, I found some new connections (like observing that one of Jarre's works, when sped up, sounded very similar to trance music).

For larger values of α, it is worth not only looking at interesting phenomena in the clusters formed for that specific value, but also comparing the clusters formed to those at other values of α. Since we are increasing the value of α, more clusters will be formed, and the distinctions between each cluster will be more nuanced. With α = 0.1, the Dirichlet Process formed 16 clusters; 2 of these clusters consisted of only one song each, and upon listening, neither of these songs sounded particularly unique, so I threw those two clusters out and analyzed the remaining 14. Comparing these clusters to the ones formed with α = 0.05, I found that some of the clusters mapped over nicely, while others were more difficult to interpret. For example, cluster 3_0.1 (cluster 3 when α = 0.1) contained a similar number of songs, and a similar distribution of release years, to cluster 9_0.05. Both contain virtually no songs before the 1990s and then steadily rise in popularity through the 2000s. Both clusters also contain similar types of music: house beats and ethereal synths reminiscent of ambient or trance music. However, when I looked at the earliest artists in cluster 3_0.1, they were different from the earliest artists in cluster 9_0.05. One particular artist, Bill Nelson, stood out for having a particularly novel song, "Birds of Tin," for the year it was released (1980). This song features a sharp and twangy synth beat that, when sped up, sounded like minimalist acid house music.

While the α = 0.05 group differentiated mostly on general moods and classes of instruments (like rock vs. non-electronic vs. electronic), the α = 0.1 group picked up more nuanced instrumentation and mood differences. For example, cluster 16_0.1 contained songs that featured orchestral string instruments, especially violin. The songs themselves varied significantly according to traditional genres, from Brian Eno arrangements with classical orchestra to a remix of a song by Linkin Park, a nu-metal band, which contained violin interludes. This clustering raises an interesting point: music that sounds very different based on traditional genres could be grouped together on certain instruments or sounds. Another cluster, 28_0.1, features 90s sounds characteristic of the Roland TR-909 drum machine (which would explain why the cluster's songs increase drastically starting in the 1990s and steadily decline through the 2000s). Yet another cluster, 6_0.1, contains a particularly heavy left tail, indicating a style more popular in the 1980s, and its characteristic sound, hi-hat cymbals, is also a specialized instrument. This specialization does not match up particularly strongly with the clusters when α = 0.05. That is, a single cluster with α = 0.05 does not easily map to one or more clusters in the α = 0.1 run, although many of the clusters appear to share characteristics based on the qualitative descriptions in the tables. The timbre/chord change charts for each cluster appeared to at least somewhat corroborate the general characteristics I attached to each cluster. For example, the last timbre category is significantly pronounced for clusters 5 and 18, and especially so for 18. Cluster 18 was vocal-free, ethereal space-synth sounds, so it would make sense that cluster 5, which was mainly calm New Age, also contained vocal-free, ethereal, space-y sounds. It was also interesting to note that certain clusters, like 28, contained one timbre category that completely dominated all the others. Many songs in this cluster were marked by strong and repetitive beats reminiscent of the Roland TR synth drum machines, which matches the graph. Likewise, clusters 3, 7, 9, and 20, which appear to contain the same peak timbre category, were noted for containing strong and repetitive beats. For this run, I added the following artists and their contributions to the general list of novel artists:

• Bill Nelson: minimalist house music

• Vangelis: orchestral compositions with electronic notes

• Rick Wakeman: rock compositions with spacey-sounding synths

• Kraftwerk: synth-based pop music

Finally, we look at α = 0.2. With this parameter value, the Dirichlet Process resulted in 22 clusters. 3 of these clusters contained only one song each, and upon listening to each of these songs, I determined they were not particularly unique and discarded them, for a total of 19 remaining clusters. Unlike the previous two values of α, where the clusters were relatively easy to subjectively differentiate, this one was quite difficult. Slightly more than half of the clusters, 10 out of 19, contained under 1,000 songs, and 3 contained under 100. Some of the clusters were easily mapped to clusters at the other two α values, like cluster 17_0.2, which contains Roland TR drum machine sounds and is comparable to cluster 28_0.1. However, many of the other classifications seemed more dubious. Not only did the songs within each cluster often vary significantly, but the differences between many clusters appeared nearly indistinguishable. The chord change and timbre charts also reflect the difficulty of distinguishing different clusters. The y-axes for all of the charts are quite small, implying that many of the timbre values averaged out because the songs in each cluster were quite different. Essentially, the observations are quite noisy and do not have features that stand out as saliently as those of cluster 28_0.1, for example. The only exceptions to these numbers were clusters 30 and 34, but there were so few songs in each of these clusters that they represent only a small amount of the dataset. Therefore, I concluded that the Dirichlet Process with α = 0.2 performed an insufficient job of adequately clustering the songs. Overall, the clusters formed when α = 0.1 were the most meaningful in terms of picking up nuanced moods and instruments without splitting hairs and resulting in clusters a minimally trained ear could not differentiate. From this analysis, the most appropriate genre classifications of the electronic music from the MSD are the clusters described in the table where α = 0.1, and the most novel artists, along with their contributions, are summarized in the findings where α = 0.05 and α = 0.1.


Chapter 4

Conclusion

In this chapter, I first address weaknesses in my experiment and strategies for addressing them; I then offer potential paths for researchers to build upon my experiment, and close with final remarks regarding this thesis.

4.1 Design Flaws in Experiment

While I made every effort possible to ensure the integrity of this experiment, there were various confounding factors, some beyond my control and others within my control but unrealistic to address given the time and resources I had. The largest issue was the dataset I was working with. While the MSD contained roughly 23,000 electronic music songs according to my classifications, these songs did not come close to all of the electronic music that was available. From looking through the tracks, I did see many important artists, meaning that there was some credibility to the dataset. However, there were several other artists I was surprised to see missing, and the artists included contained only a limited number of popular songs. Some traditionally defined genres, like dubstep, were missing entirely from the dataset, and the most recent songs came from the year 2010, which meant that the past 5 years of rapid expansion in EM were not accounted for. Building a sufficient corpus of EM data is very difficult, arguably more so than for other genres, because songs may be remixed by multiple artists, further blurring the line between original content and modifications. For this reason, I considered my thesis to be a proof of concept. Although the data I used may not be ideal, I was able to show that the Dirichlet Process could be used with some amount of success to cluster songs based on their metadata.

With respect to how I implemented the Dirichlet Process and constructed the features, my methodology could have been more extensive with additional time and resources. Interpreting the sounds in each song and establishing common threads is a difficult task, and unlike Pandora, which used trained music theory experts to analyze each song, I relied on my own ears and anecdotal knowledge of EM. Given the lack of formal literature quantitatively analyzing EM, and the resources I had, this was my best realistic option, but it was also not ideal. The second notable weakness, which was more controllable, was determining what exactly constitutes an EM song. My criteria involved iterating through every song and selecting those whose artist contained a tag that fell inside a list of predetermined EM genres. However, this strategy is not always effective, since some artists have only a small selection of EM songs and have produced much more music involving rock or other non-EM genres. To prevent these songs from appearing in the dataset, I would need to load another dataset from a group called Last.fm, which contains user-generated tags at the song level. Another, more addressable, weakness in my experiment was graphically analyzing the timbre categories. While the average chord changes were easy to interpret on the graphs for each cluster and had easy semantic interpretations, the timbre categories were never formally defined. That is, while I knew the Bayesian Information Criterion was lowest when there were 46 categories, I did not associate each timbre category with a sound. Mauch's study addressed this issue by randomly selecting songs with sounds that fell in each timbre category and asking users to listen to the sounds and classify what they heard. Implementing this system would be an additional way of ensuring that the clusters formed for each song were nontrivial: I could not only eyeball the measurements on each graph for timbre, as I did in this thesis, but also use them to confirm the sounds I observed for each cluster. Finally, while my feature selection involved careful preprocessing, based on other studies, that normalized measurements between all songs, there are additional ways I could have improved the feature set. For example, one study looks at more advanced ways to isolate specific timbre segments in a song, identify repeating patterns, and compare songs to each other in terms of the similarity of their timbres [15]. More advanced methods like these would allow me to more quantitatively analyze how successfully the Dirichlet Process clusters songs into distinct categories.

4.2 Future Work

Future work in this area, quantitatively analyzing EM metadata to determine what constitutes different genres and novel artists, would involve tighter definitions, procedures, evaluations of whether clustering was effective, and musical scrutiny. All of the weaknesses mentioned in the previous section, barring perhaps the songs available in the Million Song Dataset, can be addressed with extensions and modifications to the code base I created. The greater issue of building an effective corpus of music data for the MSD, and constantly updating it, might be addressed by soliciting such data from an organization like Spotify, but such an endeavor is very ambitious and beyond the scope of any individual or small-group research project without extensive funding and influence. Once these problems are resolved, and the dataset, the songs accessed from it, and the methods for comparing songs to each other are settled, the next steps would be to further analyze the results. How do the most unique artists for their time compare to the most popular artists? Is there considerable overlap? How long does it take for a style to grow in popularity, if it even does? And lastly, how can these findings be used to compose new genres of music and envision who and what will become popular in the future? All of these questions may require supplementary information sources, with respect to the popularity of songs and artists, for example, and many of these additional pieces of information can be found on the website of the MSD.

4.3 Closing Remarks

While this thesis is an ambitious endeavor and can be improved in many respects, it does show that the methods implemented yield nontrivial results and could serve as a foundation for future quantitative analysis of electronic music. As data analytics grows even more, and groups such as Spotify amass greater amounts of information and deeper insights on that information, this relatively new field of study will hopefully grow. EM is a dynamic, energizing, and incredibly expressive type of music, and understanding it from a quantitative perspective pays respect to what has, up until now, been mostly analyzed from a curious outsider's perspective: qualitatively described, but not examined as thoroughly from a mathematical angle.


Appendix A

Code

A.1 Pulling Data from the Million Song Dataset

from __future__ import division
import os
import re
import sys
import time
import glob
import numpy as np
import hdf5_getters  # not on adroit

# prevents output from showing ellipses when printed
np.set_printoptions(threshold=np.nan)

'''This code collects the relevant metadata of each electronic song in the
Million Song Dataset.'''

basedir = '/scratch/network/mssilver/mssilver/msd_data_full/data/' + str(sys.argv[1])
ext = '.h5'

target_genres = ['house', 'techno', 'drum and bass', 'drum n bass',
                 "drum'n'bass", 'drumnbass', "drum 'n' bass", 'jungle',
                 'breakbeat', 'trance', 'dubstep', 'trap', 'downtempo',
                 'industrial', 'synthpop', 'idm',
                 'idm - intelligent dance music', '8-bit', 'ambient',
                 'dance and electronica', 'electronic']

# relevant metadata for all EM songs found in the MSD
all_song_data = {}
pitch_segs_data = []
count = 0
start_time = time.time()

for root, dirs, files in os.walk(basedir):
    files = glob.glob(os.path.join(root, '*' + ext))
    for f in files:
        h5 = hdf5_getters.open_h5_file_read(f)
        # if year unknown, throw out sample
        if hdf5_getters.get_year(h5) == 0:
            h5.close()
            continue
        if any(tag in str(hdf5_getters.get_artist_mbtags(h5)) for tag in target_genres):
            print 'found electronic music song at {0} seconds'.format(time.time() - start_time)
            count += 1
            print 'song count: {0}'.format(count + 1)
            h5_subdict = dict()
            h5_subdict['title'] = hdf5_getters.get_title(h5).item()
            h5_subdict['artist_name'] = hdf5_getters.get_artist_name(h5).item()
            h5_subdict['year'] = hdf5_getters.get_year(h5).item()
            h5_subdict['duration'] = hdf5_getters.get_duration(h5).item()
            h5_subdict['timbre'] = hdf5_getters.get_segments_timbre(h5).tolist()
            h5_subdict['pitches'] = hdf5_getters.get_segments_pitches(h5).tolist()
            track_id = hdf5_getters.get_track_id(h5).item()
            all_song_data[track_id] = h5_subdict
            print 'Song {0} finished processing. Total time elapsed: {1} seconds'.format(count, str(time.time() - start_time))
        h5.close()

all_song_data_sorted = dict(sorted(all_song_data.items(), key=lambda k: k[1]['year']))
sortedpitchdata = '/scratch/network/mssilver/mssilver/msd_data/raw_' + re.sub('/', '', sys.argv[1]) + '.txt'
with open(sortedpitchdata, 'w') as text_file:
    text_file.write(str(all_song_data_sorted))

A.2 Calculating Most Likely Chords and Timbre Categories

from __future__ import division
import os
import sys
import re
import time
import json
import glob
import math
import ast
import collections
import numpy as np
import hdf5_getters     # not on adroit
import sklearn.mixture
import msd_utils        # not on adroit

# prevents output from showing ellipses when printed
np.set_printoptions(threshold=np.nan)

# column-wise mean of a list of lists
def mean(a):
    return sum(a) / len(a)

'''This code computes the frequency of chord changes in each electronic song
and runs the dirichlet process on it.'''

basedir = '/scratch/network/mssilver/mssilver/'
input_file = basedir + 'msd_data/raw_' + str(sys.argv[1]) + '.txt'
output_file = basedir + 'msd_data/preprocessed_' + str(sys.argv[1]) + '.txt'

json_contents = open(input_file, 'r').read()

all_song_data = []
time_start = time.time()
count = 0
# NOTE: the exact regex was lost in the extraction of this listing; the
# pattern below is a reconstruction that grabs one song dict per match
song_pattern = re.compile(r"\{'title'.*?\]\]\}", re.DOTALL)
for match in song_pattern.finditer(json_contents):
    json_object = ast.literal_eval(str(match.group(0)))
    json_object_new = {}

    json_object_new['title'] = json_object['title']
    json_object_new['artist_name'] = json_object['artist_name']
    json_object_new['year'] = json_object['year']
    json_object_new['duration'] = json_object['duration']

    segments_pitches_old = json_object['pitches']
    segments_timbre_old = json_object['timbre']
    segments_pitches_old_smoothed = []
    segments_timbre_old_smoothed = []
    chord_changes = [0 for i in range(0, 192)]
    smoothing_factor = 5
    for i in range(0, int(math.floor(len(segments_pitches_old) / smoothing_factor))):
        segments = segments_pitches_old[(smoothing_factor * i):(smoothing_factor * i + smoothing_factor)]
        # calculate mean frequency of each note over a block of 5 time segments
        segments_mean = map(mean, zip(*segments))
        segments_pitches_old_smoothed.append(segments_mean)
    most_likely_chords = [msd_utils.find_most_likely_chord(seg) for seg in segments_pitches_old_smoothed]
    print 'found most likely chords at {0} seconds'.format(time.time() - time_start)
    # calculate chord changes
    for i in range(0, len(most_likely_chords) - 1):
        c1 = most_likely_chords[i]
        c2 = most_likely_chords[i + 1]
        if c1[1] == c2[1]:
            note_shift = 0
        elif c1[1] < c2[1]:
            note_shift = c2[1] - c1[1]
        else:
            note_shift = 12 - c1[1] + c2[1]
        key_shift = 4 * (c1[0] - 1) + c2[0]
        # convert note_shift (0 through 11) and key_shift (1 to 16)
        # to one of 192 categories for a chord shift
        chord_shift = 12 * (key_shift - 1) + note_shift
        chord_changes[chord_shift] += 1
    json_object_new['chord_changes'] = [c / json_object['duration'] for c in chord_changes]
    print 'calculated chord changes at {0} seconds'.format(time.time() - time_start)

    for i in range(0, int(math.floor(len(segments_timbre_old) / smoothing_factor))):
        segments = segments_timbre_old[(smoothing_factor * i):(smoothing_factor * i + smoothing_factor)]
        # calculate mean timbre over a block of 5 time segments
        segments_mean = map(mean, zip(*segments))
        segments_timbre_old_smoothed.append(segments_mean)
    print 'found most likely timbre categories at {0} seconds'.format(time.time() - time_start)
    timbre_cats = [msd_utils.find_most_likely_timbre_category(seg) for seg in segments_timbre_old_smoothed]
    timbre_cat_counts = [timbre_cats.count(i) for i in xrange(0, 46)]  # 46 timbre clusters (Section 2.3.3)
    json_object_new['timbre_cat_counts'] = [t / json_object['duration'] for t in timbre_cat_counts]
    all_song_data.append(json_object_new)
    count += 1

print 'preprocessing finished, writing results to file at time {0}'.format(time.time() - time_start)
with open(output_file, 'w') as f:
    f.write(str(all_song_data))

print 'file merging complete at time {0}'.format(time.time() - time_start)

A.3 Code to Compute Timbre Categories

from __future__ import division
import os
import sys
import re
import time
import json
import glob
import math
import ast
import random
import operator
import collections
from collections import defaultdict
from string import ascii_uppercase
import numpy as np
import matplotlib.pyplot as plt
import hdf5_getters     # not on adroit
import sklearn.mixture
import msd_utils        # not on adroit

timbre_all = []
N = 20  # number of samples to get from each year
year_counts = {1956: 2, 1965: 4, 1968: 3, 1969: 5, 1970: 23, 1971: 25,
               1972: 26, 1973: 37, 1974: 35, 1975: 29, 1976: 28, 1977: 64,
               1978: 77, 1979: 111, 1980: 131, 1981: 171, 1982: 199,
               1983: 272, 1984: 190, 1985: 189, 1986: 200, 1987: 224,
               1988: 205, 1989: 272, 1990: 358, 1991: 348, 1992: 538,
               1993: 610, 1994: 658, 1995: 764, 1996: 809, 1997: 930,
               1998: 872, 1999: 983, 2000: 1031, 2001: 1230, 2002: 1323,
               2003: 1563, 2004: 1508, 2005: 1995, 2006: 1892, 2007: 2175,
               2008: 1950, 2009: 1782, 2010: 742}

time_start = time.time()
year_count = defaultdict(int)
orig_dir = '/scratch/network/mssilver/mssilver/'
orig_dir = ''
# NOTE: the exact regex was lost in the extraction; reconstructed pattern
json_pattern = re.compile(r"\{'title'.*?\]\]\}", re.DOTALL)
N = 20  # number of songs to sample from each year
k = 20  # number of frames to select from each song
for l1 in ascii_uppercase:
    for l2 in ascii_uppercase:
        edm_textfile = orig_dir + 'msd_data/raw_' + l1 + l2 + '.txt'
        json_contents = open(edm_textfile, 'r').read()
        for json_object_str in re.findall(json_pattern, json_contents):
            json_object = ast.literal_eval(json_object_str)
            year = int(json_object['year'])
            prob = 1.0 if 1.0 * N / year_counts[year] > 1.0 else 1.0 * N / year_counts[year]
            if random.random() < prob:
                print 'getting timbre frames for song in directory {0}, {1} seconds after start of program'.format(edm_textfile, time.time() - time_start)
                duration = float(json_object['duration'])
                timbre = [[t / duration for t in l] for l in json_object['timbre']]
                try:
                    indices = random.sample(xrange(0, len(timbre)), k)
                except:
                    indices = xrange(0, len(timbre))
                timbre_frames = [timbre[i] for i in indices]
                appended_timbre = [timbre_all.append(l) for l in timbre_frames]
        print 'finished file {0}, {1} seconds after start of program'.format(edm_textfile, time.time() - time_start)

with open('timbre_frames_all.txt', 'w') as f:
    f.write(str(timbre_all))

A.4 Helper Methods for Calculations

import os
import re
import json
import glob
import time
import numpy as np
import hdf5_getters

''' some static data used in conjunction with the helper methods '''

# each 12-element vector corresponds to the 12 pitches, starting with C
# natural and going up to B natural


def _rotations(template):
    # all 12 transpositions of a root-position template, index 0 = C
    return [np.roll(template, k).tolist() for k in range(12)]

CHORD_TEMPLATE_MAJOR = _rotations([1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0])  # root, M3, P5
CHORD_TEMPLATE_MINOR = _rotations([1, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0])  # root, m3, P5
CHORD_TEMPLATE_DOM7 = _rotations([1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0])   # root, M3, P5, m7
CHORD_TEMPLATE_MIN7 = _rotations([1, 0, 0, 1, 0, 0, 0, 1, 0, 0, 1, 0])   # root, m3, P5, m7

CHORD_TEMPLATE_MAJOR_means = [np.mean(chord) for chord in CHORD_TEMPLATE_MAJOR]
CHORD_TEMPLATE_MINOR_means = [np.mean(chord) for chord in CHORD_TEMPLATE_MINOR]
CHORD_TEMPLATE_DOM7_means = [np.mean(chord) for chord in CHORD_TEMPLATE_DOM7]
CHORD_TEMPLATE_MIN7_means = [np.mean(chord) for chord in CHORD_TEMPLATE_MIN7]

CHORD_TEMPLATE_MAJOR_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_MAJOR]
CHORD_TEMPLATE_MINOR_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_MINOR]
CHORD_TEMPLATE_DOM7_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_DOM7]
CHORD_TEMPLATE_MIN7_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_MIN7]


48 TIMBRE_CLUSTERS = [[ 138679881e-01 395702571e-02 265410235e-0249 738301998e-03 -175014636e-02 -551147732e-0250 871851698e-03 -117595855e-02 107227900e-0251 875951680e-03 540391877e-03 617638908e-03]52 [ 314344510e+00 117405599e-01 408053561e+0053 -177934450e+00 293367968e+00 -135597928e+0054 -155129489e+00 775743158e-01 642796685e-0155 140794256e-01 337716831e-01 -327103815e-01]56 [ 356548165e-01 273288705e+00 194355982e+0057 106892477e+00 989739475e-01 -897330631e-0258 873234495e-01 -200747009e-03 344488367e-0159 993117800e-02 -243471766e-01 -190521726e-01]60 [ 422442037e-01 414115783e-01 143926557e-0161 -116143322e-01 -595186216e-02 -236927188e-0162 -683151409e-02 986816882e-02 243219098e-0263 693558977e-02 680121418e-03 397485360e-02]64 [ 194727799e-01 -139027782e+00 -239875671e-01

62

65 -284583677e-01 192334219e-01 -283421048e-0166 215787541e-01 114840341e-01 -215631833e-0167 -409496877e-02 -690838017e-03 -724394810e-03]68 [ 196565167e-01 498702717e-02 -343697282e-0169 254170701e-01 112441266e-02 154740401e-0170 -470447408e-02 810868802e-02 303736697e-0371 143974944e-03 -275044913e-02 148634678e-02]72 [ 221364497e-01 -296205105e-01 157754028e-0173 -557641279e-02 -925625566e-02 -615316168e-0274 -138139882e-01 -554936599e-02 166886836e-0175 646238260e-02 124093863e-02 -209274345e-02]76 [ 212823455e-01 -932652720e-02 -439611467e-0177 -202814479e-01 498638770e-02 -126572488e-0178 -111181799e-01 325075635e-02 201416694e-0279 -569216463e-02 261922912e-02 830817468e-02]80 [ 162304042e-01 -734813956e-03 -202552550e-0181 180106705e-01 -572110826e-02 -917148244e-0282 -620429191e-03 -608892354e-02 102883628e-0283 384878478e-02 -872920419e-03 237291230e-02]84 [ 169023095e-01 681311168e-02 -371039856e-0285 -213139780e-02 -418752028e-03 136407740e-0186 258515825e-02 -410328777e-04 293149920e-0287 -197874734e-02 201177066e-02 429260690e-03]88 [ 416829358e-01 -128384095e+00 886081556e-0189 913717416e-02 -319420208e-01 -182003637e-0190 -319865507e-02 -171517045e-02 347472066e-0291 -353047665e-02 558354602e-02 -506222122e-02]92 [ 383948137e-01 106020034e-01 401191058e-0193 149470482e-01 -958422411e-02 -494473336e-0294 227589858e-02 -567352733e-02 384666644e-0295 -215828055e-02 -167817151e-02 115426241e-01]96 [ 907946444e-01 326120397e+00 298472002e+0097 -142615404e-01 129886103e+00 -453380431e-0198 154008478e-01 -355297093e-02 -295809181e-0199 157037690e-01 -729692046e-02 115180285e-01]

100 [ 160870896e+00 -232038235e+00 -796211044e-01101 155058968e+00 -219377663e+00 501030526e-01102 -171767279e+00 -136642470e+00 -242837527e-01103 -414275615e-01 -733148530e-01 -456676578e-01]104 [ 642870687e-01 134486839e+00 216026845e-01105 -213180345e-01 310866747e-01 -397754955e-01106 -354439151e-01 -595938041e-04 495054274e-03107 467013422e-02 -180823854e-02 125808320e-01]108 [ 116780496e+00 228141229e+00 -329418720e+00109 -154239912e+00 212372153e-01 251116768e+00110 184273560e+00 -406183916e-01 119175125e+00111 -924407446e-01 685444429e-01 -638729005e-01]112 [ 239097414e-01 -113382447e-02 306327342e-01113 468182987e-03 -103107607e-01 -317661969e-02114 346533705e-02 146440386e-02 688291154e-02115 172580481e-02 -623970238e-03 -652822380e-03]116 [ 174850329e-01 -186077411e-01 269285838e-01117 522452803e-02 -371708289e-02 -642874319e-02118 -501920042e-03 -114565540e-02 -261300268e-03119 -694872458e-03 120157063e-02 201341977e-02]120 [ 193220674e-01 162738332e-01 172794061e-02121 789933755e-02 158494767e-01 904541006e-04122 -333177052e-02 -142411500e-01 -190471155e-02123 -241622739e-02 -257382438e-02 284895062e-02]124 [ 331179197e+00 -156765268e-01 442446188e+00

63

125 205496297e+00 507031622e+00 -352663849e-02126 -568337901e+00 -117825301e+00 541756637e-01127 -315541339e-02 -158404846e+00 737887234e-01]128 [ 236033237e-01 -501380019e-01 -701568834e-02129 -214474169e-01 558739133e-01 -345340886e-01130 236469930e-01 -251770230e-02 -441670143e-01131 -173364633e-01 992353986e-03 101775476e-01]132 [ 313672832e+00 155128891e+00 460139512e+00133 982477544e-01 -387108002e-01 -134239667e+00134 -300065797e+00 -441556909e-01 -777546208e-01135 -659017029e-01 -142596356e-01 -978935498e-01]136 [ 850714148e-01 228658856e-01 -365260753e+00137 270626948e+00 -190441544e-01 566625676e+00138 177531510e+00 239978921e+00 110965660e+00139 158484130e+00 -151579214e-02 864324026e-01]140 [ 114302559e+00 118602811e+00 -388130412e+00141 869833825e-01 -823003310e-01 -423867795e-01142 856022598e-01 -108015106e+00 174840192e-01143 -135493558e-02 -117012561e+00 168572940e-01]144 [ 354117814e+00 612714769e-01 767585243e+00145 250391333e+00 181374399e+00 -146363231e+00146 -174027236e+00 -572924078e-01 -120787368e+00147 -413954661e-01 -462561948e-01 678297871e-01]148 [ 831843044e-01 441635485e-01 700724425e-02149 -472159900e-02 308326493e-01 -447009822e-01150 327806057e-01 652370380e-01 328490360e-01151 128628172e-01 -778065861e-02 691343399e-02]152 [ 490082031e-01 -953180204e-01 176970476e-01153 157256960e-01 -526196238e-02 -319264458e-01154 391808304e-01 219368239e-01 -206483291e-01155 -625044005e-02 -105547224e-01 318934196e-01]156 [ 149899454e+00 -430708817e-01 243770498e+00157 703149621e-01 -228827845e+00 270195855e+00158 -471484280e+00 -118700075e+00 -177431396e+00159 -223190236e+00 820855264e-01 -235859902e-01]160 [ 120322544e-01 -366300816e-01 -125699953e-01161 -121914056e-01 693277338e-02 -131034684e-01162 -154955924e-03 248094288e-02 -309576314e-02163 -166369415e-03 148904987e-04 -142151992e-02]164 [ 652394765e-01 -681024464e-01 636868117e-01165 304950208e-01 262178992e-01 -320457080e-01166 -198576098e-01 -302173163e-01 204399765e-01167 444513847e-02 -950111498e-02 -114198739e-02]168 [ 206762180e-01 -208101829e-01 261977630e-01169 -171672300e-01 561794250e-02 213660185e-01170 390259585e-02 478176392e-02 172812607e-02171 344052067e-02 626899067e-03 248544728e-02]172 [ 739717363e-01 437786285e+00 254995502e+00173 113151212e+00 -358509503e-01 220806129e-01174 -220500355e-01 -722409824e-02 -270534083e-01175 107942098e-03 270174668e-01 187279353e-01]176 [ 125593809e+00 671054880e-02 870352571e-01177 -432607959e+00 230652217e+00 547476105e+00178 -611052479e-01 107955720e+00 -216225471e+00179 -795770149e-01 -731804973e-01 968935954e-01]180 [ 117233757e-01 -123897829e-01 -488625265e-01181 142036530e-01 -723286756e-02 -699808763e-02182 -117525019e-02 570221674e-02 -767796123e-03183 417505873e-02 -233375716e-02 194121001e-02]184 [ 167511025e+00 -275436700e+00 145345593e+00

64

185 132408871e+00 -166172505e+00 100560074e+00186 -882308160e-01 -595708043e-01 -727283590e-01187 -103975499e+00 -186653334e-02 139449745e+00]188 [ 320587677e+00 -284451104e+00 854849957e+00189 -444001235e-01 104202144e+00 735333682e-01190 -248763292e+00 738931361e-01 -174185596e+00191 -107581842e+00 205759299e-01 -820483513e-01]192 [ 331279737e+00 -508655734e-01 661530870e+00193 116518280e+00 474499155e+00 -231536191e+00194 -134016130e+00 -715381712e-01 278890594e+00195 204189275e+00 -380003033e-01 116034914e+00]196 [ 179522019e+00 -813534697e-02 437167420e-01197 226517020e+00 885377295e-01 107481514e+00198 -725322296e-01 -219309506e+00 -759468916e-01199 -137191387e+00 260097913e-01 934596450e-01]200 [ 350400906e-01 817891485e-01 -863487084e-01201 -731760701e-01 970320805e-02 -360023996e-01202 -291753495e-01 -803073817e-02 665930095e-02203 160093340e-01 -129158086e-01 -518806100e-02]204 [ 225922929e-01 278461593e-01 539661393e-02205 -237662670e-02 -270343295e-02 -123485570e-01206 231027499e-03 587465112e-05 186127188e-02207 283074747e-02 -187198676e-04 124761782e-02]208 [ 453615634e-01 318976020e+00 -835029351e-01209 784124578e+00 -443906795e-01 -178945492e+00210 -114521031e+00 100044304e+00 -404084981e-01211 -486030348e-01 105412721e-01 563666445e-02]212 [ 393714086e-01 -307226477e-01 -487366619e-01213 -457481697e-01 -291133171e-04 -239881719e-01214 -215591352e-01 -121332941e-01 142245002e-01215 502984582e-02 -805878851e-03 195534173e-01]216 [ 186913010e-01 -161000977e-01 595612425e-01217 187804293e-01 222064227e-01 -109008289e-01218 783845058e-02 515228647e-02 -818113578e-02219 -237860551e-02 341013800e-03 364680417e-02]220 [ 332919314e+00 -214341251e+00 720913997e+00221 176143734e+00 164091808e+00 -266887649e+00222 -926748006e-01 -278599285e-01 -739434005e-01223 -387363085e-01 800557250e-01 115628886e+00]224 [ 476496444e-01 -119334793e-01 309037235e-01225 -345545294e-01 130114716e-01 506895559e-01226 212176840e-01 -414296750e-03 452439064e-02227 -162163990e-02 693683152e-02 -577607592e-03]228 [ 300019324e-01 543432074e-02 -772732930e-01229 147263806e+00 -279012581e-02 -247864869e-01230 -210011388e-01 278202425e-01 616957205e-02231 -166924986e-01 -180102286e-01 -378872162e-03]]232

TIMBRE_MEANS = [np.mean(t) for t in TIMBRE_CLUSTERS]
TIMBRE_STDEVS = [np.std(t) for t in TIMBRE_CLUSTERS]

'''helper methods to process raw msd data'''

def normalize_pitches(h5):
    key = int(hdf5_getters.get_key(h5))
    segments_pitches = hdf5_getters.get_segments_pitches(h5)
    segments_pitches_new = [transpose_by_key(pitch_seg, key)
                            for pitch_seg in segments_pitches]
    return segments_pitches_new


def transpose_by_key(pitch_seg, key):
    pitch_seg_new = []
    for i in range(0, 12):
        idx = (i + key) % 12
        pitch_seg_new.append(pitch_seg[idx])
    return pitch_seg_new

''' given a time segment with distributions of the 12 pitches, find the most
likely chord played '''

def find_most_likely_chord(pitch_vector):
    rho_max = 0.0
    # index each chord
    most_likely_chord = (1, 1)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_MAJOR,
            CHORD_TEMPLATE_MAJOR_means, CHORD_TEMPLATE_MAJOR_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean)*(pitch_vector[i] - np.mean(pitch_vector))/(
                (stdev+0.01)*(np.std(pitch_vector)+0.01))
        if (abs(rho) > abs(rho_max)):
            rho_max = rho
            most_likely_chord = (1, idx)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_MINOR,
            CHORD_TEMPLATE_MINOR_means, CHORD_TEMPLATE_MINOR_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean)*(pitch_vector[i] - np.mean(pitch_vector))/(
                (stdev+0.01)*(np.std(pitch_vector)+0.01))
        if (abs(rho) > abs(rho_max)):
            rho_max = rho
            most_likely_chord = (2, idx)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_DOM7,
            CHORD_TEMPLATE_DOM7_means, CHORD_TEMPLATE_DOM7_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean)*(pitch_vector[i] - np.mean(pitch_vector))/(
                (stdev+0.01)*(np.std(pitch_vector)+0.01))
        if (abs(rho) > abs(rho_max)):
            rho_max = rho
            most_likely_chord = (3, idx)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_MIN7,
            CHORD_TEMPLATE_MIN7_means, CHORD_TEMPLATE_MIN7_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean)*(pitch_vector[i] - np.mean(pitch_vector))/(
                (stdev+0.01)*(np.std(pitch_vector)+0.01))
        if (abs(rho) > abs(rho_max)):
            rho_max = rho
            most_likely_chord = (4, idx)
    return most_likely_chord

def find_most_likely_timbre_category(timbre_vector):
    most_likely_timbre_cat = 0
    rho_max = 0.0
    for idx, (seg, mean, stdev) in enumerate(zip(TIMBRE_CLUSTERS, TIMBRE_MEANS,
            TIMBRE_STDEVS)):
        rho = 0.0
        for i in range(0, 12):
            rho += (seg[i] - mean)*(timbre_vector[i] - np.mean(seg))/(
                (stdev+0.01)*(np.std(timbre_vector)+0.01))
        if (abs(rho) > abs(rho_max)):
            rho_max = rho
            most_likely_timbre_cat = idx
    return most_likely_timbre_cat


Bibliography

[1] Marck Bailey. Mark J. Butler publishes scholarly work on dance music. http://www.music.northwestern.edu/about/news/2012/mark-j-butler-publishes-scholarly-work-on-dance-music.html, Mar. 2012.

[2] Kenneth Taylor. Ishkur's guide to EDM. http://techno.org/electronic-music-guide.

[3] Deal further strengthens Spotify's music discovery expertise. http://the.echonest.com/pressreleases/spotify-acquires-echo-nest, Mar. 2014.

[4] Josh Constine. Inside the Spotify - Echo Nest skunkworks. http://techcrunch.com/2014/10/19/the-sonic-mad-scientists, Oct. 2014.

[5] The future of music genres is here. http://blog.echonest.com/post/73516217273/the-future-of-music-genres-is-here, Jan. 2014.

[6] About the Music Genome Project. http://www.pandora.com/about/mgp.

[7] Joan Serrà, Álvaro Corral, Marián Boguñá, Martín Haro, and Josep Ll. Arcos. Measuring the evolution of contemporary western popular music. Sci. Rep., 2, July 2012.

[8] Matthias Mauch, Robert M. MacCallum, Mark Levy, and Armand M. Leroi. The evolution of popular music: USA 1960–2010. Royal Society Open Science, 2(5), 2015.

[9] Thierry Bertin-Mahieux, Daniel P.W. Ellis, Brian Whitman, and Paul Lamere. The million song dataset. In Proceedings of the 12th International Conference on Music Information Retrieval (ISMIR 2011), 2011.

[10] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay. Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12:2825–2830, 2011.

[11] Lars Buitinck, Gilles Louppe, Mathieu Blondel, Fabian Pedregosa, Andreas Mueller, Olivier Grisel, Vlad Niculae, Peter Prettenhofer, Alexandre Gramfort, Jaques Grobler, Robert Layton, Jake VanderPlas, Arnaud Joly, Brian Holt, and Gaël Varoquaux. API design for machine learning software: experiences from the scikit-learn project. In ECML PKDD Workshop: Languages for Data Mining and Machine Learning, pages 108–122, 2013.

[12] Edwin Chen. Infinite mixture models with nonparametric Bayes and the Dirichlet process. http://blog.echen.me/2012/03/20/infinite-mixture-models-with-nonparametric-bayes-and-the-dirichlet-process, Mar. 2012.

[13] Graham Massey. Roland TR-808: The drum machine that changed music. http://www.bbc.com/news/entertainment-arts-26682781, Mar. 2014.

[14] Jean-Michel Jarre and Agencja Artystyczna MTJ. Jean Michel Jarre. Disques Dreyfus, 1999.

[15] Francois Pachet, Jean-Julien Aucouturier, and Mark Sandler. The way it sounds: Timbre models for analysis and retrieval of music signals. IEEE Transactions on Multimedia, 7(6):1028–35, Dec. 2005.


the sound waves. Timbre is formally defined as the Mel frequency cepstral coefficients (MFCC) of a transformed sound signal. More informally, it refers to the sound color, texture, or tone quality, and is associated with instrument types, recording resources, and production techniques. In other words, two sounds that have the same pitch but different tones (for example, a bell and a voice) are differentiated by their timbres. There are 12 MFCCs that define the timbre of a given sound. Finally, loudness refers to intrinsically how loud the music sounds, not loudness that a listener can manipulate while listening to the music. Loudness is the first MFCC of the timbre of a sound [7]. The study concluded that over time, music has been becoming louder and less diverse:

    The restriction of pitch sequences (with metrics showing less variety in pitch progressions), the homogenization of the timbral palette (with frequent timbres becoming more frequent), and growing average loudness levels (threatening a dynamic richness that has been conserved until today). This suggests that our perception of the new would be essentially rooted on identifying simpler pitch sequences, fashionable timbral mixtures, and louder volumes. Hence, an old tune with slightly simpler chord progressions, new instrument sonorities that were in agreement with current tendencies, and recorded with modern techniques that allowed for increased loudness levels could be easily perceived as novel, fashionable, and groundbreaking.

This study serves as a good starting point for mathematically analyzing music in a few ways. First, it utilizes the Million Song Dataset, which addresses the issue of legally obtaining music metadata. As mentioned in section 1.3, the only legal way to obtain playable music for this study would have been to purchase all songs I would include, which is infeasible. While the Million Song Dataset does not contain the audio files in playable format, it does contain audio features and metadata that allow for in-depth analysis. In addition, working with the dataset takes out the work of extracting features from raw audio files, saving an extensive amount of time and energy. Second, the study establishes specifics for what constitutes a trend in music. Pitch, timbre, and loudness are core features of music, and examining the distributions of each among songs over time reveals a lot of information about how the music industry and consumers' tastes have evolved. While these are not all of the features contained in a song, they serve as a good starting point. Third, the study defines mathematical ways to capture music attributes and measure their change over time. For example, pitches are transposed into the same tonal context, with binary discretized pitch descriptions based on a threshold, so that each song can be represented with vectors of pitches that are normalized and compared to other songs.

While this study lays some solid groundwork for capturing and analyzing numeric qualities of music, it falls short of addressing my goals in a couple of ways. First, it does not perform any analysis with respect to music genre. While the analysis performed in this paper could easily be applied to a list of songs in a specific genre, certain genres might have unique sounds and rhythms relative to other genres that would be worth studying in greater detail. Second, the study only measures general trends in music over time. The models used to describe changes are simple regressions that don't look at more nuanced changes. For example, what styles of music developed over certain periods of time? How rapid were those changes? Which styles of music developed from which other styles?

A more promising study, led by music researcher Matthias Mauch [8], analyzes contemporary popular Western music from the 1960s to 2010s by comparing numerical data on the pitches and timbre of a corpus of 17,000 songs that appeared on the Billboard Hot 100. Like the previously mentioned paper, Measuring the Evolution of Contemporary Western Popular Music, Mauch's study also creates abstractions of pitch and timbre in order to provide a consistent and meaningful semantic interpretation of musical data (see figure 1.2). However, Mauch's study takes this idea a step further by using genre tags from Last.fm, a music website, and constructing a hierarchy of music genres using hierarchical clustering. Additionally, the study takes a crack at determining whether a particular band, the Beatles, was musically groundbreaking for its time or merely playing off sounds that other bands had already used.

Figure 1.2: Data processing pipeline for Mauch's study, illustrated with a segment of Queen's Bohemian Rhapsody, 1975

While both Measuring the Evolution of Contemporary Western Popular Music and Mauch's study created abstractions of pitch and timbre, Mauch's study is more appealing with respect to my goal because its end results align more closely with mine. Additionally, the data processing pipeline offers several layers of abstraction, and depending on my progress, I would be able to achieve at least one of the levels of abstraction. As shown in figure 1.2, each segment of a raw audio file is first broken down into its 12 timbre MFCCs and pitch components. Next, the study constructs "lexicons", or a dictionary of pitch and timbre terms that all songs can be compared to. For pitch, the original data is in an N-by-12 matrix, where N is the number of time segments in the song and 12 the number of notes found in an octave of pitches. Each time segment contains the relative strengths of each of the 12 pitches. However, music sounds are not merely a collection of pitches, but more precisely chords. Furthermore, the similarity of two songs is not determined by the absolute pitches of their chords, but rather the progression of chords in the song, all relative to each other. For example, if all the notes in a song are transposed by one step, the song will sound different in terms of absolute pitch, but the song will still be recognized as the original because all of the relative movements from each chord to the next are the same. This phenomenon is captured in the pitch data by finding the most likely chord played at each time segment, then counting the change to the next chord at each time step and generating a table of chord change frequencies for each song. Constructing the timbre lexicon is more complicated, since there is no easy analogue like chords for pitches to compare songs. Mauch's study utilizes a Gaussian Mixture Model (GMM) by iterating over k=1 to k=N clusters, where N is a large number, running the GMM on each prior assumption of k clusters and computing the Bayes Information Criterion (BIC) for each model. The lowest of the N BIC values is found, and that value of k is selected. That model contains k different timbre clusters, and each cluster contains the mean timbre value for each of the 12 timbre components.

For my research, I decided that the pitch and timbre lexicons would be the most realistic level of abstraction I could obtain. Mauch's study adds an additional layer to pitch and timbre by identifying the most common patterns of chord changes and most common timbre rhythms and creating more general tags from these combined terms, such as "stepwise changes indicating modal harmony" for a pitch topic and "oh, rounded, mellow" for a timbral topic. There were two problems with using this final layer of abstraction for my study. First, attaching semantic interpretations to the pitch and timbral lexicons is a difficult task. For timbre, I would need to listen to sound samples containing all of the different timbral categories I identified and attach user interpretations to them. For the chords, not only would I have to perform the same analysis as on timbre, but also pay careful attention to identify which chords correspond to common sound progressions in popular music, a task that I am not qualified for and did not have the resources, for this thesis, to seek out. Second, this final layer of abstraction was not necessary for the end goal of my paper. In fact, consolidating my pitch and timbre lexicons into simpler phrases would run the risk of pigeonholing my analysis and preventing me from discovering more nuanced patterns in my final results. Therefore, I decided to focus on pitch and timbral lexicon construction as the furthest levels of abstraction when processing songs for my thesis. Mathematical details on how I constructed the pitch and timbral lexicons can be found in the Mathematical Modeling section of this paper.

1.3 The Dataset

In order to successfully execute my thesis, I need access to an extensive database of music. Until recently, acquiring a substantial corpus of music data was a difficult and costly task. It is illegal to download music audio files from video and music-sharing sites such as YouTube, Spotify, and Pandora. Some platforms, such as iTunes, offer 90-second previews of songs, but using only segments of songs, and usually segments that showcase the chorus of the song, is not a reliable way to capture the entire essence of a song. Even if I were to legally download entire audio files for free, I would run into additional issues. Obtaining a high-quality corpus of song data would be challenging: writing scripts that crawl music sharing platforms may not capture all of the music I am looking for. And once I have the audio files, I would have to perform audio processing techniques to extract the relevant information from the songs.

Fortunately, there is an easy solution to the music data acquisition problem. The Million Song Dataset (MSD) is a collection of metadata for one million music tracks dating up to 2011. Various organizations, such as The Echo Nest, Musicbrainz, 7digital, and Last.fm, have contributed different pieces of metadata. Each song is represented as a Hierarchical Data Format file (HDF5), which can be loaded as a JSON object. The fields encompass topical features, such as the song title, artist, and release date, as well as lower-level features, such as the loudness, starting beat time, pitches, and timbre of several segments of the song [9]. While the MSD is the largest free and open source music metadata dataset I could find, there is no guarantee that it adequately covers the entire spectrum of EM artists and songs. This quality limitation is important to consider throughout the study. A quick look through the songs, including the subset of data I worked with for this report, showed that there were several well-known artists and songs in the EM scene. Therefore, while the MSD may not contain all desired songs for this project, it contains an adequate number of relevant songs to produce some meaningful results. Additionally, laying the groundwork for modeling the similarities between songs and identifying groundbreaking ones is the same regardless of the songs included, and the following methodologies can be implemented on any similarly-formatted dataset, including one with songs that might currently be missing in the MSD.

Chapter 2

Mathematical Modeling

2.1 Determining Novelty of Songs

Finding a logical and implementable mathematical model was, and continues to be, an important aspect of my research. My problem, how to mathematically determine which songs were unique for their time, requires an algorithm in which each song is introduced in chronological order, either joining an existing category or starting a new category based on its musical similarity to songs already introduced. Clustering algorithms like k-means or Gaussian Mixture Models (GMM), which optimize the partitioning of a dataset into a predetermined number of clusters, assume a fixed number of clusters. While this process would work if we knew exactly how many genres of EM existed, if we guess wrong, our end results may end up with clusters that are wrongly grouped together or separated. It is much better to apply a clustering algorithm that does not make any assumptions about this number.

One particularly promising process that addresses the issue of the number of clusters is a family of algorithms known as Dirichlet Processes (DPs). DPs are useful for this particular application because (1) they assign clusters to a dataset with only an upper bound on the number of clusters, and (2) by sorting the songs in chronological order before running the algorithm and keeping track of which songs are categorized under each cluster, we can observe the earliest songs in each cluster and, consequentially, infer which songs were responsible for creating new clusters. The DP is controlled by a parameter α, the concentration parameter. The expected number of clusters formed is directly proportional to the value of α, so the higher the value of α, the more likely new clusters will be formed [10]. Regardless of the value of α, as the number of data points introduced increases, the probability of a new group being formed decreases. That is, a "rich get richer" policy is in place, and existing clusters tend to grow in size. Tweaking the value of the tunable parameter α is an important part of the study, since it determines the flexibility given to forming a new cluster. If the value of α is too small, then the criteria for forming clusters will be too strict, and data that should be in different clusters will be assigned to the same cluster. On the other hand, if α is too large, the algorithm will be too sensitive and assign similar songs to different clusters.
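This trade-off can be made precise through the Chinese restaurant process view of the DP (a standard construction, stated here for intuition; it is not spelled out in the surrounding text): after n songs have been assigned, with n_k songs in cluster k, the next song is assigned with probabilities

\[ P(\text{join existing cluster } k) = \frac{n_k}{n + \alpha}, \qquad P(\text{form a new cluster}) = \frac{\alpha}{n + \alpha}. \]

As n grows, the new-cluster probability shrinks while larger clusters attract proportionally more songs, which is exactly the "rich get richer" behavior described above.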

The implementation of the DP was achieved using scikit-learn's library and API for the Dirichlet Process Gaussian Mixture Model (DPGMM). The DPGMM is the formal name of the Dirichlet Process model used to cluster the data. More specifically, scikit-learn's implementation of the DPGMM uses the Stick Breaking method, one of several equally valid methods to assign songs to clusters [11]. While the mathematical details for this algorithm can be found at the following citation [12], the most important aspects of the DPGMM are the arguments that the user can specify and tune. The first of these tunable parameters is the value α, which is the same parameter as the α discussed in the previous paragraph. As seen in Figure 2.1, on the right side, properly tuning α is key to obtaining meaningful clusters. The center image has α set to 0.01, which is too small and results in all of the data being formed under one cluster. On the other hand, the bottom-right image has the same data set and α set to 100, which does a better job of clustering. On a related note, the figure also demonstrates the effectiveness of the DPGMM over the GMM. On the left side, clearly the dataset contains 2 clusters, but the GMM on the top-left image assumes 5 clusters as a prior and consequentially clusters the data incorrectly, while the DPGMM manages to limit the data to 2 clusters.

The second argument that the user inputs for the DPGMM is the data that will be clustered. The scikit-learn implementation takes the data in the format of a nested list (N lists, each of length m), where N is the number of data points and m the number of features. While the format of the data structure is relatively straightforward, choosing which numbers should be in the data was a challenge I faced. Selecting the relevant features of each song to be used in the algorithm will be expounded upon in the next section, "Feature Selection".

The last argument that a user inputs for the scikit-learn DPGMM implementation is an argument indicating the upper bound for the number of clusters. The Dirichlet Process then determines the best number of clusters for the data between 1 and the upper bound. Since the DPGMM is flexible enough to find the best value, I set an arbitrary upper bound of 50 clusters and focused more on the tuning of α to modify the number of clusters formed.
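To make the interface concrete, a minimal sketch of the clustering call follows, using the scikit-learn 0.17-era mixture.DPGMM API discussed above; the variable name and iteration count are illustrative assumptions, not the exact script used for this thesis.

from sklearn import mixture

# song_features: assumed N-by-m nested list of per-song feature vectors,
# sorted chronologically before fitting
dpgmm = mixture.DPGMM(n_components=50,  # upper bound on the number of clusters
                      alpha=0.1,        # concentration parameter to be tuned
                      n_iter=100)       # variational iterations (an assumption)
dpgmm.fit(song_features)
labels = dpgmm.predict(song_features)   # cluster assignment for each song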

2.2 Feature Selection

One of the most difficult aspects of the Dirichlet method is choosing the features to be used for clustering. In other words, when we organize the songs into clusters, we need to ensure that each cluster is distinct in a way that is statistically and intuitively logical. In the Million Song Dataset [9], each song is represented as a JSON object containing several fields. These fields are candidate features to be used in the Dirichlet algorithm. Below is an example song, "Never Gonna Give You Up" by Rick Astley, and the corresponding features.

Figure 2.1: scikit-learn example of GMM vs. DPGMM and tuning of α

artist_mbid: db92a151-1ac2-438b-bc43-b82e149ddd50 (the musicbrainz.org ID for this artist)
artist_mbtags: shape = (4) (this artist received 4 tags on musicbrainz.org)
artist_mbtags_count: shape = (4) (raw tag count of the 4 tags this artist received on musicbrainz.org)
artist_name: Rick Astley (artist name)
artist_playmeid: 1338 (the ID of that artist on the service playme.com)
artist_terms: shape = (12) (this artist has 12 terms (tags) from The Echo Nest)
artist_terms_freq: shape = (12) (frequency of the 12 terms from The Echo Nest (number between 0 and 1))
artist_terms_weight: shape = (12) (weight of the 12 terms from The Echo Nest (number between 0 and 1))
audio_md5: bf53f8113508a466cd2d3fda18b06368 (hash code of the audio used for the analysis by The Echo Nest)
bars_confidence: shape = (99) (confidence value (between 0 and 1) associated with each bar by The Echo Nest)
bars_start: shape = (99) (start time of each bar according to The Echo Nest; this song has 99 bars)
beats_confidence: shape = (397) (confidence value (between 0 and 1) associated with each beat by The Echo Nest)
beats_start: shape = (397) (start time of each beat according to The Echo Nest; this song has 397 beats)
danceability: 0.0 (danceability measure of this song according to The Echo Nest (between 0 and 1; 0 => not analyzed))
duration: 211.69587 (duration of the track in seconds)
end_of_fade_in: 0.139 (time of the end of the fade-in at the beginning of the song, according to The Echo Nest)
energy: 0.0 (energy measure (not in the signal processing sense) according to The Echo Nest (between 0 and 1; 0 => not analyzed))
key: 1 (estimation of the key the song is in by The Echo Nest)
key_confidence: 0.324 (confidence of the key estimation)
loudness: -7.75 (general loudness of the track)
mode: 1 (estimation of the mode the song is in by The Echo Nest)
mode_confidence: 0.434 (confidence of the mode estimation)
release: Big Tunes - Back 2 The 80s (album name from which the track was taken; some songs/tracks can come from many albums; we give only one)
release_7digitalid: 786795 (the ID of the release (album) on the service 7digital.com)
sections_confidence: shape = (10) (confidence value (between 0 and 1) associated with each section by The Echo Nest)
sections_start: shape = (10) (start time of each section according to The Echo Nest; this song has 10 sections)
segments_confidence: shape = (935) (confidence value (between 0 and 1) associated with each segment by The Echo Nest)
segments_loudness_max: shape = (935) (max loudness during each segment)
segments_loudness_max_time: shape = (935) (time of the max loudness during each segment)
segments_loudness_start: shape = (935) (loudness at the beginning of each segment)
segments_pitches: shape = (935, 12) (chroma features for each segment (normalized so max is 1))
segments_start: shape = (935) (start time of each segment (musical event or onset) according to The Echo Nest; this song has 935 segments)
segments_timbre: shape = (935, 12) (MFCC-like features for each segment)
similar_artists: shape = (100) (a list of 100 artists (their Echo Nest IDs) similar to Rick Astley, according to The Echo Nest)
song_hotttnesss: 0.864248830588 (according to The Echo Nest, when downloaded (in December 2010), this song had a 'hotttnesss' of 0.8 (on a scale of 0 and 1))
song_id: SOCWJDB12A58A776AF (The Echo Nest song ID; note that a song can be associated with many tracks (with very slight audio differences))
start_of_fade_out: 198.536 (start time of the fade-out, in seconds, at the end of the song, according to The Echo Nest)
tatums_confidence: shape = (794) (confidence value (between 0 and 1) associated with each tatum by The Echo Nest)
tatums_start: shape = (794) (start time of each tatum according to The Echo Nest; this song has 794 tatums)
tempo: 113.359 (tempo in BPM according to The Echo Nest)
time_signature: 4 (time signature of the song according to The Echo Nest, i.e., usual number of beats per bar)
time_signature_confidence: 0.634 (confidence of the time signature estimation)
title: Never Gonna Give You Up (song title)
track_7digitalid: 8707738 (the ID of this song on the service 7digital.com)
track_id: TRAXLZU12903D05F94 (The Echo Nest ID of this particular track, on which the analysis was done)
year: 1987 (year when this song was released, according to musicbrainz.org)

When choosing features, my main goal was to use features that would most likely yield meaningful results, yet also be simple and make sense to the average person. The definition of "meaningful" results is arbitrary, as every music listener will have his or her own opinions as to what constitutes different types of music, but some common features most people tend to differentiate songs by are pitch, rhythm, and the types of instruments used. The following specific fields provided in each song object fall under these three terms; a short sketch of reading these fields from a track file follows the list.

Pitch

• segments_pitches: a matrix of values indicating the strength of each pitch (or note) at each discernible time interval

Rhythm

• beats_start: a vector of values indicating the start time of each beat
• time_signature: the time signature of the song
• tempo: the speed of the song in Beats Per Minute (BPM)

Instruments

• segments_timbre: a matrix of values indicating the distribution of MFCC-like features (different types of tones) for each segment
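As referenced above, here is a brief sketch of reading these fields from one MSD track file with the hdf5_getters module used in the appendix code (the filename is hypothetical):

import hdf5_getters

h5 = hdf5_getters.open_h5_file_read('TRAXLZU12903D05F94.h5')  # hypothetical local path
pitches = hdf5_getters.get_segments_pitches(h5)   # (N, 12) chroma strengths per segment
timbre = hdf5_getters.get_segments_timbre(h5)     # (N, 12) MFCC-like features per segment
duration = hdf5_getters.get_duration(h5)          # seconds; used later for normalization
h5.close()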

The segments_pitches feature is a clear candidate for a differentiating factor for songs, since it reveals patterns of notes that occur. Additionally, other research papers that quantitatively examine songs, like Mauch's, look at pitch and employ a procedure that allows all songs to be compared with the same metric. Likewise, timbre is intuitively a reliable differentiating feature, since it reveals the amounts of different tones: sounds that sound different despite having the same pitch. Therefore, segments_timbre is another feature that is considered in each song.

Finally, we look at the candidate features for rhythm. At first glance, all of these features appear to be useful, as they indicate the rhythm of a song in one way or another. However, none of these features are as useful as the pitch and timbre features. While tempo is one factor in differentiating genres of EDM, and music in general, tempo alone is not a driving force of musical innovation. Certain genres of EDM, like drum 'n' bass and happycore, stand out for having very fast tempos, but the tempo is supplemented with a sound unique to the genre. Conceiving new arrangements of pitches, combining instruments in new ways, and inventing new types of sounds are novel, but speeding up or slowing down existing sounds is not. Including tempo as a feature could actually add noise to the model, since many genres overlap in their tempos. And finally, tempo is measured indirectly when the pitch and timbre features are normalized: for each song, everything is measured in units of "per second", so faster songs will have higher quantities of pitch and timbre features each second. Time signature can be dismissed from the candidate features for the same reason as tempo: many genres contain the same time signature, and including it in the feature set would only add more noise. beats_start looks like a more promising feature since, like segments_pitches and segments_timbre, it consists of a vector of values. However, difficulties arise when we begin to think how exactly we can utilize this information. Since each song varies in length, we need a way to compare songs of different durations on the same level. One approach could be to perform basic statistics on the distance between each beat, for example calculating the mean and standard deviation of this distance. However, the normalized pitch and timbre information already capture this data. Another possibility is detecting certain patterns of beats, which could differentiate the syncopated dubstep or glitch music beats from the steady pulse of electro-house. But once again, every beat is accompanied by a sound with a specific timbre and pitch, so this feature would not add any significantly new information.

2.3 Collecting Data and Preprocessing Selected Features

2.3.1 Collecting the Data

Upon deciding the features I wanted to use in my research, I first needed to collect all of the electronic songs in the Million Song Dataset. The easiest reliable way to achieve this was to iterate through each song in the database and save the information for the songs where any of the artist genre tags in artist_mbtags matched with an electronic music genre. While this measure was not fully accurate, because it looks at the genre of the artist, not the song, specific genre information for each song was not as easily accessible, so this indicator was nearly as good a substitute. To generate a list of the genres that electronic songs would fall under, I manually searched through a subset of the MSD to find all genres that seemed to be related to electronic music. In the case of genres that were sometimes, but not always, electronic in nature, such as disco or pop, I erred on the side of caution and did not include them in the list of electronic genres. In these cases, false positives, such as primarily rock songs that happen to have the disco label attached to the artist, could inadvertently be included in the dataset. The final list of genres is as follows:

target_genres = ['house', 'techno', 'drum and bass', 'drum n bass',
    "drum'n'bass", 'drumnbass', "drum 'n' bass", 'jungle', 'breakbeat',
    'trance', 'dubstep', 'trap', 'downtempo', 'industrial', 'synthpop',
    'idm', 'idm - intelligent dance music', '8-bit', 'ambient',
    'dance and electronica', 'electronic']
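A sketch of this collection pass appears below; it is a simplified reconstruction, assuming msd_root points at a local directory tree of per-track .h5 files, and is not the exact script used for this thesis.

import os
import hdf5_getters

def collect_em_tracks(msd_root):
    '''walk the MSD directory tree, keeping tracks whose artist
    mbtags match any genre in target_genres'''
    em_tracks = []
    for dirpath, _, filenames in os.walk(msd_root):
        for fname in filenames:
            if not fname.endswith('.h5'):
                continue
            h5 = hdf5_getters.open_h5_file_read(os.path.join(dirpath, fname))
            tags = [str(t).lower() for t in hdf5_getters.get_artist_mbtags(h5)]
            if any(tag in target_genres for tag in tags):
                em_tracks.append(hdf5_getters.get_track_id(h5))
            h5.close()
    return em_tracks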

2.3.2 Pitch Preprocessing

A study conducted by music researcher Matthias Mauch [8] analyzes pitch in a musically informed manner. The study first takes the raw sound data and converts it into a distribution of each pitch, where 0 is no detection of the pitch and 1 the strongest amount. Then it computes the most likely chord by comparing the 4 most common types of chords in popular music (major, minor, dominant 7, and minor 7) to the observed chord. The most common chords are represented as "template chords" and contain 0's and 1's, where the 1's represent the notes played in the chord. For example, using the note C as the first index, the C major chord is represented as

CT_{CM} = (1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0)

For a given chroma frame c observed in the song, the Spearman's Rho coefficient is computed over every template chord:

\rho_{CT,c} = \sum_{i=1}^{12} \frac{(CT_i - \overline{CT})(c_i - \bar{c})}{\sigma_{CT}\,\sigma_c}

where \overline{CT} is the mean of the values in the template chord, \sigma_{CT} is the standard deviation of the values in the chord, and the operations on c are analogous. Note that the summation is over each individual pitch in the 12 pitch classes. The chord template with the highest value of ρ is selected as the chord for the time frame.
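A compact vectorized version of this template-matching step might look as follows (a sketch of the formula above; the full appendix implementation additionally adds 0.01 to each standard deviation to guard against division by zero):

import numpy as np

def chord_rho(template, chroma):
    '''correlation between a 12-element 0/1 chord template and a
    12-element chroma frame, following the formula above'''
    t = np.asarray(template, dtype=float)
    c = np.asarray(chroma, dtype=float)
    return np.sum((t - t.mean()) * (c - c.mean())) / (t.std() * c.std())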

After this is performed for each time frame, the values are smoothed, and then the change between adjacent chords is observed. The reasoning behind this step is that, by measuring the relative distance between chords rather than the chords themselves, all songs can be compared in the same manner, even though they may have different key signatures. Finally, the study takes the types of chord changes and classifies them under 8 possible categories called "H-topics". These topics are more abstracted versions of the chord changes that make more sense to a human, such as "changes involving dominant 7th chords".

In my preliminary implementation of this method on an electronic dance music corpus, I made a few modifications to Mauch's study. First, I smoothed out time frames before computing the most probable chords, rather than smoothing the most probable chords. I did this to save time and to reduce volatility in the chord measurements. Using Rick Astley's "Never Gonna Give You Up" as a reference, which contains 935 time frames and lasts 212 seconds, 5 time frames is slightly under 1 second and, for preliminary testing, appeared to be a good interval for each time block. Second, as mentioned in the literature section, I did not abstract the chord changes into H-topics. This decision also stemmed from time constraints, since deriving semantic chord meaning from EDM songs would require careful research into the types of harmonies and sounds common in that genre of music. Below, I included a high-level visualization of the pitch metadata found in a sample song, "Firestarter" by The Prodigy, and how I converted the metadata into a chord change vector that I could then feed into the Dirichlet Process algorithm.

Start with raw pitch data, an N×12 vector, where N is the number of time frames in the song and 12 the number of pitch classes. Shown here are the first 5 time frames of "Firestarter" by The Prodigy:

Note    TF1    TF2    TF3    TF4    TF5
C       1.0    0.525  0.819  0.303  0.388
C#/Db   0.61   1.0    1.0    0.648  1.0
D       0.319  0.599  0.493  0.202  0.185
D#/Eb   0.221  0.229  0.24   0.322  0.241
E       0.289  0.298  0.268  0.452  0.329
F       0.404  0.298  0.297  0.613  0.439
F#/Gb   0.465  0.398  0.733  1.0    0.589
G       0.254  0.363  0.588  0.632  0.337
G#/Ab   0.123  0.343  0.671  0.555  0.28
A       0.316  0.308  0.431  0.659  0.22
A#/Bb   0.52   0.26   0.603  0.855  0.654
B       0.951  0.263  0.286  0.275  0.225

Average the distribution of pitches over every 5 time frames:

(0.607, 0.852, 0.360, 0.251, 0.327, 0.410, 0.637, 0.435, 0.394, 0.387, 0.578, 0.400)

Calculate the most likely chord for the block using Spearman's rho; here, F# major (010000100010). Calculate the most likely chord over every other block of 5 time frames in the same way.

For two adjacent chords, calculate the change between them and increment the count in a table of chord change frequencies (192 possible chord changes). Here, F# major to G# major is a major-to-major change with step size 2, giving chord shift code 6: chord_changes[6] += 1.

The final 192-element vector, where chord_changes[i] is the number of times the chord change with code i existed in the song:

chord_changes = [14, 0, 3, 0, 1, 0, 1, 0, 0, 0, 1, 0, 11, 0, 2, 0, 0, 0, 2, 0, 2, 1, 1, 0, 1, 0, 0, 0, 3, 1, 0, 1, 2, 0, 1, 3, 1, 1, 0, 0, 1, 0, 0, 2, 0, 0, 0, 0, 12, 1, 4, 1, 2, 0, 0, 0, 0, 1, 1, 1, 14, 0, 6, 0, 2, 0, 0, 1, 0, 0, 6, 0, 0, 2, 0, 0, 0, 0, 3, 2, 0, 1, 2, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 1, 2, 0, 1, 0, 1, 0, 1, 0, 2, 1, 1, 1, 1, 0, 1, 0, 2, 1, 1, 0, 2, 1, 1, 0, 0, 0, 1, 0, 1, 0, 5, 3, 0, 0, 2, 0, 0, 0, 1, 0, 1, 0, 1, 4, 0, 0, 0, 0, 0, 2, 0, 0, 0, 0, 0, 2, 0, 2, 0, 1, 0, 0, 1, 0, 1, 1, 0, 0, 2, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0]

A final step I took to normalize the chord change data was to divide the numbers by the length of the song, so that each song's number of chord changes was measured per second.
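The end-to-end pitch feature can then be sketched as follows; change_code stands in for the mapping from a pair of chord labels to one of the 192 chord-change codes (its exact layout is specific to my implementation and is not reproduced here):

from collections import Counter

def chord_change_vector(chords, duration, change_code):
    '''chords: chronological list of (type, root) chord labels, one per
    smoothed block; returns the per-second 192-element frequency vector'''
    counts = Counter(change_code(a, b) for a, b in zip(chords, chords[1:]))
    return [counts[i] / float(duration) for i in range(192)]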

2.3.3 Timbre Preprocessing

For timbre, I also used Mauch's model to find a meaningful way to compare timbre uniformly across all songs [8]. After collecting all song metadata, I took a random sample of 20 songs from each year, starting at 1970. The reason I forced the sampling to 20 randomly sampled songs from each year, and did not take a random sample of songs from all years at once, was to prevent bias towards any type of sounds. As seen in figure 2.2, there are significantly more songs from 2000-2011 than before 2000. The mean year is x̄ = 2001.052, the median year is 2003, and the standard deviation of the years is σ = 7.060. A "random sample" over all songs would almost definitely include a disproportionate amount of more recent songs. In order to not miss out on sounds that may be more prevalent in older songs, I required a set number of songs from each year. Next, from each randomly selected song, I selected 20 random timbre frames, in order to prevent any biases in data collection within each song. In total, there were 42 × 20 × 20 = 16,800 timbre frames collected. Next, I clustered the timbre frames using a Gaussian Mixture Model (GMM), varying the number of clusters from 10 to 100 and selecting the number of clusters with the lowest Bayes Information Criterion (BIC), a statistical measure commonly used to calculate the best fitting model. The BIC was minimized at 46 timbre clusters. I then re-ran the GMM with 46 clusters and saved the mean values of each of the 12 timbre segments for each cluster formed.
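A minimal sketch of this model-selection loop, using the scikit-learn 0.17-era GMM API (the function and variable names are illustrative, not taken from the thesis code):

import numpy as np
from sklearn import mixture

def build_timbre_lexicon(timbre_frames):
    '''timbre_frames: the (16800, 12) array of sampled timbre frames;
    fits GMMs with 10 to 100 clusters and keeps the lowest-BIC model'''
    best_bic, best_gmm = np.inf, None
    for k in range(10, 101):
        gmm = mixture.GMM(n_components=k)
        gmm.fit(timbre_frames)
        bic = gmm.bic(timbre_frames)
        if bic < best_bic:
            best_bic, best_gmm = bic, gmm
    return best_gmm.means_  # here, 46 rows of 12 mean timbre values each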

In the same way that every song had the same 192 chord changes whose frequencies could be compared between songs, each song now had the same 46 timbre clusters but different frequencies in each song. When reading in the metadata from each song, I calculated the most likely timbre cluster each timbre frame belonged to and kept a frequency count of all of the possible timbre clusters observed in a song. Finally, as with the pitch data, I divided all observed counts by the duration of the song in order to normalize each song's timbre counts.

Figure 2.2: Number of Electronic Music Songs in Million Song Dataset from Each Year


Chapter 3

Results

3.1 Methodology

After the pitch and timbre data was processed, I ran the Dirichlet Process on the data. For each song, I concatenated the 192-element chord change frequency list and the 46-element timbre category frequency list, giving each song a total of 238 features. However, there is a problem with this setup: the pitch data will inherently dominate the clustering process, since it contains almost 3 times as many features as timbre. While there is no built-in function in scikit-learn's DPGMM process to give different weights to each feature, I considered another possibility to remedy this discrepancy: duplicating the timbre vector a certain number of times and concatenating that to the feature set of each song. While this strategy runs the risk of corrupting the feature set and turning it into something that does not accurately represent each song, it is important to keep in mind that, even without duplicating the timbre vector, the feature set consists of two separate feature sets concatenated to each other. Therefore, timbre duplication appears to be a reasonable strategy to weigh pitch and timbre more evenly.
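As a sketch, the duplication simply concatenates extra copies of the timbre block; the duplication factor below is an illustrative assumption (not the exact count used in this thesis), chosen so the timbre block roughly balances the 192 pitch features:

def weighted_features(chord_changes, timbre_counts, dup=4):
    # chord_changes: 192 per-second frequencies; timbre_counts: 46
    # dup=4 is an illustrative choice (46 * 4 = 184, close to 192)
    return list(chord_changes) + list(timbre_counts) * dup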

After this modification, I tweaked a few more parameters before obtaining my final results. Dividing the pitch and timbre frequencies by the duration of the song normalized every song to frequency per second, but it also had the undesired effect of making the data too small. Timbre and pitch frequencies per second were almost always less than 10, and many times hovered as low as 0.002 for nonzero values. Because all of the values were very close to each other, using common values of α in the range of 0.1 to 1000-2000 was insufficient to push the songs into different clusters. As a result, every song fell into the same cluster. Increasing the value of α by several orders of magnitude, to well over 10 million, fixed the problem, but this solution presented two problems. First, tuning α to experiment with different ways to cluster the music would be problematic, since I would have to work with an enormous range of possible values for α. Second, pushing α to such high values is not appropriate for the Dirichlet Process. Extremely high values of α indicate a Dirichlet Process that will try to disperse the data into different clusters, but a value of α that high is in principle always assigning each new song to a new cluster. On the other hand, varying α between 0.1 and 1000, for example, presents a much wider range of flexibility when assigning clusters. While clustering may be possible by varying the values of α an extreme amount, with the data as it currently is, we would be using the Dirichlet Process in a way it should mathematically not be used. Therefore, multiplying all of the data by a constant value, so that we can work in the appropriate range of α, is the ideal approach. After some experimentation, I found that k=10 was an appropriate scaling factor.

After initial runs of the Dirichlet Process, I found that there was a slight issue with some of the earlier songs. Since I had only artist genre tags, not specific song tags, for each song, I chose songs based on whether any of the tags associated with the artist fell under any electronic music genre, including the generic term 'electronic'. There were some bands, mostly older ones from the 1960s and 1970s, like Electric Light Orchestra, which had some electronic music but mostly featured rock, funk, disco, or another genre. Given that these artists featured mostly non-electronic songs, I decided to exclude them from my study and generate a blacklist indicating these music artists. While it was infeasible to look through every single song and determine whether it was electronic or not, I was able to look over the earliest songs in each cluster. These songs were the most important to verify as electronic, because early non-electronic songs could end up forming new clusters and inadvertently create clusters with non-electronic sounds that I was not looking for.

The goal of this thesis is to identify different groups in which EM songs are clustered and identify the most unique artists and genres. While the second task is very simple, because it requires looking at the earliest songs in each cluster, the first is difficult to gauge the effectiveness of. While I can look at the average chord change and timbre category frequencies in each category, as well as other metadata, putting more semantic interpretations to what the music actually sounds like, and determining whether the music is clustered properly, is a very subjective process. For this reason, I ran the Dirichlet Process on the feature set with values of α = (0.05, 0.1, 0.2) and compared the clustering in each case, examining similarities and differences in the clusters formed in each scenario in the Discussion section. For each value of α, I set the upper limit of components, or clusters allowed, to 50. The values of α I used resulted in 9, 14, and 19 clusters formed.

3.2 Findings

3.2.1 α = 0.05

When I set α to 0.05, the Dirichlet Process split the songs into 9 clusters. Below are the distributions of years of the songs in each cluster (note that the Dirichlet Process does not number the clusters exactly sequentially, so cluster numbers 5, 7, and 10 are skipped).


Figure 3.1: Song year distributions for α = 0.05

For each value of α, I also calculated the average frequency of each chord change category and timbre category for each cluster and plotted the results. The green lines correspond to timbre, and the blue lines to pitch.


Figure 3.2: Timbre and pitch distributions for α = 0.05

A table of each cluster formed, the number of songs in that cluster, and descriptions of pitch, timbre, and rhythmic qualities characteristic of songs in that cluster is shown below.


Cluster   Song Count   Characteristic Sounds
0         6481         Minimalist, industrial, space sounds, dissonant chords
1         5482         Soft, New Age, ethereal
2         2405         Defined sounds, electronic and non-electronic instruments played in standard rock rhythms
3         360          Very dense and complex synths, slightly darker tone
4         4550         Heavily distorted rock and synthesizer
6         2854         Faster-paced 80s synth rock, acid house
8         798          Aggressive beats, dense house music
9         1464         Ambient house, trancelike, strong beats, mysterious tone
11        1597         Melancholy tones; New Wave rock in 80s, then starting in 90s downtempo, trip-hop, nu-metal

Table 3.1: Song cluster descriptions for α = 0.05

3.2.2 α = 0.1

A total of 14 clusters were formed (16 were formed, but 2 clusters contained only one song each; I listened to both of these songs, and they did not sound unique, so I discarded them from the clusters). Again, the song year distributions, timbre and pitch distributions, and cluster descriptions are shown below.


Figure 3.3: Song year distributions for α = 0.1


Figure 3.4: Timbre and pitch distributions for α = 0.1


Cluster   Song Count   Characteristic Sounds
0         1339         Instrumental and disco with 80s synth
1         2109         Simultaneous quarter-note and sixteenth-note rhythms
2         4048         Upbeat, chill, simultaneous quarter-note and eighth-note rhythms
3         1353         Strong repetitive beats, ambient
4         2446         Strong simultaneous beat and synths; synths defined but echo
5         2672         Calm, New Age
6         542          Hi-hat cymbals, dissonant chord progressions
7         2725         Aggressive punk and alternative rock
9         1647         Latin, rhythmic emphasis on first and third beats
11        835          Standard medium-fast rock instruments/chords
16        1152         Orchestral, especially violins
18        40           "Martian alien" sounds, no vocals
20        1590         Alternating strong kick and strong high-pitched clap
28        528          Roland TR-like beats; kick and clap stand out but fuzzy

Table 3.2: Song cluster descriptions for α = 0.1

3.2.3 α = 0.2

With α set to 0.2, there were a total of 22 clusters formed. 3 of the clusters consisted of 1 song each, none of which were particularly unique-sounding, so I discarded them, for a total of 19 significant clusters. Again, the song year distributions, timbre and pitch distributions, and cluster descriptions are shown below.


Figure 3.5: Song year distributions for α = 0.2


Figure 3.6: Timbre and pitch distributions for α = 0.2


Cluster   Song Count   Characteristic Sounds
0         4075         Nostalgic and sad-sounding synths and string instruments
1         2068         Intense, sad, cavernous (mix of industrial metal and ambient)
2         1546         Jazz/funk tones
3         1691         Orchestral with heavy 80s synths, atmospheric
4         343          Arpeggios
5         304          Electro, ambient
6         2405         Alien synths, eerie
7         1264         Punchy kicks and claps, 80s/90s tilt
8         1561         Medium tempo, 4/4 time signature, synths with intense guitar
9         1796         Disco rhythms and instruments
10        2158         Standard rock with few (if any) synths added on
12        791          Cavernous, minimalist, ambient (non-electronic instruments)
14        765          Downtempo, classic guitar riffs, fewer synths
16        865          Classic acid house sounds and beats
17        682          Heavy Roland TR sounds
22        14           Fast, ambient, classic orchestral
23        578          Acid house with funk tones
30        31           Very repetitive rhythms, one or two tones
34        88           Very dense sound (strong vocals and synths)

Table 3.3: Song cluster descriptions for α = 0.2


3.3 Analysis

Each of the different values of α revealed different insights about the Dirichlet Process applied to the Million Song Dataset, mainly which artists and songs were unique and how traditional groupings of EM genres could be thought of differently. Not surprisingly, the distributions of the years of songs in most of the clusters were skewed to the left, because the distribution of all of the EM songs in the Million Song Dataset is left-skewed (see Figure 2.2). However, some of the distributions vary significantly for individual clusters, and these differences provide important insights into which types of music styles were popular at certain points in time and how unique the earliest artists and songs in the clusters were. For example, for α = 0.1, Cluster 28's musical style (with sounds characteristic of the Roland TR-808 and TR-909, two programmable drum machines and synthesizers that became extremely popular in 1980s and 90s dance tracks [13]) coincides with when the instruments were first manufactured, in 1980. Not surprisingly, this cluster contained mostly songs from the 80s and 90s and declined slightly in the 2000s. However, there were a few songs in that cluster that came out before 1980. While these songs did not clearly use the Roland TR machines, they may have contained similar sounds that predated the machines and were truly novel.

First, looking at α = 0.05, we see that all of the clusters contain a significant number of songs, although clusters 3 and 8 are notably smaller. Cluster 3 contains a heavier left tail, indicating a larger number of songs from the 70s, 80s, and 90s. Inside the cluster, the genres of music varied significantly from a traditional music lens. That is, the cluster contained some songs with nearly all traditional rock instruments, others with purely synths, and others somewhere in between, all of which would normally be classified as different EM genres. However, under the Dirichlet Process, these songs were lumped together with the common theme of dense, melodic sounds (as opposed to minimalistic, repetitive, or dissonant sounds). The most prominent artists from the earlier songs are Ashra and Jon Hassell, who composed several melodic songs combining traditional instruments with synthesizers for a modern feel. The other small cluster, number 8, contains a more normal year distribution relative to the entire MSD distribution and also consists of denser beats. Another artist, Cabaret Voltaire, leads this cluster. Cluster 9 sticks out significantly because it contains virtually no songs before 1990, but increases rapidly in popularity. This cluster contains songs with hypnotically repetitive rhythm, strong and ethereal synths, and an equally strong drum-like beat. Given the emergence of trance in the 1990s, and the fact that house music in the 1980s contained more minimalistic synths than house music in the 1990s, this distribution of years makes sense. Looking at the earliest artists in this cluster, one that accurately predates the later music in the cluster is Jean-Michel Jarre, a French composer pioneering in ambient and electronic music [14]. One of his songs, Les Chants Magnétiques IV, contains very sharp and modulated synths, along with a repetitive hi-hat cymbal rhythm and more drawn-out and ethereal synths. While the song sounds ambient at its normal speed, playing the song at 1.5 times the normal speed resulted in a thumping, fast-paced 16th note rhythm that, combined with the ethereal synths and their chord progressions, sounded very similar to trance music. In fact, I found that, stylistically, trance music was comparable to house and ambient music increased in speed. Trance music was a term not used extensively until the early 1990s, but ambient and house music were already mainstream by the 1980s, so it would make sense that trance evolved in this manner. However, this insight could serve as an argument that trance is not an innovative genre in and of itself, but is rather a clever combination of two older genres. Lastly, we look at the timbre category and chord change distributions for each cluster. In theory, these clusters should have significantly different peaks of chord changes and timbre categories, reflecting different pitch arrangements and instruments in each cluster. The type 0 chord change corresponds to major → major with no note change, type 60 to minor → minor with no note change, type 120 to dominant 7th major → dominant 7th major with no note change, and type 180 to dominant 7th minor → dominant 7th minor with no note change. It makes sense that type 0, 60, 120, and 180 chord changes are frequently observed, because it implies that adjacent chords in the song are remaining in the same key for the majority of the song. The timbre categories, on the other hand, are more difficult to intuitively interpret. Mauch's study addresses this issue by sampling songs and sounds that are the closest to each timbre category, then playing the sounds and attaching user-based interpretations based on several listeners [8]. While this strategy worked in Mauch's study, given the time and resources at my disposal, it was not practical in my study. I ended up comparing my subjective summaries of each cluster and the charts to see whether certain peaks in the timbre categories corresponded to specific tones. Strangely, for α = 0.05, the timbre and chord change data is very similar for each cluster. This problem does not occur for α = 0.1 or 0.2, where the graphs vary significantly and correspond to some of the observed differences in the music. In summary, below are the most influential artists I found in the clusters formed and the types of music they created that were novel for their time:

• Jean-Michel Jarre: ambient and house music, complicated synthesizer arrangements
• Cabaret Voltaire: orchestral electronic music
• Paul Horn: new age
• Brian Eno: ambient music
• Manuel Göttsching (Ashra): synth-heavy ambient music
• Killing Joke: industrial metal
• John Foxx: minimalist and dark electronic music
• Fad Gadget: house and industrial music

While these conclusions were formed mainly from the data used in the MSD and the resulting clusters, I checked outside sources and biographies of these artists to see whether they were groundbreaking contributors to electronic music. Some research revealed that these artists were indeed groundbreaking for their time, so my findings are consistent with existing literature. The difference, however, between existing accounts and mine is that, from a quantitatively computed perspective, I found some new connections (like observing that one of Jarre's works, when sped up, sounded very similar to trance music).

For larger values of α, it is worth not only looking at interesting phenomena in the clusters formed for that specific value, but also comparing those clusters to the ones formed for other values of α. Since we are increasing the value of α, more clusters will be formed and the distinctions between each cluster will be more nuanced. With α = 0.1 the Dirichlet Process formed 16 clusters. Two of these clusters consisted of only one song each, and upon listening, neither of these songs sounded particularly unique, so I threw those two clusters out and analyzed the remaining 14. Comparing these clusters to the ones formed with α = 0.05, I found that some of the clusters mapped over nicely, while others were more difficult to interpret. For example, cluster 3-0.1 (cluster 3 when α = 0.1) contained a similar number of songs, and a similar distribution of release years, to cluster 9-0.05. Both contain virtually no songs before the 1990s and then steadily rise in popularity through the 2000s. Both clusters also contain similar types of music: house beats and ethereal synths reminiscent of ambient or trance music. However, when I looked at the earliest artists in cluster 3-0.1, they were different from the earliest artists in cluster 9-0.05.

One particular artist, Bill Nelson, stood out for having a particularly novel song, "Birds of Tin," for the year it was released (1980). This song features a sharp and twangy synth beat that, when sped up, sounded like minimalist acid house music.

While the α = 0.05 grouping differentiated mostly on general moods and classes of instruments (like rock vs. non-electronic vs. electronic), the α = 0.1 grouping picked up more nuanced instrumentation and mood differences. For example, cluster 16-0.1 contained songs that featured orchestral string instruments, especially violin. The songs themselves varied significantly according to traditional genres, from Brian Eno arrangements with classical orchestra to a remix of a song by Linkin Park, a nu-metal band, which contained violin interludes. This clustering raises an interesting point: music that sounds very different in terms of traditional genres could be grouped together based on certain shared instruments or sounds. Another cluster, 28-0.1, features 90s sounds characteristic of the Roland TR-909 drum machine (which would explain why the cluster's song count increases drastically starting in the 1990s and steadily declines through the 2000s). Yet another cluster, 6-0.1, contains a particularly heavy left tail, indicating a style more popular in the 1980s, and its characteristic sound, hi-hat cymbals, is also a specialized instrument. This specialization does not match up particularly strongly with the clusters formed when α = 0.05. That is, a single cluster with α = 0.05 does not easily map to one or more clusters in the α = 0.1 run, although many of the clusters appear to share characteristics based on the qualitative descriptions in the tables.

The timbre/chord change charts for each cluster appeared to at least somewhat corroborate the general characteristics I attached to each cluster. For example, the last timbre category is significantly pronounced for clusters 5 and 18, and especially so for 18. Cluster 18 was vocal-free, ethereal, space-synth sounds, so it would make sense that cluster 5, which was mainly calm New Age, also contained vocal-free, ethereal, space-y sounds. It was also interesting to note that certain clusters, like 28, contained one timbre category that completely dominated all the others. Many songs in this cluster were marked by strong and repetitive beats reminiscent of the Roland TR synth drum machine, which matches the graph. Likewise, clusters 3, 7, 9, and 20, which appear to contain the same peak timbre category, were noted for containing strong and repetitive beats. For this value of α, I added the following artists and their contributions to the general list of novel artists:

• Bill Nelson: minimalist house music

• Vangelis: orchestral compositions with electronic notes

• Rick Wakeman: rock compositions with spacy-sounding synths

• Kraftwerk: synth-based pop music

Finally, we look at α = 0.2. With this parameter value the Dirichlet Process resulted in 22 clusters. Three of these clusters contained only one song each, and upon listening to each of these songs I determined they were not particularly unique and discarded them, for a total of 19 remaining clusters. Unlike the previous two values of α, where the clusters were relatively easy to subjectively differentiate, this one was quite difficult. Slightly more than half of the clusters, 10 out of 19, contained under 1,000 songs, and 3 contained under 100. Some of the clusters were easily mapped to clusters from the other two α values, like cluster 17-0.2, which contains Roland TR drum machine sounds and is comparable to cluster 28-0.1. However, many of the other classifications seemed more dubious. Not only did the songs within each cluster often vary significantly, but the differences between many clusters appeared nearly indistinguishable. The chord change and timbre charts also reflect the difficulty in distinguishing different clusters. The y-axis scales on all of the charts are quite small, implying that many of the timbre values averaged out because the songs in each cluster were quite different. Essentially, the observations are quite noisy and do not have features that stand out as saliently as those of cluster 28-0.1, for example. The only exceptions to these numbers were clusters 30 and 34, but there were so few songs in each of these clusters that they represent only a small fraction of the dataset. Therefore, I concluded that the Dirichlet Process with α = 0.2 did an inadequate job of clustering the songs. Overall, the clusters formed when α = 0.1 were the most meaningful in terms of picking up nuanced moods and instruments without splitting hairs and producing clusters a minimally trained ear could not differentiate. From this analysis, the most appropriate genre classifications of the electronic music from the MSD are the clusters described in the table where α = 0.1, and the most novel artists, along with their contributions, are summarized in the findings where α = 0.05 and α = 0.1.


Chapter 4

Conclusion

In this chapter I first address weaknesses in my experiment and strategies to address those weaknesses; then I offer potential paths for researchers to build upon my experiment and close with final remarks on this thesis.

4.1 Design Flaws in Experiment

While I made every effort possible to ensure the integrity of this experiment, there were various complicating factors, some beyond my control and others within my control but unrealistic to address given the time and resources I had. The largest issue was the dataset I was working with. While the MSD contained roughly 23,000 electronic music songs according to my classifications, these songs did not come close to covering all of the electronic music available. From looking through the tracks I did see many important artists, meaning that there was some credibility to the dataset. However, there were several other artists I was surprised to see missing, and the artists included contained only a limited number of popular songs. Some traditionally defined genres, like dubstep, were missing entirely from the dataset, and the most recent songs came from the year 2010, which meant that the past 5 years of rapid expansion in EM were not accounted for. Building a sufficient corpus of EM data is very difficult, arguably more so than for other genres, because songs may be remixed by multiple artists, further blurring the line between original content and modifications. For this reason I considered my thesis to be a proof of concept. Although the data I used may not be ideal, I was able to show that the Dirichlet Process could be used with some amount of success to cluster songs based on their metadata.

With respect to how I implemented the Dirichlet Process and constructed the features, my methodology could have been more extensive with additional time and resources. Interpreting the sounds in each song and establishing common threads is a difficult task, and unlike Pandora, which used trained music theory experts to analyze each song, I relied on my own ears and anecdotal knowledge of EM. Given the lack of formal literature quantitatively analyzing EM and the resources I had, this was my best realistic option, but it was also not ideal. The second notable weakness, which was more controllable, was determining what exactly constitutes an EM song. My criteria involved iterating through every song and selecting those whose artist contained a tag that fell inside a list of predetermined EM genres. However, this strategy is not always effective, since some artists have only a small selection of EM songs and have produced much more music involving rock or other non-EM genres. To prevent these songs from appearing in the dataset, I would need to load another dataset from a group called Lastfm, which contains user-generated tags at the song level. Another, more addressable, weakness in my experiment was graphically analyzing the timbre categories. While the average chord changes were easy to interpret on the graphs for each cluster and had easy semantic interpretations, the timbre categories were never formally defined. That is, while I knew the Bayes Information Criterion was lowest when there were 46 categories, I did not associate each timbre category with a sound. Mauch's study addressed this issue by randomly selecting songs with sounds that fell in each timbre category and asking users to listen to the sounds and classify what they heard. Implementing this system would be an additional way of ensuring that the clusters formed were nontrivial: I could not only eyeball the timbre measurements on each graph, like I did in this thesis, but also use them to confirm the sounds I observed for each cluster. Finally, while my feature selection involved careful preprocessing, based on other studies, that normalized measurements between all songs, there are additional ways I could have improved the feature set. For example, one study looks at more advanced ways to isolate specific timbre segments in a song, identify repeating patterns, and compare songs to each other in terms of the similarity of their timbres [15]. More advanced methods like these would allow me to more quantitatively analyze how successfully the Dirichlet Process clusters songs into distinct categories.

4.2 Future Work

Future work in this area, quantitatively analyzing EM metadata to determine what constitutes different genres and novel artists, would involve tighter definitions and procedures, evaluations of whether the clustering was effective, and closer musical scrutiny. All of the weaknesses mentioned in the previous section, barring perhaps the songs available in the Million Song Dataset, can be addressed with extensions and modifications to the code base I created. The greater issue of building an effective corpus of music data for the MSD, and constantly updating it, might be addressed by soliciting such data from an organization like Spotify, but such an endeavor is very ambitious and beyond the scope of any individual or small-group research project without extensive funding and influence. Once these problems are resolved, and the songs accessed from the dataset and the methods for comparing songs to each other are settled, the next steps would be to further analyze the results. How do the most unique artists for their time compare to the most popular artists? Is there considerable overlap? How long does it take for a style to grow in popularity, if it even does? And lastly, how can these findings be used to compose new genres of music and envision who and what will become popular in the future? All of these questions may require supplementary information sources, with respect to the popularity of songs and artists for example, and many of these additional pieces of information can be found on the website of the MSD.

4.3 Closing Remarks

While this thesis is an ambitious endeavor and can be improved in many respects, it does show that the methods implemented yield nontrivial results and could serve as a foundation for future quantitative analysis of electronic music. As data analytics grows even further, and groups such as Spotify amass greater amounts of information and deeper insights on that information, this relatively new field of study will hopefully grow. EM is a dynamic, energizing, and incredibly expressive type of music, and understanding it from a quantitative perspective pays respect to what has, up until now, been mostly analyzed from a curious outsider's perspective: qualitatively described, but not examined as thoroughly from a mathematical angle.


Appendix A

Code

A.1 Pulling Data from the Million Song Dataset

from __future__ import division
import os
import sys
import re
import time
import glob
from collections import OrderedDict
import numpy as np
import hdf5_getters  # not on adroit

# prevents output from showing ellipses when printed
np.set_printoptions(threshold=np.nan)

'''This code pulls the relevant metadata for every electronic song found
in the MSD and writes it out in chronological order.'''

basedir = '/scratch/network/mssilver/mssilver/msd_data_full/data/' + str(sys.argv[1])
ext = '.h5'

target_genres = ['house', 'techno', 'drum and bass', 'drum n bass', "drum'n'bass",
    'drumnbass', "drum 'n' bass", 'jungle', 'breakbeat', 'trance', 'dubstep',
    'trap', 'downtempo', 'industrial', 'synthpop', 'idm',
    'idm - intelligent dance music', '8-bit', 'ambient',
    'dance and electronica', 'electronic']

# relevant metadata for all EM songs found in the MSD
all_song_data = {}
pitch_segs_data = []
count = 0
start_time = time.time()

for root, dirs, files in os.walk(basedir):
    files = glob.glob(os.path.join(root, '*' + ext))
    for f in files:
        h5 = hdf5_getters.open_h5_file_read(f)
        # if year unknown, throw out sample
        if hdf5_getters.get_year(h5) == 0:
            h5.close()
            continue
        if any(tag in str(hdf5_getters.get_artist_mbtags(h5)) for tag in target_genres):
            print 'found electronic music song at {0} seconds'.format(time.time() - start_time)
            count += 1
            print 'song count: {0}'.format(count)
            h5_subdict = dict()
            h5_subdict['title'] = hdf5_getters.get_title(h5).item()
            h5_subdict['artist_name'] = hdf5_getters.get_artist_name(h5).item()
            h5_subdict['year'] = hdf5_getters.get_year(h5).item()
            h5_subdict['duration'] = hdf5_getters.get_duration(h5).item()
            h5_subdict['timbre'] = hdf5_getters.get_segments_timbre(h5).tolist()
            h5_subdict['pitches'] = hdf5_getters.get_segments_pitches(h5).tolist()
            track_id = hdf5_getters.get_track_id(h5).item()
            all_song_data[track_id] = h5_subdict
            print 'Song {0} finished processing. Total time elapsed: {1} seconds'.format(count, str(time.time() - start_time))
        h5.close()

# OrderedDict preserves the chronological ordering (a plain dict would not)
all_song_data_sorted = OrderedDict(sorted(all_song_data.items(), key=lambda k: k[1]['year']))
sortedpitchdata = '/scratch/network/mssilver/mssilver/msd_data/raw_' + re.sub('/', '', sys.argv[1]) + '.txt'
with open(sortedpitchdata, 'w') as text_file:
    text_file.write(str(all_song_data_sorted))

A.2 Calculating Most Likely Chords and Timbre Categories

from __future__ import division
import os
import sys
import re
import time
import json
import glob
import hdf5_getters  # not on adroit
import sklearn.mixture
import msd_utils  # not on adroit
import math
import numpy as np
import collections
import ast

# prevents output from showing ellipses when printed
np.set_printoptions(threshold=np.nan)

# column-wise mean of a list of lists
def mean(a):
    return sum(a) / len(a)

'''This code computes the frequency of chord changes in each electronic song
and runs the dirichlet process on it.'''

basedir = '/scratch/network/mssilver/mssilver/'
input_file = basedir + 'msd_data/raw_' + str(sys.argv[1]) + '.txt'
output_file = basedir + 'msd_data/preprocessed_' + str(sys.argv[1]) + '.txt'

json_contents = open(input_file, 'r').read()

all_song_data = []
time_start = time.time()
count = 0
for json_object_match in re.finditer(r"\{'title'.*?\}", json_contents):
    json_object_str = str(json_object_match.group(0))
    json_object = ast.literal_eval(json_object_str)
    json_object_new = {}

    json_object_new['title'] = json_object['title']
    json_object_new['artist_name'] = json_object['artist_name']
    json_object_new['year'] = json_object['year']
    json_object_new['duration'] = json_object['duration']

    segments_pitches_old = json_object['pitches']
    segments_timbre_old = json_object['timbre']
    segments_pitches_old_smoothed = []
    segments_timbre_old_smoothed = []
    chord_changes = [0 for i in range(0, 192)]
    smoothing_factor = 5
    for i in range(0, int(math.floor(len(segments_pitches_old) / smoothing_factor))):
        segments = segments_pitches_old[(smoothing_factor*i):(smoothing_factor*i + smoothing_factor)]
        # calculate the mean strength of each note over a block of 5 time segments
        segments_mean = map(mean, zip(*segments))
        segments_pitches_old_smoothed.append(segments_mean)
    most_likely_chords = [msd_utils.find_most_likely_chord(seg) for seg in segments_pitches_old_smoothed]
    print 'found most likely chords at {0} seconds'.format(time.time() - time_start)
    # calculate chord changes
    for i in range(0, len(most_likely_chords) - 1):
        c1 = most_likely_chords[i]
        c2 = most_likely_chords[i+1]
        if c1[1] == c2[1]:
            note_shift = 0
        elif c1[1] < c2[1]:
            note_shift = c2[1] - c1[1]
        else:
            note_shift = 12 - c1[1] + c2[1]
        key_shift = 4*(c1[0] - 1) + c2[0]
        # convert note_shift (0 through 11) and key_shift (1 to 16)
        # to one of 192 categories for a chord shift
        chord_shift = 12*(key_shift - 1) + note_shift
        chord_changes[chord_shift] += 1
    # normalize by duration so songs of different lengths are comparable
    json_object_new['chord_changes'] = [c / json_object['duration'] for c in chord_changes]
    print 'calculated chord changes at {0} seconds'.format(time.time() - time_start)

    for i in range(0, int(math.floor(len(segments_timbre_old) / smoothing_factor))):
        segments = segments_timbre_old[(smoothing_factor*i):(smoothing_factor*i + smoothing_factor)]
        # calculate the mean of each timbre component over a block of 5 time segments
        segments_mean = map(mean, zip(*segments))
        segments_timbre_old_smoothed.append(segments_mean)
    print 'found most likely timbre categories at {0} seconds'.format(time.time() - time_start)
    timbre_cats = [msd_utils.find_most_likely_timbre_category(seg) for seg in segments_timbre_old_smoothed]
    timbre_cat_counts = [timbre_cats.count(i) for i in xrange(0, 30)]
    json_object_new['timbre_cat_counts'] = [t / json_object['duration'] for t in timbre_cat_counts]
    all_song_data.append(json_object_new)
    count += 1

print 'preprocessing finished, writing results to file at time {0}'.format(time.time() - time_start)
with open(output_file, 'w') as f:
    f.write(str(all_song_data))

print 'file merging complete at time {0}'.format(time.time() - time_start)

A.3 Code to Compute Timbre Categories

from __future__ import division
import os
import sys
import re
import time
import json
import glob
import hdf5_getters  # not on adroit
import sklearn.mixture
import msd_utils  # not on adroit
import math
import numpy as np
import collections
from string import ascii_uppercase
import ast
import matplotlib.pyplot as plt
import operator
from collections import defaultdict
import random

timbre_all = []
N = 20  # number of samples to get from each year
year_counts = {1956: 2, 1965: 4, 1968: 3, 1969: 5, 1970: 23, 1971: 25,
    1972: 26, 1973: 37, 1974: 35, 1975: 29, 1976: 28, 1977: 64, 1978: 77,
    1979: 111, 1980: 131, 1981: 171, 1982: 199, 1983: 272, 1984: 190,
    1985: 189, 1986: 200, 1987: 224, 1988: 205, 1989: 272, 1990: 358,
    1991: 348, 1992: 538, 1993: 610, 1994: 658, 1995: 764, 1996: 809,
    1997: 930, 1998: 872, 1999: 983, 2000: 1031, 2001: 1230, 2002: 1323,
    2003: 1563, 2004: 1508, 2005: 1995, 2006: 1892, 2007: 2175, 2008: 1950,
    2009: 1782, 2010: 742}

time_start = time.time()
year_count = defaultdict(int)
orig_dir = '/scratch/network/mssilver/mssilver/'
orig_dir = ''
json_pattern = re.compile(r"\{'title'.*?\}", re.DOTALL)
N = 20  # number of songs to sample from each year
k = 20  # number of frames to select from each song
for l1 in ascii_uppercase:
    for l2 in ascii_uppercase:
        edm_textfile = orig_dir + 'msd_data/raw_' + l1 + l2 + '.txt'
        json_contents = open(edm_textfile, 'r').read()
        for json_object_str in re.findall(json_pattern, json_contents):
            json_object = ast.literal_eval(json_object_str)
            year = int(json_object['year'])
            # sample each song with probability N / (songs in its year), capped at 1
            prob = 1.0 if 1.0*N/year_counts[year] > 1.0 else 1.0*N/year_counts[year]
            if random.random() < prob:
                print 'getting timbre frames for song in directory {0}, {1} seconds after start of program'.format(edm_textfile, time.time() - time_start)
                duration = float(json_object['duration'])
                timbre = [[t/duration for t in l] for l in json_object['timbre']]
                try:
                    indices = random.sample(xrange(0, len(timbre)), k)
                except:
                    indices = xrange(0, len(timbre))
                timbre_frames = [timbre[i] for i in indices]
                appended_timbre = [timbre_all.append(l) for l in timbre_frames]
        print 'finished file {0}, {1} seconds after start of program'.format(edm_textfile, time.time() - time_start)

with open('timbre_frames_all.txt', 'w') as f:
    f.write(str(timbre_all))

A.4 Helper Methods for Calculations

import os
import re
import json
import glob
import hdf5_getters
import time
import numpy as np

''' some static data used in conjunction with the helper methods '''

# each 12-element vector corresponds to the 12 pitches, starting with C
# natural and going up to B natural

CHORD_TEMPLATE_MAJOR = [
    [1,0,0,0,1,0,0,1,0,0,0,0], [0,1,0,0,0,1,0,0,1,0,0,0],
    [0,0,1,0,0,0,1,0,0,1,0,0], [0,0,0,1,0,0,0,1,0,0,1,0],
    [0,0,0,0,1,0,0,0,1,0,0,1], [1,0,0,0,0,1,0,0,0,1,0,0],
    [0,1,0,0,0,0,1,0,0,0,1,0], [0,0,1,0,0,0,0,1,0,0,0,1],
    [1,0,0,1,0,0,0,0,1,0,0,0], [0,1,0,0,1,0,0,0,0,1,0,0],
    [0,0,1,0,0,1,0,0,0,0,1,0], [0,0,0,1,0,0,1,0,0,0,0,1]]
CHORD_TEMPLATE_MINOR = [
    [1,0,0,1,0,0,0,1,0,0,0,0], [0,1,0,0,1,0,0,0,1,0,0,0],
    [0,0,1,0,0,1,0,0,0,1,0,0], [0,0,0,1,0,0,1,0,0,0,1,0],
    [0,0,0,0,1,0,0,1,0,0,0,1], [1,0,0,0,0,1,0,0,1,0,0,0],
    [0,1,0,0,0,0,1,0,0,1,0,0], [0,0,1,0,0,0,0,1,0,0,1,0],
    [0,0,0,1,0,0,0,0,1,0,0,1], [1,0,0,0,1,0,0,0,0,1,0,0],
    [0,1,0,0,0,1,0,0,0,0,1,0], [0,0,1,0,0,0,1,0,0,0,0,1]]
CHORD_TEMPLATE_DOM7 = [
    [1,0,0,0,1,0,0,1,0,0,1,0], [0,1,0,0,0,1,0,0,1,0,0,1],
    [1,0,1,0,0,0,1,0,0,1,0,0], [0,1,0,1,0,0,0,1,0,0,1,0],
    [0,0,1,0,1,0,0,0,1,0,0,1], [1,0,0,1,0,1,0,0,0,1,0,0],
    [0,1,0,0,1,0,1,0,0,0,1,0], [0,0,1,0,0,1,0,1,0,0,0,1],
    [1,0,0,1,0,0,1,0,1,0,0,0], [0,1,0,0,1,0,0,1,0,1,0,0],
    [0,0,1,0,0,1,0,0,1,0,1,0], [0,0,0,1,0,0,1,0,0,1,0,1]]
CHORD_TEMPLATE_MIN7 = [
    [1,0,0,1,0,0,0,1,0,0,1,0], [0,1,0,0,1,0,0,0,1,0,0,1],
    [1,0,1,0,0,1,0,0,0,1,0,0], [0,1,0,1,0,0,1,0,0,0,1,0],
    [0,0,1,0,1,0,0,1,0,0,0,1], [1,0,0,1,0,1,0,0,1,0,0,0],
    [0,1,0,0,1,0,1,0,0,1,0,0], [0,0,1,0,0,1,0,1,0,0,1,0],
    [0,0,0,1,0,0,1,0,1,0,0,1], [1,0,0,0,1,0,0,1,0,1,0,0],
    [0,1,0,0,0,1,0,0,1,0,1,0], [0,0,1,0,0,0,1,0,0,1,0,1]]

CHORD_TEMPLATE_MAJOR_means = [np.mean(chord) for chord in CHORD_TEMPLATE_MAJOR]
CHORD_TEMPLATE_MINOR_means = [np.mean(chord) for chord in CHORD_TEMPLATE_MINOR]
CHORD_TEMPLATE_DOM7_means = [np.mean(chord) for chord in CHORD_TEMPLATE_DOM7]
CHORD_TEMPLATE_MIN7_means = [np.mean(chord) for chord in CHORD_TEMPLATE_MIN7]

CHORD_TEMPLATE_MAJOR_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_MAJOR]
CHORD_TEMPLATE_MINOR_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_MINOR]
CHORD_TEMPLATE_DOM7_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_DOM7]
CHORD_TEMPLATE_MIN7_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_MIN7]

# the 46 timbre category centroids found by the BIC-selected GMM
TIMBRE_CLUSTERS = [
    [1.38679881e-01, 3.95702571e-02, 2.65410235e-02, 7.38301998e-03, -1.75014636e-02, -5.51147732e-02,
     8.71851698e-03, -1.17595855e-02, 1.07227900e-02, 8.75951680e-03, 5.40391877e-03, 6.17638908e-03],
    [3.14344510e+00, 1.17405599e-01, 4.08053561e+00, -1.77934450e+00, 2.93367968e+00, -1.35597928e+00,
     -1.55129489e+00, 7.75743158e-01, 6.42796685e-01, 1.40794256e-01, 3.37716831e-01, -3.27103815e-01],
    [3.56548165e-01, 2.73288705e+00, 1.94355982e+00, 1.06892477e+00, 9.89739475e-01, -8.97330631e-02,
     8.73234495e-01, -2.00747009e-03, 3.44488367e-01, 9.93117800e-02, -2.43471766e-01, -1.90521726e-01],
    [4.22442037e-01, 4.14115783e-01, 1.43926557e-01, -1.16143322e-01, -5.95186216e-02, -2.36927188e-01,
     -6.83151409e-02, 9.86816882e-02, 2.43219098e-02, 6.93558977e-02, 6.80121418e-03, 3.97485360e-02],
    [1.94727799e-01, -1.39027782e+00, -2.39875671e-01, -2.84583677e-01, 1.92334219e-01, -2.83421048e-01,
     2.15787541e-01, 1.14840341e-01, -2.15631833e-01, -4.09496877e-02, -6.90838017e-03, -7.24394810e-03],
    [1.96565167e-01, 4.98702717e-02, -3.43697282e-01, 2.54170701e-01, 1.12441266e-02, 1.54740401e-01,
     -4.70447408e-02, 8.10868802e-02, 3.03736697e-03, 1.43974944e-03, -2.75044913e-02, 1.48634678e-02],
    [2.21364497e-01, -2.96205105e-01, 1.57754028e-01, -5.57641279e-02, -9.25625566e-02, -6.15316168e-02,
     -1.38139882e-01, -5.54936599e-02, 1.66886836e-01, 6.46238260e-02, 1.24093863e-02, -2.09274345e-02],
    [2.12823455e-01, -9.32652720e-02, -4.39611467e-01, -2.02814479e-01, 4.98638770e-02, -1.26572488e-01,
     -1.11181799e-01, 3.25075635e-02, 2.01416694e-02, -5.69216463e-02, 2.61922912e-02, 8.30817468e-02],
    [1.62304042e-01, -7.34813956e-03, -2.02552550e-01, 1.80106705e-01, -5.72110826e-02, -9.17148244e-02,
     -6.20429191e-03, -6.08892354e-02, 1.02883628e-02, 3.84878478e-02, -8.72920419e-03, 2.37291230e-02],
    [1.69023095e-01, 6.81311168e-02, -3.71039856e-02, -2.13139780e-02, -4.18752028e-03, 1.36407740e-01,
     2.58515825e-02, -4.10328777e-04, 2.93149920e-02, -1.97874734e-02, 2.01177066e-02, 4.29260690e-03],
    [4.16829358e-01, -1.28384095e+00, 8.86081556e-01, 9.13717416e-02, -3.19420208e-01, -1.82003637e-01,
     -3.19865507e-02, -1.71517045e-02, 3.47472066e-02, -3.53047665e-02, 5.58354602e-02, -5.06222122e-02],
    [3.83948137e-01, 1.06020034e-01, 4.01191058e-01, 1.49470482e-01, -9.58422411e-02, -4.94473336e-02,
     2.27589858e-02, -5.67352733e-02, 3.84666644e-02, -2.15828055e-02, -1.67817151e-02, 1.15426241e-01],
    [9.07946444e-01, 3.26120397e+00, 2.98472002e+00, -1.42615404e-01, 1.29886103e+00, -4.53380431e-01,
     1.54008478e-01, -3.55297093e-02, -2.95809181e-01, 1.57037690e-01, -7.29692046e-02, 1.15180285e-01],
    [1.60870896e+00, -2.32038235e+00, -7.96211044e-01, 1.55058968e+00, -2.19377663e+00, 5.01030526e-01,
     -1.71767279e+00, -1.36642470e+00, -2.42837527e-01, -4.14275615e-01, -7.33148530e-01, -4.56676578e-01],
    [6.42870687e-01, 1.34486839e+00, 2.16026845e-01, -2.13180345e-01, 3.10866747e-01, -3.97754955e-01,
     -3.54439151e-01, -5.95938041e-04, 4.95054274e-03, 4.67013422e-02, -1.80823854e-02, 1.25808320e-01],
    [1.16780496e+00, 2.28141229e+00, -3.29418720e+00, -1.54239912e+00, 2.12372153e-01, 2.51116768e+00,
     1.84273560e+00, -4.06183916e-01, 1.19175125e+00, -9.24407446e-01, 6.85444429e-01, -6.38729005e-01],
    [2.39097414e-01, -1.13382447e-02, 3.06327342e-01, 4.68182987e-03, -1.03107607e-01, -3.17661969e-02,
     3.46533705e-02, 1.46440386e-02, 6.88291154e-02, 1.72580481e-02, -6.23970238e-03, -6.52822380e-03],
    [1.74850329e-01, -1.86077411e-01, 2.69285838e-01, 5.22452803e-02, -3.71708289e-02, -6.42874319e-02,
     -5.01920042e-03, -1.14565540e-02, -2.61300268e-03, -6.94872458e-03, 1.20157063e-02, 2.01341977e-02],
    [1.93220674e-01, 1.62738332e-01, 1.72794061e-02, 7.89933755e-02, 1.58494767e-01, 9.04541006e-04,
     -3.33177052e-02, -1.42411500e-01, -1.90471155e-02, -2.41622739e-02, -2.57382438e-02, 2.84895062e-02],
    [3.31179197e+00, -1.56765268e-01, 4.42446188e+00, 2.05496297e+00, 5.07031622e+00, -3.52663849e-02,
     -5.68337901e+00, -1.17825301e+00, 5.41756637e-01, -3.15541339e-02, -1.58404846e+00, 7.37887234e-01],
    [2.36033237e-01, -5.01380019e-01, -7.01568834e-02, -2.14474169e-01, 5.58739133e-01, -3.45340886e-01,
     2.36469930e-01, -2.51770230e-02, -4.41670143e-01, -1.73364633e-01, 9.92353986e-03, 1.01775476e-01],
    [3.13672832e+00, 1.55128891e+00, 4.60139512e+00, 9.82477544e-01, -3.87108002e-01, -1.34239667e+00,
     -3.00065797e+00, -4.41556909e-01, -7.77546208e-01, -6.59017029e-01, -1.42596356e-01, -9.78935498e-01],
    [8.50714148e-01, 2.28658856e-01, -3.65260753e+00, 2.70626948e+00, -1.90441544e-01, 5.66625676e+00,
     1.77531510e+00, 2.39978921e+00, 1.10965660e+00, 1.58484130e+00, -1.51579214e-02, 8.64324026e-01],
    [1.14302559e+00, 1.18602811e+00, -3.88130412e+00, 8.69833825e-01, -8.23003310e-01, -4.23867795e-01,
     8.56022598e-01, -1.08015106e+00, 1.74840192e-01, -1.35493558e-02, -1.17012561e+00, 1.68572940e-01],
    [3.54117814e+00, 6.12714769e-01, 7.67585243e+00, 2.50391333e+00, 1.81374399e+00, -1.46363231e+00,
     -1.74027236e+00, -5.72924078e-01, -1.20787368e+00, -4.13954661e-01, -4.62561948e-01, 6.78297871e-01],
    [8.31843044e-01, 4.41635485e-01, 7.00724425e-02, -4.72159900e-02, 3.08326493e-01, -4.47009822e-01,
     3.27806057e-01, 6.52370380e-01, 3.28490360e-01, 1.28628172e-01, -7.78065861e-02, 6.91343399e-02],
    [4.90082031e-01, -9.53180204e-01, 1.76970476e-01, 1.57256960e-01, -5.26196238e-02, -3.19264458e-01,
     3.91808304e-01, 2.19368239e-01, -2.06483291e-01, -6.25044005e-02, -1.05547224e-01, 3.18934196e-01],
    [1.49899454e+00, -4.30708817e-01, 2.43770498e+00, 7.03149621e-01, -2.28827845e+00, 2.70195855e+00,
     -4.71484280e+00, -1.18700075e+00, -1.77431396e+00, -2.23190236e+00, 8.20855264e-01, -2.35859902e-01],
    [1.20322544e-01, -3.66300816e-01, -1.25699953e-01, -1.21914056e-01, 6.93277338e-02, -1.31034684e-01,
     -1.54955924e-03, 2.48094288e-02, -3.09576314e-02, -1.66369415e-03, 1.48904987e-04, -1.42151992e-02],
    [6.52394765e-01, -6.81024464e-01, 6.36868117e-01, 3.04950208e-01, 2.62178992e-01, -3.20457080e-01,
     -1.98576098e-01, -3.02173163e-01, 2.04399765e-01, 4.44513847e-02, -9.50111498e-03, -1.14198739e-02],
    [2.06762180e-01, -2.08101829e-01, 2.61977630e-01, -1.71672300e-01, 5.61794250e-02, 2.13660185e-01,
     3.90259585e-02, 4.78176392e-02, 1.72812607e-02, 3.44052067e-02, 6.26899067e-03, 2.48544728e-02],
    [7.39717363e-01, 4.37786285e+00, 2.54995502e+00, 1.13151212e+00, -3.58509503e-01, 2.20806129e-01,
     -2.20500355e-01, -7.22409824e-02, -2.70534083e-01, 1.07942098e-03, 2.70174668e-01, 1.87279353e-01],
    [1.25593809e+00, 6.71054880e-02, 8.70352571e-01, -4.32607959e+00, 2.30652217e+00, 5.47476105e+00,
     -6.11052479e-01, 1.07955720e+00, -2.16225471e+00, -7.95770149e-01, -7.31804973e-01, 9.68935954e-01],
    [1.17233757e-01, -1.23897829e-01, -4.88625265e-01, 1.42036530e-01, -7.23286756e-03, -6.99808763e-02,
     -1.17525019e-02, 5.70221674e-02, -7.67796123e-03, 4.17505873e-02, -2.33375716e-02, 1.94121001e-02],
    [1.67511025e+00, -2.75436700e+00, 1.45345593e+00, 1.32408871e+00, -1.66172505e+00, 1.00560074e+00,
     -8.82308160e-01, -5.95708043e-01, -7.27283590e-01, -1.03975499e+00, -1.86653334e-02, 1.39449745e+00],
    [3.20587677e+00, -2.84451104e+00, 8.54849957e+00, -4.44001235e-01, 1.04202144e+00, 7.35333682e-01,
     -2.48763292e+00, 7.38931361e-01, -1.74185596e+00, -1.07581842e+00, 2.05759299e-01, -8.20483513e-01],
    [3.31279737e+00, -5.08655734e-01, 6.61530870e+00, 1.16518280e+00, 4.74499155e+00, -2.31536191e+00,
     -1.34016130e+00, -7.15381712e-01, 2.78890594e+00, 2.04189275e+00, -3.80003033e-01, 1.16034914e+00],
    [1.79522019e+00, -8.13534697e-03, 4.37167420e-01, 2.26517020e+00, 8.85377295e-01, 1.07481514e+00,
     -7.25322296e-01, -2.19309506e+00, -7.59468916e-01, -1.37191387e+00, 2.60097913e-01, 9.34596450e-01],
    [3.50400906e-01, 8.17891485e-01, -8.63487084e-01, -7.31760701e-01, 9.70320805e-03, -3.60023996e-01,
     -2.91753495e-01, -8.03073817e-02, 6.65930095e-02, 1.60093340e-01, -1.29158086e-01, -5.18806100e-02],
    [2.25922929e-01, 2.78461593e-01, 5.39661393e-02, -2.37662670e-02, -2.70343295e-02, -1.23485570e-01,
     2.31027499e-03, 5.87465112e-05, 1.86127188e-02, 2.83074747e-02, -1.87198676e-04, 1.24761782e-02],
    [4.53615634e-01, 3.18976020e+00, -8.35029351e-01, 7.84124578e+00, -4.43906795e-01, -1.78945492e+00,
     -1.14521031e+00, 1.00044304e+00, -4.04084981e-01, -4.86030348e-01, 1.05412721e-01, 5.63666445e-02],
    [3.93714086e-01, -3.07226477e-01, -4.87366619e-01, -4.57481697e-01, -2.91133171e-04, -2.39881719e-01,
     -2.15591352e-01, -1.21332941e-01, 1.42245002e-01, 5.02984582e-02, -8.05878851e-03, 1.95534173e-01],
    [1.86913010e-01, -1.61000977e-01, 5.95612425e-01, 1.87804293e-01, 2.22064227e-01, -1.09008289e-01,
     7.83845058e-02, 5.15228647e-02, -8.18113578e-03, -2.37860551e-02, 3.41013800e-03, 3.64680417e-02],
    [3.32919314e+00, -2.14341251e+00, 7.20913997e+00, 1.76143734e+00, 1.64091808e+00, -2.66887649e+00,
     -9.26748006e-01, -2.78599285e-01, -7.39434005e-01, -3.87363085e-01, 8.00557250e-01, 1.15628886e+00],
    [4.76496444e-01, -1.19334793e-01, 3.09037235e-01, -3.45545294e-01, 1.30114716e-01, 5.06895559e-01,
     2.12176840e-01, -4.14296750e-03, 4.52439064e-02, -1.62163990e-02, 6.93683152e-03, -5.77607592e-03],
    [3.00019324e-01, 5.43432074e-02, -7.72732930e-01, 1.47263806e+00, -2.79012581e-02, -2.47864869e-01,
     -2.10011388e-01, 2.78202425e-01, 6.16957205e-02, -1.66924986e-01, -1.80102286e-01, -3.78872162e-03]]

TIMBRE_MEANS = [np.mean(t) for t in TIMBRE_CLUSTERS]
TIMBRE_STDEVS = [np.std(t) for t in TIMBRE_CLUSTERS]

'''helper methods to process raw msd data'''

def normalize_pitches(h5):
    key = int(hdf5_getters.get_key(h5))
    segments_pitches = hdf5_getters.get_segments_pitches(h5)
    segments_pitches_new = [transpose_by_key(pitch_seg, key) for pitch_seg in segments_pitches]
    return segments_pitches_new

def transpose_by_key(pitch_seg, key):
    pitch_seg_new = []
    for i in range(0, 12):
        idx = (i + key) % 12
        pitch_seg_new.append(pitch_seg[idx])
    return pitch_seg_new

''' given a time segment with distributions of the 12 pitches, find the most
likely chord played '''
def find_most_likely_chord(pitch_vector):
    rho_max = 0.0
    # (family, index) pair identifying each chord
    most_likely_chord = (1, 1)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_MAJOR, CHORD_TEMPLATE_MAJOR_means, CHORD_TEMPLATE_MAJOR_stdevs)):
        # Pearson-like correlation between the template and the observed pitches
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean)*(pitch_vector[i] - np.mean(pitch_vector))/((stdev + 0.01)*(np.std(pitch_vector) + 0.01))
        if abs(rho) > abs(rho_max):
            rho_max = rho
            most_likely_chord = (1, idx)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_MINOR, CHORD_TEMPLATE_MINOR_means, CHORD_TEMPLATE_MINOR_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean)*(pitch_vector[i] - np.mean(pitch_vector))/((stdev + 0.01)*(np.std(pitch_vector) + 0.01))
        if abs(rho) > abs(rho_max):
            rho_max = rho
            most_likely_chord = (2, idx)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_DOM7, CHORD_TEMPLATE_DOM7_means, CHORD_TEMPLATE_DOM7_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean)*(pitch_vector[i] - np.mean(pitch_vector))/((stdev + 0.01)*(np.std(pitch_vector) + 0.01))
        if abs(rho) > abs(rho_max):
            rho_max = rho
            most_likely_chord = (3, idx)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_MIN7, CHORD_TEMPLATE_MIN7_means, CHORD_TEMPLATE_MIN7_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean)*(pitch_vector[i] - np.mean(pitch_vector))/((stdev + 0.01)*(np.std(pitch_vector) + 0.01))
        if abs(rho) > abs(rho_max):
            rho_max = rho
            most_likely_chord = (4, idx)
    return most_likely_chord

def find_most_likely_timbre_category(timbre_vector):
    most_likely_timbre_cat = 0
    rho_max = 0.0
    for idx, (seg, mean, stdev) in enumerate(zip(TIMBRE_CLUSTERS, TIMBRE_MEANS, TIMBRE_STDEVS)):
        # correlate the observed timbre vector with each cluster centroid
        rho = 0.0
        for i in range(0, 12):
            rho += (seg[i] - mean)*(timbre_vector[i] - np.mean(timbre_vector))/((stdev + 0.01)*(np.std(timbre_vector) + 0.01))
        if abs(rho) > abs(rho_max):
            rho_max = rho
            most_likely_timbre_cat = idx
    return most_likely_timbre_cat


Bibliography

[1] Marck Bailey. Mark J. Butler publishes scholarly work on dance music. http://www.music.northwestern.edu/about/news/2012/mark-j-butler-publishes-scholarly-work-on-dance-music.html, March 2012.

[2] Kenneth Taylor. Ishkur's guide to EDM. http://techno.org/electronic-music-guide/.

[3] Deal further strengthens Spotify's music discovery expertise. http://the.echonest.com/pressreleases/spotify-acquires-echo-nest/, March 2014.

[4] Josh Constine. Inside the Spotify - Echo Nest skunkworks. http://techcrunch.com/2014/10/19/the-sonic-mad-scientists/, October 2014.

[5] The future of music genres is here. http://blog.echonest.com/post/73516217273/the-future-of-music-genres-is-here, January 2014.

[6] About the Music Genome Project. http://www.pandora.com/about/mgp.

[7] Joan Serrà, Álvaro Corral, Marián Boguñá, Martín Haro, and Josep Ll. Arcos. Measuring the evolution of contemporary western popular music. Scientific Reports, 2, July 2012.

[8] Matthias Mauch, Robert M. MacCallum, Mark Levy, and Armand M. Leroi. The evolution of popular music: USA 1960–2010. Royal Society Open Science, 2(5), 2015.

[9] Thierry Bertin-Mahieux, Daniel P.W. Ellis, Brian Whitman, and Paul Lamere. The Million Song Dataset. In Proceedings of the 12th International Conference on Music Information Retrieval (ISMIR 2011), 2011.

[10] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay. Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12:2825–2830, 2011.

[11] Lars Buitinck, Gilles Louppe, Mathieu Blondel, Fabian Pedregosa, Andreas Mueller, Olivier Grisel, Vlad Niculae, Peter Prettenhofer, Alexandre Gramfort, Jaques Grobler, Robert Layton, Jake VanderPlas, Arnaud Joly, Brian Holt, and Gaël Varoquaux. API design for machine learning software: experiences from the scikit-learn project. In ECML PKDD Workshop: Languages for Data Mining and Machine Learning, pages 108–122, 2013.

[12] Edwin Chen. Infinite mixture models with nonparametric Bayes and the Dirichlet process. http://blog.echen.me/2012/03/20/infinite-mixture-models-with-nonparametric-bayes-and-the-dirichlet-process/, March 2012.

[13] Graham Massey. Roland TR-808: The drum machine that changed music. http://www.bbc.com/news/entertainment-arts-26682781, March 2014.

[14] Jean-Michel Jarre and Agencja Artystyczna MTJ. Jean Michel Jarre. Disques Dreyfus, 1999.

[15] Francois Pachet, Jean-Julien Aucouturier, and Mark Sandler. The way it sounds: Timbre models for analysis and retrieval of music signals. IEEE Transactions on Multimedia, 7(6):1028–1035, December 2005.


distributions of each among songs over time reveals a lot of information about how the music industry and consumers' tastes have evolved. While these are not all of the features contained in a song, they serve as a good starting point. Third, the study defines mathematical ways to capture music attributes and measure their change over time. For example, pitches are transposed into the same tonal context, with binary discretized pitch descriptions based on a threshold, so that each song can be represented with vectors of pitches that are normalized and compared to other songs.
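As a rough illustration of that normalization step (the 0.5 threshold here is an arbitrary stand-in, not the value used in the study), a 12-bin chroma vector can be rotated into a common tonal context and then binarized:

import numpy as np

def normalize_chroma(chroma, key, threshold=0.5):
    # rotate so the song's key maps to pitch class 0 (C), then binarize
    rotated = np.roll(chroma, -key)
    return (rotated >= threshold).astype(int)

# a frame strong on C, E, and G reduces to a C major profile
frame = np.array([0.9, 0.1, 0.2, 0.1, 0.8, 0.2, 0.1, 0.7, 0.3, 0.2, 0.1, 0.4])
print(normalize_chroma(frame, key=0))   # [1 0 0 0 1 0 0 1 0 0 0 0]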

While this study lays some solid groundwork for capturing and analyzing numeric qualities of music, it falls short of addressing my goals in a couple of ways. First, it does not perform any analysis with respect to music genre. While the analysis performed in this paper could easily be applied to a list of songs in a specific genre, certain genres might have unique sounds and rhythms relative to other genres that would be worth studying in greater detail. Second, the study only measures general trends in music over time. The models used to describe changes are simple regressions that don't look at more nuanced changes. For example, what styles of music developed over certain periods of time? How rapid were those changes? Which styles of music developed from which other styles?

A more promising study, led by music researcher Matthias Mauch [8], analyzes contemporary popular Western music from the 1960s to 2010s by comparing numerical data on the pitches and timbre of a corpus of 17,000 songs that appeared on the Billboard Hot 100. Like the previously mentioned paper, "Measuring the Evolution of Contemporary Western Popular Music," Mauch's study also creates abstractions of pitch and timbre in order to provide a consistent and meaningful semantic interpretation of musical data (see Figure 1.2). However, Mauch's study takes this idea a step further by using genre tags from Lastfm, a music website, and constructing a hierarchy of music genres using hierarchical clustering. Additionally, the study takes a crack at determining whether a particular band, the Beatles, was musically groundbreaking for its time or merely playing off sounds that other bands had already used.

Figure 1.2: Data processing pipeline for Mauch's study, illustrated with a segment of Queen's "Bohemian Rhapsody," 1975.

While both "Measuring the Evolution of Contemporary Western Popular Music" and Mauch's study created abstractions of pitch and timbre, Mauch's study is more appealing with respect to my goal because its end results align more closely with mine. Additionally, the data processing pipeline offers several layers of abstraction,

and depending on my progress, I would be able to achieve at least one of the levels of abstraction. As shown in Figure 1.2, each segment of a raw audio file is first broken down into its 12 timbre MFCCs and pitch components. Next, the study constructs "lexicons," or dictionaries of pitch and timbre terms, that all songs can be compared to. For pitch, the original data is in an N-by-12 matrix, where N is the number of time segments in the song and 12 the number of notes found in an octave of pitches. Each time segment contains the relative strengths of each of the 12 pitches. However, musical sounds are not merely a collection of pitches but, more precisely, chords. Furthermore, the similarity of two songs is not determined by the absolute pitches of their chords, but rather by the progression of chords in the song, all relative to each other. For example, if all the notes in a song are transposed by one step, the song will sound different in terms of absolute pitch, but it will still be recognized as the original because all of the relative movements from each chord to the next are the same. This phenomenon is captured in the pitch data by finding the most likely chord played at each time segment, then counting the change to the next chord at each time step and generating a table of chord change frequencies for each song.
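A toy version of that counting step, using plain chord names rather than the (family, root) encoding adopted later in this thesis, might look like:

from collections import Counter

def chord_change_table(chords):
    # count transitions between consecutive chords and normalize, so songs
    # of different lengths can be compared on the same footing
    changes = Counter(zip(chords[:-1], chords[1:]))
    total = float(sum(changes.values())) or 1.0
    return {pair: n / total for pair, n in changes.items()}

# a I-V-vi-IV loop played twice
print(chord_change_table(['C', 'G', 'Am', 'F', 'C', 'G', 'Am', 'F']))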

Constructing the timbre lexicon is more complicated, since there is no easy analogue like chords for pitches to compare songs. Mauch's study utilizes a Gaussian Mixture Model (GMM) by iterating over k=1 to k=N clusters, where N is a large number, running the GMM on each prior assumption of k clusters, and computing the Bayes Information Criterion (BIC) for each model. The lowest of the N BIC values is found and that value of k is selected. That model contains k different timbre clusters, and each cluster contains the mean timbre value for each of the 12 timbre components.
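A compact sketch of that model-selection loop, written against the current scikit-learn API (sklearn.mixture.GaussianMixture; the 2016-era equivalent was sklearn.mixture.GMM) with an assumed upper bound of 60 components:

import numpy as np
from sklearn.mixture import GaussianMixture

def fit_timbre_lexicon(timbre_frames, max_k=60):
    # fit a GMM for every candidate number of clusters and keep the model
    # with the lowest Bayes Information Criterion
    best_model, best_bic = None, np.inf
    for k in range(1, max_k + 1):
        gmm = GaussianMixture(n_components=k, random_state=0).fit(timbre_frames)
        bic = gmm.bic(timbre_frames)
        if bic < best_bic:
            best_model, best_bic = gmm, bic
    return best_model.means_   # one 12-dimensional mean per timbre category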

For my research, I decided that the pitch and timbre lexicons would be the most realistic level of abstraction I could obtain. Mauch's study adds an additional layer to pitch and timbre by identifying the most common patterns of chord changes and the most common timbre rhythms, and creating more general tags from these combined terms, such as "stepwise changes indicating modal harmony" for a pitch topic and "oh, rounded, mellow" for a timbral topic. There were two problems with using this final layer of abstraction for my study. First, attaching semantic interpretations to the pitch and timbral lexicons is a difficult task. For timbre, I would need to listen to sound samples containing all of the different timbral categories I identified and attach user interpretations to them. For the chords, not only would I have to perform the same analysis as for timbre, but also take careful attention to identify which chords correspond to common sound progressions in popular music, a task that I am not qualified for and did not have the resources to carry out for this thesis. Second, this final layer of abstraction was not necessary for the end goal of my paper. In fact, consolidating my pitch and timbre lexicons into simpler phrases would run the risk of pigeonholing my analysis and preventing me from discovering more nuanced patterns in my final results. Therefore, I decided to focus on pitch and timbral lexicon construction as the furthest levels of abstraction when processing songs for my thesis. Mathematical details on how I constructed the pitch and timbral lexicons can be found in the Mathematical Modeling section of this paper.

1.3 The Dataset

In order to successfully execute my thesis, I need access to an extensive database of music. Until recently, acquiring a substantial corpus of music data was a difficult and costly task. It is illegal to download music audio files from video- and music-sharing sites such as YouTube, Spotify, and Pandora. Some platforms, such as iTunes, offer 90-second previews of songs, but using only segments of songs, and usually segments that showcase the chorus, is not a reliable way to capture the entire essence of a song. Even if I were to legally download entire audio files for free, I would


run into additional issues. Obtaining a high-quality corpus of song data would be challenging: writing scripts that crawl music-sharing platforms may not capture all of the music I am looking for. And once I had the audio files, I would have to perform audio processing techniques to extract the relevant information from the songs.

Fortunately, there is an easy solution to the music data acquisition problem. The Million Song Dataset (MSD) is a collection of metadata for one million music tracks dating up to 2011. Various organizations, such as The Echo Nest, Musicbrainz, 7digital, and Lastfm, have contributed different pieces of metadata. Each song is represented as a Hierarchical Data Format (HDF5) file, which can be loaded as a JSON object. The fields encompass topical features, such as the song title, artist, and release date, as well as lower-level features, such as the loudness, starting beat time, pitches, and timbre of several segments of the song [9].
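These fields are read with the hdf5_getters helper module distributed with the MSD; a minimal sketch of reading one track (the file path is a hypothetical example):

import hdf5_getters

h5 = hdf5_getters.open_h5_file_read('data/A/A/A/TRAAAAW128F429D538.h5')
try:
    print(hdf5_getters.get_title(h5))                    # song title
    print(hdf5_getters.get_year(h5))                     # release year (0 if unknown)
    print(hdf5_getters.get_segments_pitches(h5).shape)   # (num segments, 12) chroma
    print(hdf5_getters.get_segments_timbre(h5).shape)    # (num segments, 12) timbre
finally:
    h5.close()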

While the MSD is the largest free and open-source music metadata dataset I could find, there is no guarantee that it adequately covers the entire spectrum of EM artists and songs. This quality limitation is important to consider throughout the study. A quick look through the songs, including the subset of data I worked with for this report, showed that there were several well-known artists and songs from the EM scene. Therefore, while the MSD may not contain all desired songs for this project, it contains an adequate number of relevant songs to produce some meaningful results. Additionally, laying the groundwork for modeling the similarities between songs and identifying groundbreaking ones is the same regardless of the songs included, and the following methodologies can be implemented on any similarly-formatted dataset, including one with songs that might currently be missing in the MSD.


Chapter 2

Mathematical Modeling

2.1 Determining Novelty of Songs

Finding a logical and implementable mathematical model was, and continues to be, an important aspect of my research. My problem, how to mathematically determine which songs were unique for their time, requires an algorithm in which each song is introduced in chronological order, either joining an existing category or starting a new category based on its musical similarity to songs already introduced. Clustering algorithms like k-means or Gaussian Mixture Models (GMM), which optimize the partitioning of a dataset into a predetermined number of clusters, assume a fixed number of clusters. While this process would work if we knew exactly how many genres of EM existed, if we guess wrong, our end results may contain clusters that are wrongly grouped together or separated. It is much better to apply a clustering algorithm that does not make any assumptions about this number.

One particularly promising process that addresses the issue of the number of clusters is a family of algorithms known as Dirichlet Processes (DPs). DPs are useful for this particular application because (1) they assign clusters to a dataset with only an upper bound on the number of clusters, and (2) by sorting the songs in chronological order before running the algorithm and keeping track of which songs are categorized under each cluster, we can observe the earliest songs in each cluster and, consequently, infer which songs were responsible for creating new clusters. The DP is controlled by a concentration parameter α. The expected number of clusters formed is directly proportional to the value of α, so the higher the value of α, the more likely new clusters will be formed [10]. Regardless of the value of α, as the number of data points introduced increases, the probability of a new group being formed decreases. That is, a "rich get richer" policy is in place, and existing clusters tend to grow in size. Tweaking the value of the tunable parameter α is an important part of the study, since it determines the flexibility given to forming a new cluster. If the value of α is too small, then the criteria for forming clusters will be too strict, and data that should be in different clusters will be assigned to the same cluster. On the other hand, if α is too large, the algorithm will be too sensitive and assign similar songs to different clusters.

The implementation of the DP was achieved using scikit-learn's library and API for the Dirichlet Process Gaussian Mixture Model (DPGMM). The DPGMM is the formal name of the Dirichlet Process model used to cluster the data. More specifically, scikit-learn's implementation of the DPGMM uses the Stick Breaking method, one of several equally valid methods to assign songs to clusters [11].
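The intuition behind stick breaking can be simulated in a few lines: repeatedly break off a Beta(1, α)-distributed fraction of a unit-length stick and treat the piece lengths as cluster weights. This sketch only illustrates the prior, not scikit-learn's internals:

import numpy as np

def stick_breaking_weights(alpha, n_sticks, seed=0):
    # w_k = beta_k * prod_{j<k} (1 - beta_j), with beta_j ~ Beta(1, alpha)
    rng = np.random.default_rng(seed)
    betas = rng.beta(1.0, alpha, size=n_sticks)
    remaining = np.concatenate(([1.0], np.cumprod(1.0 - betas[:-1])))
    return betas * remaining

print(stick_breaking_weights(0.5, 5))   # small alpha: a few clusters dominate
print(stick_breaking_weights(5.0, 5))   # larger alpha: weights spread out more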

While the mathematical details for this algorithm can be found at the following citation [12], the most important aspects of the DPGMM are the arguments that the user can specify and tune. The first of these tunable parameters is the value α, which is the same parameter as the α discussed in the previous paragraph. As seen in Figure 2.1 on the right side, properly tuning α is key to obtaining meaningful clusters. The center image has α set to 0.01, which is too small and results in all of the data being placed under one cluster. On the other hand, the bottom-right image has the same data set and α set to 100, which does a better job of clustering. On a related note, the figure also demonstrates the effectiveness of the DPGMM over the GMM. On the left side, the dataset clearly contains 2 clusters, but the GMM in the top-left image assumes 5 clusters as a prior and consequently clusters the data incorrectly, while the DPGMM manages to limit the data to 2 clusters.

The second argument that the user inputs for the DPGMM is the data that will be clustered. The scikit-learn implementation takes the data in the format of a nested list (N lists, each of length m), where N is the number of data points and m the number of features. While the format of the data structure is relatively straightforward, choosing which numbers should be in the data was a challenge I faced. Selecting the relevant features of each song to be used in the algorithm will be expounded upon in the next section, "Feature Selection."
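For orientation, one plausible assembly of that nested list, assuming the final feature set concatenates the per-second chord-change frequencies and timbre-category counts produced by the preprocessing code in Appendix A.2:

# one row per song: 192 chord-change frequencies followed by the
# timbre category counts, both already normalized per second
X = [song['chord_changes'] + song['timbre_cat_counts'] for song in all_song_data]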

The last argument that a user inputs for the scikit-learn DPGMM implementation is an argument indicating the upper bound for the number of clusters. The Dirichlet Process then determines the best number of clusters for the data, between 1 and the upper bound. Since the DPGMM is flexible enough to find the best value, I set an arbitrary upper bound of 50 clusters and focused more on the tuning of α to modify the number of clusters formed.
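Putting the three arguments together, a minimal sketch of the fitting call against the scikit-learn 0.17-era API (the covariance type and iteration count shown are illustrative choices, not necessarily those used in this thesis; in modern scikit-learn releases, BayesianGaussianMixture plays this role):

from sklearn.mixture import DPGMM

dpgmm = DPGMM(n_components=50,        # upper bound on the number of clusters
              alpha=0.1,              # concentration parameter
              covariance_type='diag',
              n_iter=1000)
dpgmm.fit(X)                          # X is the N x m nested list sketched above
labels = dpgmm.predict(X)             # cluster assignment for each song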

2.2 Feature Selection

One of the most difficult aspects of the Dirichlet method is choosing the features to be used for clustering. In other words, when we organize the songs into clusters, we need to ensure that each cluster is distinct in a way that is statistically and intuitively logical.

Figure 2.1: scikit-learn example of GMM vs. DPGMM and tuning of α.

In the Million Song Dataset [9], each song is represented as a JSON object containing several fields. These fields are candidate features to be used in the Dirichlet algorithm. Below is an example song, "Never Gonna Give You Up" by Rick Astley, and the corresponding features:

artist_mbid: db92a151-1ac2-438b-bc43-b82e149ddd50 (the musicbrainz.org ID for this artist is db9)

artist_mbtags: shape = (4,) (this artist received 4 tags on musicbrainz.org)

artist_mbtags_count: shape = (4,) (raw tag count of the 4 tags this artist received on musicbrainz.org)

artist_name: Rick Astley (artist name)

artist_playmeid: 1338 (the ID of that artist on the service playme.com)

artist_terms: shape = (12,) (this artist has 12 terms (tags) from The Echo Nest)

artist_terms_freq: shape = (12,) (frequency of the 12 terms from The Echo Nest (number between 0 and 1))

artist_terms_weight: shape = (12,) (weight of the 12 terms from The Echo Nest (number between 0 and 1))

audio_md5: bf53f8113508a466cd2d3fda18b06368 (hash code of the audio used for the analysis by The Echo Nest)

bars_confidence: shape = (99,) (confidence value (between 0 and 1) associated with each bar by The Echo Nest)

bars_start: shape = (99,) (start time of each bar according to The Echo Nest; this song has 99 bars)

beats_confidence: shape = (397,) (confidence value (between 0 and 1) associated with each beat by The Echo Nest)

beats_start: shape = (397,) (start time of each beat according to The Echo Nest; this song has 397 beats)

danceability: 0.0 (danceability measure of this song according to The Echo Nest (between 0 and 1; 0 => not analyzed))

duration: 211.69587 (duration of the track in seconds)

end_of_fade_in: 0.139 (time of the end of the fade-in at the beginning of the song, according to The Echo Nest)

energy: 0.0 (energy measure (not in the signal processing sense) according to The Echo Nest (between 0 and 1; 0 = not analyzed))

key: 1 (estimation of the key the song is in by The Echo Nest)

key_confidence: 0.324 (confidence of the key estimation)

loudness: -7.75 (general loudness of the track)

mode: 1 (estimation of the mode the song is in by The Echo Nest)

mode_confidence: 0.434 (confidence of the mode estimation)

release: Big Tunes - Back 2 The 80s (album name from which the track was taken; some songs/tracks can come from many albums, we give only one)

release_7digitalid: 786795 (the ID of the release (album) on the service 7digital.com)

sections_confidence: shape = (10,) (confidence value (between 0 and 1) associated with each section by The Echo Nest)

sections_start: shape = (10,) (start time of each section according to The Echo Nest; this song has 10 sections)

segments_confidence: shape = (935,) (confidence value (between 0 and 1) associated with each segment by The Echo Nest)

segments_loudness_max: shape = (935,) (max loudness during each segment)

segments_loudness_max_time: shape = (935,) (time of the max loudness during each segment)

segments_loudness_start: shape = (935,) (loudness at the beginning of each segment)

segments_pitches: shape = (935, 12) (chroma features for each segment (normalized so max is 1))

segments_start: shape = (935,) (start time of each segment (musical event, or onset) according to The Echo Nest; this song has 935 segments)

segments_timbre: shape = (935, 12) (MFCC-like features for each segment)

similar_artists: shape = (100,) (a list of 100 artists (their Echo Nest IDs) similar to Rick Astley, according to The Echo Nest)

song_hotttnesss: 0.864248830588 (according to The Echo Nest, when downloaded (in December 2010) this song had a "hotttnesss" of 0.8 (on a scale of 0 to 1))

song_id: SOCWJDB12A58A776AF (The Echo Nest song ID; note that a song can be associated with many tracks (with very slight audio differences))

start_of_fade_out: 198.536 (start time of the fade-out, in seconds, at the end of the song, according to The Echo Nest)

tatums_confidence: shape = (794,) (confidence value (between 0 and 1) associated with each tatum by The Echo Nest)

tatums_start: shape = (794,) (start time of each tatum according to The Echo Nest; this song has 794 tatums)

tempo: 113.359 (tempo in BPM according to The Echo Nest)

time_signature: 4 (time signature of the song according to The Echo Nest, i.e., the usual number of beats per bar)

time_signature_confidence: 0.634 (confidence of the time signature estimation)

title: Never Gonna Give You Up (song title)

track_7digitalid: 8707738 (the ID of this song on the service 7digital.com)

track_id: TRAXLZU12903D05F94 (The Echo Nest ID of this particular track, on which the analysis was done)

year: 1987 (year when this song was released, according to musicbrainz.org)

When choosing features, my main goal was to use features that would most likely yield meaningful results yet also be simple and make sense to the average person. The definition of "meaningful" results is arbitrary, as every music listener will have his or her own opinions as to what constitutes different types of music, but some common features most people tend to differentiate songs by are pitch, rhythm, and the types of instruments used. The following specific fields provided in each song object fall under these three terms (a short sketch of how these fields are read from a track file follows the list):

Pitch

• segments_pitches: a matrix of values indicating the strength of each pitch (or note) at each discernible time interval

Rhythm

• beats_start: a vector of values indicating the start time of each beat
• time_signature: the time signature of the song
• tempo: the speed of the song in Beats Per Minute (BPM)

Instruments

• segments_timbre: a matrix of values indicating the distribution of MFCC-like features (different types of tones) for each segment
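For concreteness, here is a short sketch of how these fields are read from a single track file, using the hdf5_getters helper module distributed with the MSD (the local file path shown is hypothetical):

import hdf5_getters

# illustrative sketch: read the candidate features for one track
h5 = hdf5_getters.open_h5_file_read('TRAXLZU12903D05F94.h5')  # hypothetical local path
pitches = hdf5_getters.get_segments_pitches(h5)  # (n_segments, 12) chroma matrix
timbre = hdf5_getters.get_segments_timbre(h5)    # (n_segments, 12) MFCC-like matrix
beats = hdf5_getters.get_beats_start(h5)         # beat onset times in seconds
tempo = hdf5_getters.get_tempo(h5)               # tempo estimate in BPM
h5.close()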

The segments_pitches feature is a clear candidate for a differentiating factor for songs, since it reveals patterns of notes that occur. Additionally, other research papers that quantitatively examine songs, like Mauch's, look at pitch and employ a procedure that allows all songs to be compared with the same metric. Likewise, timbre is intuitively a reliable differentiating feature, since it captures the extent to which tones or sounds differ despite having the same pitch. Therefore, segments_timbre is another feature that is considered in each song.

Finally, we look at the candidate features for rhythm. At first glance, all of these features appear to be useful, as they indicate the rhythm of a song in one way or another. However, none of these features are as useful as the pitch and timbre features. While tempo is one factor in differentiating genres of EDM and music in general, tempo alone is not a driving force of musical innovation. Certain genres of EDM, like drum 'n' bass and happycore, stand out for having very fast tempos, but the tempo is supplemented with a sound unique to the genre. Conceiving new arrangements of pitches, combining instruments in new ways, and inventing new types of sounds are novel, but speeding up or slowing down existing sounds is not. Including tempo as a feature could actually add noise to the model, since many genres overlap in their tempos. And finally, tempo is measured indirectly when the pitch and timbre features are normalized for each song: everything is measured in units of "per second," so faster songs will have higher quantities of pitch and timbre features each second. Time signature can be dismissed from the candidate features for the same reason as tempo: many genres share the same time signature, and including it in the feature set would only add more noise. beats_start looks like a more promising feature since, like segments_pitches and segments_timbre, it consists of a vector of values. However, difficulties arise when we begin to think how exactly we can utilize this information. Since each song varies in length, we need a way to compare songs of different durations on the same level. One approach could be to perform basic statistics on the distance between each beat, for example calculating the mean and standard deviation of this distance. However, the normalized pitch and timbre information already capture this data. Another possibility is detecting certain patterns of beats, which could differentiate the syncopated dubstep or glitch music beats from the steady pulse of electro-house. But once again, every beat is accompanied by a sound with a specific timbre and pitch, so this feature would not add any significantly new information.

2.3 Collecting Data and Preprocessing Selected Features

2.3.1 Collecting the Data

Upon deciding the features I wanted to use in my research, I first needed to collect all of the electronic songs in the Million Song Dataset. The easiest reliable way to achieve this was to iterate through each song in the database and save the information for the songs where any of the artist genre tags in artist_mbtags matched an electronic music genre. While this measure was not fully accurate, because it looks at the genre of the artist, not the song, specific genre information for each song was not as easily accessible, so this indicator was nearly as good a substitute. To generate a list of the genres that electronic songs would fall under, I manually searched through a subset of the MSD to find all genres that seemed to be related to electronic music. In the case of genres that were sometimes, but not always, electronic in nature, such as disco or pop, I erred on the side of caution and did not include them in the list of electronic genres. In these cases, false positives, such as primarily rock songs that happen to have the disco label attached to the artist, could inadvertently be included in the dataset. The final list of genres is as follows:

target_genres = ['house', 'techno', 'drum and bass', 'drum n bass',
    "drum'n'bass", 'drumnbass', "drum 'n' bass", 'jungle', 'breakbeat',
    'trance', 'dubstep', 'trap', 'downtempo', 'industrial', 'synthpop',
    'idm', 'idm - intelligent dance music', '8-bit', 'ambient',
    'dance and electronica', 'electronic']

2.3.2 Pitch Preprocessing

A study conducted by music researcher Matthias Mauch [8] analyzes pitch in a musically informed manner. The study first takes the raw sound data and converts it into a distribution over each pitch, where 0 is no detection of the pitch and 1 the strongest amount. Then it computes the most likely chord by comparing the 4 most common types of chords in popular music (major, minor, dominant 7, and minor 7) to the observed chord. The most common chords are represented as "template chords" and contain 0's and 1's, where the 1's represent the notes played in the chord. For example, using the note C as the first index, the C major chord is represented as

CT_{CM} = (1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0)

For a given chroma frame c observed in the song, the Spearman's rho coefficient is computed over every template chord:

\rho_{CT,c} = \frac{\sum_{i=1}^{12} (CT_i - \overline{CT})(c_i - \bar{c})}{\sigma_{CT}\,\sigma_c}

where \overline{CT} is the mean of the values in the template chord, \sigma_{CT} is the standard deviation of the values in the chord, and the operations on c are analogous. Note that the summation is over each individual pitch in the 12 pitch classes. The chord template with the highest value of ρ is selected as the chord for the time frame. After this is performed for each time frame, the values are smoothed, and then the change between adjacent chords is observed. The reasoning behind this step is that, by measuring the relative distance between chords rather than the chords themselves, all songs can be compared in the same manner even though they may have different key signatures. Finally, the study takes the types of chord changes and classifies them under 8 possible categories called "H-topics." These topics are more abstracted versions of the chord changes that make more sense to a human, such as "changes involving dominant 7th chords."
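As a concrete illustration, here is a minimal sketch of the template-matching step (the helper msd_utils.find_most_likely_chord called in Appendix A.2 plays this role in my code; the constant 1/12 factor of the usual correlation is dropped, which does not change which template wins):

import numpy as np

def rho(template, chroma):
    # correlation between a 0/1 template chord and an observed chroma frame;
    # assumes the frame is not perfectly flat (sigma_c > 0)
    t = np.asarray(template, dtype=float)
    c = np.asarray(chroma, dtype=float)
    return np.sum((t - t.mean()) * (c - c.mean())) / (t.std() * c.std())

def most_likely_chord(chroma, templates):
    # return the index of the template chord with the highest rho
    return int(np.argmax([rho(t, chroma) for t in templates]))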

In my preliminary implementation of this method on an electronic dance music corpus, I made a few modifications to Mauch's study. First, I smoothed out time frames before computing the most probable chords, rather than smoothing the most probable chords. I did this to save time and to reduce volatility in the chord measurements. Using Rick Astley's "Never Gonna Give You Up" as a reference, which contains 935 time frames and lasts 212 seconds, 5 time frames is slightly under 1 second, and for preliminary testing this appeared to be a good interval for each time block. Second, as mentioned in the literature section, I did not abstract the chord changes into H-topics. This decision also stemmed from time constraints, since deriving semantic chord meaning from EDM songs would require careful research into the types of harmonies and sounds common in that genre of music. Below I include a high-level visualization of the pitch metadata found in a sample song, "Firestarter" by The Prodigy, and how I converted the metadata into a chord change vector that I could then feed into the Dirichlet Process algorithm.


[Figure: pipeline converting raw pitch metadata into a chord change vector, illustrated on "Firestarter" by The Prodigy. (1) Start with the raw pitch data, an N×12 matrix, where N is the number of time frames in the song and 12 the number of pitch classes (the figure shows the first 5 time frames). (2) Average the distribution of pitches over every block of 5 time frames. (3) Calculate the most likely chord for each block using Spearman's rho (the first block matches a major template, (0,1,0,0,0,0,1,0,0,0,1,0)). (4) For each pair of adjacent chords (e.g. F major → G major, a major-to-major step of size 2), compute the chord shift code and increment its count in a table of chord change frequencies (192 possible chord changes), e.g. chord_changes[6] += 1. (5) The result is a final 192-element vector in which chord_changes[i] is the number of times the chord change with code i occurred in the song.]
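The encoding of a chord change into one of the 192 codes follows the logic later listed in Appendix A.2; a condensed sketch:

# chord types are numbered 1-4 (major, minor, dominant 7, minor 7);
# root notes are numbered 0-11 (C up to B)
def chord_shift_code(c1, c2):
    # c1, c2: (chord_type, root_note) pairs for two adjacent chords
    if c1[1] == c2[1]:
        note_shift = 0
    elif c1[1] < c2[1]:
        note_shift = c2[1] - c1[1]
    else:
        note_shift = 12 - c1[1] + c2[1]  # wrap around the octave
    key_shift = 4 * (c1[0] - 1) + c2[0]  # 1..16: ordered pair of chord types
    return 12 * (key_shift - 1) + note_shift  # 0..191

# e.g. C major -> C major is code 0; C minor -> C minor is code 60
assert chord_shift_code((1, 0), (1, 0)) == 0
assert chord_shift_code((2, 0), (2, 0)) == 60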


A final step I took to normalize the chord change data was to divide the counts by the length of the song, so that each song's number of chord changes was measured per second.

2.3.3 Timbre Preprocessing

For timbre, I also used Mauch's model to find a meaningful way to compare timbre uniformly across all songs [8]. After collecting all song metadata, I took a random sample of 20 songs from each year, starting at 1970. The reason I forced the sampling to 20 randomly sampled songs from each year, and did not take a random sample of songs from all years at once, was to prevent bias towards any type of sound. As seen in Figure 2.2, there are significantly more songs from 2000-2011 than from before 2000.

Figure 2.2: Number of Electronic Music Songs in the Million Song Dataset from Each Year

The mean year is x̄ = 2001.052, the median year is 2003, and the standard deviation of the years is σ = 7.060. A "random sample" over all songs would almost certainly include a disproportionate number of more recent songs. In order not to miss sounds that may be more prevalent in older songs, I required a set number of songs from each year. Next, from each randomly selected song, I selected 20 random timbre frames, in order to prevent any biases in data collection within each song. In total, there were 42 · 20 · 20 = 16,800 timbre frames collected. Next, I clustered the timbre frames using a Gaussian Mixture Model (GMM), varying the number of clusters from 10 to 100 and selecting the number of clusters with the lowest Bayes Information Criterion (BIC), a statistical measure commonly used to select the best-fitting model. The BIC was minimized at 46 timbre clusters. I then re-ran the GMM with 46 clusters and saved the mean values of each of the 12 timbre dimensions for each cluster formed. In the same way that every song had the same 192 chord changes whose frequencies could be compared between songs, each song now had the same 46 timbre clusters, but with different frequencies in each song. When reading in the metadata from each song, I calculated the most likely timbre cluster that each timbre frame belonged to and kept a frequency count of all of the possible timbre clusters observed in the song. Finally, as with the pitch data, I divided all observed counts by the duration of the song in order to normalize each song's timbre counts.
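A minimal sketch of this model-selection step (the candidate grid, covariance type, and iteration count here are assumptions; the GMM class is from the scikit-learn version of the time, renamed GaussianMixture in newer releases):

import numpy as np
from sklearn.mixture import GMM  # GaussianMixture in newer scikit-learn

def fit_timbre_clusters(frames, candidates=range(10, 101, 10)):
    # frames: (n_frames, 12) array of sampled timbre vectors
    best_model, best_bic = None, np.inf
    for k in candidates:
        model = GMM(n_components=k, covariance_type='diag', n_iter=100)
        model.fit(frames)
        bic = model.bic(frames)  # lower BIC = better fit/complexity tradeoff
        if bic < best_bic:
            best_model, best_bic = model, bic
    return best_model

# cluster means live in best_model.means_; per-song counts come from
# best_model.predict(song_frames) followed by a bincount over cluster IDs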


Chapter 3

Results

3.1 Methodology

After the pitch and timbre data was processed, I ran the Dirichlet Process on the data. For each song, I concatenated the 192-element chord change frequency list and the 46-element timbre category frequency list, giving each song a total of 238 features. However, there is a problem with this setup: the pitch data will inherently dominate the clustering process, since it contains almost 3 times as many features as timbre. While there is no built-in function in scikit-learn's DPGMM process to give different weights to each feature, I considered another possibility to remedy this discrepancy: duplicating the timbre vector a certain number of times and concatenating that to the feature set of each song. While this strategy runs the risk of corrupting the feature set and turning it into something that does not accurately represent each song, it is important to keep in mind that, even without duplicating the timbre vector, the feature set consists of two separate feature sets concatenated to each other. Therefore, timbre duplication appears to be a reasonable strategy to weight pitch and timbre more evenly.
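A sketch of this feature construction; the number of timbre copies shown is an assumption for illustration (four copies of the 46 timbre features roughly match the 192 chord-change features):

import numpy as np

def build_feature_vector(chord_changes, timbre_counts, n_copies=4):
    # chord_changes: length-192 per-second chord change frequencies
    # timbre_counts: length-46 per-second timbre category frequencies
    # tiling the timbre block gives it weight comparable to the pitch block
    return np.concatenate([np.asarray(chord_changes),
                           np.tile(np.asarray(timbre_counts), n_copies)])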


After this modification, I tweaked a few more parameters before obtaining my final results. Dividing the pitch and timbre frequencies by the duration of the song normalized every song to frequency per second, but it also had the undesired effect of making the data too small. Timbre and pitch frequencies per second were almost always less than 1.0 and many times hovered as low as 0.002 for nonzero values. Because all of the values were very close to each other, using common values of α in the range of 0.1 to 1000-2000 was insufficient to push the songs into different clusters; as a result, every song fell into the same cluster. Increasing the value of α by several orders of magnitude, to well over 10 million, fixed the problem, but this solution presented two problems. First, tuning α to experiment with different ways to cluster the music would be problematic, since I would have to work with an enormous range of possible values for α. Second, pushing α to such high values is not appropriate for the Dirichlet Process. Extremely high values of α indicate a Dirichlet Process that will try to disperse the data into different clusters, but a value of α that high is in principle always assigning each new song to a new cluster. On the other hand, varying α between 0.1 and 1000, for example, presents a much wider range of flexibility when assigning clusters. While clustering may be possible by varying α an extreme amount with the data as it currently is, we would be using the Dirichlet Process in a way it should mathematically not be used. Therefore, multiplying all of the data by a constant value, so that we can work in the appropriate range of α, is the ideal approach. After some experimentation, I found that k = 10 was an appropriate scaling factor.

After initial runs of the Dirichlet Process, I found that there was a slight issue with some of the earlier songs. Since I had only artist genre tags, not specific song tags, for each song, I chose songs based on whether any of the tags associated with the artist fell under any electronic music genre, including the generic term 'electronic'. There were some bands, mostly older ones from the 1960s and 1970s, like Electric Light Orchestra, which had some electronic music but mostly featured rock, funk, disco, or another genre. Given that these artists featured mostly non-electronic songs, I decided to exclude them from my study and generate a blacklist of these music artists. While it was infeasible to look through every single song and determine whether it was electronic or not, I was able to look over the earliest songs in each cluster. These songs were the most important to verify as electronic, because early non-electronic songs could end up forming new clusters and inadvertently create clusters with non-electronic sounds that I was not looking for.

The goal of this thesis is to identify different groups into which EM songs are clustered and to identify the most unique artists and genres. While the second task is fairly simple, because it requires looking at the earliest songs in each cluster, the effectiveness of the first is difficult to gauge. While I can look at the average chord change and timbre category frequencies in each cluster, as well as other metadata, attaching more semantic interpretations to what the music actually sounds like, and determining whether the music is clustered properly, is a very subjective process. For this reason, I ran the Dirichlet Process on the feature set with values of α = (0.05, 0.1, 0.2) and compared the clustering for each value, examining similarities and differences in the clusters formed in each scenario in the Discussion section. For each value of α, I set the upper limit of components, or clusters, allowed to 50. The values of α I used resulted in 9, 14, and 19 clusters formed, respectively. A sketch of this experiment loop is shown below.
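A minimal sketch of that loop, using the since-deprecated sklearn.mixture.DPGMM class (newer scikit-learn versions provide BayesianGaussianMixture instead); the covariance type, iteration count, and input file name are assumptions:

import numpy as np
from sklearn.mixture import DPGMM  # BayesianGaussianMixture in newer scikit-learn

X = np.load('em_features.npy')  # hypothetical file of per-song feature vectors
X_scaled = 10 * X  # k = 10 rescales the data so moderate alphas behave sensibly

for alpha in (0.05, 0.1, 0.2):
    dpgmm = DPGMM(n_components=50, alpha=alpha, covariance_type='diag', n_iter=100)
    dpgmm.fit(X_scaled)
    labels = dpgmm.predict(X_scaled)
    print 'alpha = %.2f: %d clusters used' % (alpha, len(set(labels)))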

3.2 Findings

3.2.1 α = 0.05

When I set α to 0.05, the Dirichlet Process split the songs into 9 clusters. Below are the distributions of years of the songs in each cluster (note that the Dirichlet Process does not number the clusters exactly sequentially, so cluster numbers 5, 7, and 10 are skipped).


Figure 3.1: Song year distributions for α = 0.05

For each value of α, I also calculated the average frequency of each chord change category and timbre category for each cluster and plotted the results. The green lines correspond to timbre and the blue lines to pitch.


Figure 3.2: Timbre and pitch distributions for α = 0.05

A table of each cluster formed, the number of songs in that cluster, and descriptions of the pitch, timbre, and rhythmic qualities characteristic of songs in that cluster is shown below.


Cluster  Song Count  Characteristic Sounds
0        6481        Minimalist, industrial, space sounds, dissonant chords
1        5482        Soft, New Age, ethereal
2        2405        Defined sounds, electronic and non-electronic instruments played in standard rock rhythms
3        360         Very dense and complex synths, slightly darker tone
4        4550        Heavily distorted rock and synthesizer
6        2854        Faster-paced 80s synth rock, acid house
8        798         Aggressive beats, dense house music
9        1464        Ambient house, trancelike, strong beats, mysterious tone
11       1597        Melancholy tones; New Wave rock in the 80s, then starting in the 90s downtempo, trip-hop, nu-metal

Table 3.1: Song cluster descriptions for α = 0.05

3.2.2 α = 0.1

A total of 14 clusters were formed. (16 were formed, but 2 clusters contained only one song each; I listened to both of these songs, and since they did not sound unique, I discarded them from the clusters.) Again, the song distributions, timbre and pitch distributions, and cluster descriptions are shown below.


Figure 3.3: Song year distributions for α = 0.1


Figure 3.4: Timbre and pitch distributions for α = 0.1


Cluster  Song Count  Characteristic Sounds
0        1339        Instrumental and disco with 80s synth
1        2109        Simultaneous quarter-note and sixteenth-note rhythms
2        4048        Upbeat, chill, simultaneous quarter-note and eighth-note rhythms
3        1353        Strong repetitive beats, ambient
4        2446        Strong simultaneous beat and synths; synths defined but with echo
5        2672        Calm, New Age
6        542         Hi-hat cymbals, dissonant chord progressions
7        2725        Aggressive punk and alternative rock
9        1647        Latin, rhythmic emphasis on first and third beats
11       835         Standard medium-fast rock instruments/chords
16       1152        Orchestral, especially violins
18       40          "Martian alien" sounds, no vocals
20       1590        Alternating strong kick and strong high-pitched clap
28       528         Roland TR-like beats; kick and clap stand out but fuzzy

Table 3.2: Song cluster descriptions for α = 0.1

3.2.3 α = 0.2

With α set to 0.2, there were a total of 22 clusters formed. 3 of the clusters consisted of 1 song each, none of which were particularly unique-sounding, so I discarded them, for a total of 19 significant clusters. Again, the song distributions, timbre and pitch distributions, and cluster descriptions are shown below.


Figure 3.5: Song year distributions for α = 0.2


Figure 3.6: Timbre and pitch distributions for α = 0.2


Cluster  Song Count  Characteristic Sounds
0        4075        Nostalgic and sad-sounding synths and string instruments
1        2068        Intense, sad, cavernous (mix of industrial metal and ambient)
2        1546        Jazz/funk tones
3        1691        Orchestral with heavy 80s synths, atmospheric
4        343         Arpeggios
5        304         Electro, ambient
6        2405        Alien synths, eerie
7        1264        Punchy kicks and claps, 80s/90s tilt
8        1561        Medium tempo, 4/4 time signature, synths with intense guitar
9        1796        Disco rhythms and instruments
10       2158        Standard rock with few (if any) synths added on
12       791         Cavernous, minimalist, ambient (non-electronic instruments)
14       765         Downtempo, classic guitar riffs, fewer synths
16       865         Classic acid house sounds and beats
17       682         Heavy Roland TR sounds
22       14          Fast, ambient, classic orchestral
23       578         Acid house with funk tones
30       31          Very repetitive rhythms, one or two tones
34       88          Very dense sound (strong vocals and synths)

Table 3.3: Song cluster descriptions for α = 0.2


3.3 Analysis

Each of the different values of α revealed different insights about the Dirichlet Process applied to the Million Song Dataset, mainly which artists and songs were unique and how traditional groupings of EM genres could be thought of differently. Not surprisingly, the distributions of the years of songs in most of the clusters were skewed to the left, because the distribution of all of the EM songs in the Million Song Dataset is left-skewed (see Figure 2.2). However, some of the distributions vary significantly for individual clusters, and these differences provide important insights into which types of music styles were popular at certain points in time and how unique the earliest artists and songs in the clusters were. For example, for α = 0.1, cluster 28's musical style (with sounds characteristic of the Roland TR-808 and TR-909, two programmable drum machines and synthesizers that became extremely popular in 1980s and 90s dance tracks [13]) coincides with when the instruments were first manufactured, in 1980. Not surprisingly, this cluster contained mostly songs from the 80s and 90s and declined slightly in the 2000s. However, there were a few songs in that cluster that came out before 1980. While these songs did not clearly use the Roland TR machines, they may have contained similar sounds that predated the machines and were truly novel.

First, looking at α = 0.05, we see that all of the clusters contain a significant number of songs, although clusters 3 and 8 are notably smaller. Cluster 3 contains a heavier left tail, indicating a larger number of songs from the 70s, 80s, and 90s. Inside the cluster, the genres of music varied significantly from a traditional music lens. That is, the cluster contained some songs with nearly all traditional rock instruments, others with purely synths, and others somewhere in between, all of which would normally be classified as different EM genres. However, under the Dirichlet Process these songs were lumped together with the common theme of dense, melodic material (as opposed to minimalistic, repetitive, or dissonant sounds). The most prominent artists from the earlier songs are Ashra and Jon Hassell, who composed several melodic songs combining traditional instruments with synthesizers for a modern feel. The other small cluster, number 8, contains a more normal year distribution relative to the entire MSD distribution and also consists of denser beats. Another artist, Cabaret Voltaire, leads this cluster. Cluster 9 sticks out significantly because it contains virtually no songs before 1990 but then increases rapidly in popularity. This cluster contains songs with hypnotically repetitive rhythm, strong and ethereal synths, and an equally strong drum-like beat. Given the emergence of trance in the 1990s, and the fact that house music in the 1980s contained more minimalistic synths than house music in the 1990s, this distribution of years makes sense. Looking at the earliest artists in this cluster, one that accurately predates the later music in the cluster is Jean-Michel Jarre. A French composer pioneering in ambient and electronic music [14], one of his songs, Les Chants Magnétiques IV, contains very sharp and modulated synths, along with a repetitive hi-hat cymbal rhythm and more drawn-out, ethereal synths. While the song sounds ambient at its normal speed, playing the song at 1.5 times its normal speed produced a thumping, fast-paced 16th-note rhythm that, combined with the ethereal synths containing certain chord progressions, sounded very similar to trance music. In fact, I found that, stylistically, trance music was comparable to house and ambient music increased in speed. Trance music was a term not used extensively until the early 1990s, but ambient and house music were already mainstream by the 1980s, so it would make sense that trance evolved in this manner. However, this insight could serve as an argument that trance is not an innovative genre in and of itself but is rather a clever combination of two older genres.

Lastly, we look at the timbre category and chord change distributions for each cluster. In theory, these clusters should have significantly different peaks of chord changes and timbre categories, reflecting different pitch arrangements and instruments in each cluster. The type 0 chord change corresponds to major → major with no note change; type 60, minor → minor with no note change; type 120, dominant 7th → dominant 7th with no note change; and type 180, minor 7th → minor 7th with no note change. (For example, for minor → minor with no note change, key_shift = 4(2 - 1) + 2 = 6, so the code is 12(6 - 1) + 0 = 60.) It makes sense that type 0, 60, 120, and 180 chord changes are frequently observed, because it implies that chords occurring next to each other in the song are remaining in the same key for the majority of the song. The timbre categories, on the other hand, are more difficult to intuitively interpret. Mauch's study addresses this issue by sampling songs and sounds that are the closest to each timbre category and then playing the sounds and attaching user-based interpretations based on several listeners [8]. While this strategy worked in Mauch's study, given the time and resources at my disposal it was not practical in mine. I ended up comparing my subjective summaries of each cluster and comparing the charts to see whether certain peaks in the timbre categories corresponded to specific tones. Strangely, for α = 0.05, the timbre and chord change data is very similar for each cluster. This problem does not occur when α = 0.1 or 0.2, where the graphs vary significantly and correspond to some of the observed differences in the music. In summary, below are the most influential artists I found in the clusters formed and the types of music they created that were novel for their time:

• Jean-Michel Jarre: ambient and house music, complicated synthesizer arrangements
• Cabaret Voltaire: orchestral electronic music
• Paul Horn: new age
• Brian Eno: ambient music
• Manuel Göttsching (Ashra): synth-heavy ambient music
• Killing Joke: industrial metal
• John Foxx: minimalist and dark electronic music
• Fad Gadget: house and industrial music

While these conclusions were formed mainly from the data used in the MSD and the resulting clusters, I checked outside sources and biographies of these artists to see whether they were groundbreaking contributors to electronic music. Some research revealed that these artists were indeed groundbreaking for their time, so my findings are consistent with existing literature. The difference, however, between existing accounts and mine is that, from a quantitatively computed perspective, I found some new connections (like observing that one of Jarre's works, when sped up, sounded very similar to trance music).

For larger values of α, it is worth not only looking at interesting phenomena in the clusters formed for that specific value, but also comparing those clusters to the ones formed with other values of α. Since we are increasing the value of α, more clusters will be formed, and the distinctions between each cluster will be more nuanced. With α = 0.1, the Dirichlet Process formed 16 clusters; 2 of these clusters consisted of only one song each, and upon listening, neither of these songs sounded particularly unique, so I threw those two clusters out and analyzed the remaining 14. Comparing these clusters to the ones formed with α = 0.05, I found that some of the clusters mapped over nicely, while others were more difficult to interpret. For example, cluster 3(0.1) (cluster 3 when α = 0.1) contained a similar number of songs, and a similar distribution of the years the songs were released, to cluster 9(0.05). Both contain virtually no songs before the 1990s and then steadily rise in popularity through the 2000s. Both clusters also contain similar types of music: house beats and ethereal synths reminiscent of ambient or trance music. However, when I looked at the earliest artists in cluster 3(0.1), they were different from the earliest artists in cluster 9(0.05). One particular artist, Bill Nelson, stood out for having a particularly novel song, "Birds of Tin," for the year it was released (1980). This song features a sharp and twangy synth beat that, when sped up, sounded like minimalist acid house music. While the α = 0.05 grouping differentiated mostly on general moods and classes of instruments (like rock vs. non-electronic vs. electronic), the α = 0.1 grouping picked up more nuanced instrumentation and mood differences. For example, cluster 16(0.1) contained songs that featured orchestral string instruments, especially violin. The songs themselves varied significantly according to traditional genres, from Brian Eno arrangements with classical orchestra to a remix of a song by Linkin Park, a nu-metal band, which contained violin interludes. This clustering raises an interesting point: music that sounds very different based on traditional genres could be grouped together on certain instruments or sounds. Another cluster, 28(0.1), features 90s sounds characteristic of the Roland TR-909 drum machine (which would explain why the cluster's songs increase drastically starting in the 1990s and steadily decline through the 2000s). Yet another cluster, 6(0.1), contains a particularly heavy left tail, indicating a style more popular in the 1980s, and the characteristic sound, hi-hat cymbals, is also a specialized instrument. This specialization does not match up particularly strongly with the clusters when α = 0.05. That is, a single cluster with α = 0.05 does not easily map to one or more clusters in the α = 0.1 run, although many of the clusters appear to share characteristics based on the qualitative descriptions in the tables. The timbre/chord change charts for each cluster appeared to at least somewhat corroborate the general characteristics I attached to each cluster. For example, the last timbre category is significantly pronounced for clusters 5 and 18, and especially so for 18. Cluster 18 was vocal-free, ethereal, space-synth sounds, so it would make sense that cluster 5, which was mainly calm New Age, also contained vocal-free, ethereal, space-y sounds. It was also interesting to note that certain clusters, like 28, contained one timbre category that completely dominated all the others. Many songs in this cluster were marked by strong and repetitive beats reminiscent of the Roland TR drum machines, which matches the graph. Likewise, clusters 3, 7, 9, and 20, which appear to contain the same peak timbre category, were noted for containing strong and repetitive beats. From these clusters, I added the following artists and their contributions to the general list of novel artists:

• Bill Nelson: minimalist house music
• Vangelis: orchestral compositions with electronic notes
• Rick Wakeman: rock compositions with spacy-sounding synths
• Kraftwerk: synth-based pop music

Finally, we look at α = 0.2. With this parameter value, the Dirichlet Process resulted in 22 clusters formed; 3 of these clusters contained only one song each, and upon listening to each of these songs, I determined they were not particularly unique and discarded them, for a total of 19 remaining clusters. Unlike the previous two values of α, where the clusters were relatively easy to subjectively differentiate, this run was quite difficult. Slightly more than half of the clusters, 10 out of 19, contained under 1000 songs, and 3 contained under 100. Some of the clusters were easily mapped to clusters in the other two α values, like cluster 17(0.2), which contains Roland TR drum machine sounds and is comparable to cluster 28(0.1). However, many of the other classifications seemed more dubious. Not only did the songs within each cluster often vary significantly, but the differences between many clusters appeared nearly indistinguishable. The chord change and timbre charts also support the difficulty in distinguishing different clusters. The y-axes for all of the charts are quite small, implying that many of the timbre values averaged out because the songs in each cluster were quite different. Essentially, the observations are quite noisy and do not have features that stand out as saliently as those of cluster 28(0.1), for example. The only exceptions to these numbers were clusters 30 and 34, but there were so few songs in each of these clusters that they represent only a small amount of the dataset. Therefore, I concluded that the Dirichlet Process with α = 0.2 performed an insufficient job of adequately clustering the songs. Overall, the clusters formed when α = 0.1 were the most meaningful in terms of picking up nuanced moods and instruments without splitting hairs and producing clusters a minimally trained ear could not differentiate. From this analysis, the most appropriate genre classifications of the electronic music from the MSD are the clusters described in the table where α = 0.1, and the most novel artists, along with their contributions, are summarized in the findings where α = 0.05 and α = 0.1.


Chapter 4

Conclusion

In this chapter, I first address weaknesses in my experiment and strategies to address those weaknesses; then I offer potential paths for researchers to build upon my experiment and give closing words regarding this thesis.

4.1 Design Flaws in Experiment

While I made every effort possible to ensure the integrity of this experiment, there were various limiting factors, some beyond my control and others within my control but unrealistic to address given the time and resources I had. The largest issue was the dataset I was working with. While the MSD contained roughly 23,000 electronic music songs according to my classifications, these songs did not come close to all of the electronic music that was available. From looking through the tracks, I did see many important artists, meaning that there was some credibility to the dataset. However, there were several other artists I was surprised to see missing, and the artists included contained only a limited number of popular songs. Some traditionally defined genres, like dubstep, were missing entirely from the dataset, and the most recent songs came from the year 2010, which meant that the past 5 years of rapid expansion in EM were not accounted for. Building a sufficient corpus of EM data is very difficult, arguably more so than for other genres, because songs may be remixed by multiple artists, further blurring the line between original content and modifications. For this reason, I considered my thesis to be a proof of concept: although the data I used may not be ideal, I was able to show that the Dirichlet Process could be used with some amount of success to cluster songs based on their metadata.

With respect to how I implemented the Dirichlet Process and constructed the features, my methodology could have been more extensive with additional time and resources. Interpreting the sounds in each song and establishing common threads is a difficult task, and unlike Pandora, which used trained music theory experts to analyze each song, I relied on my own ears and anecdotal knowledge of EM. Given the lack of formal literature quantitatively analyzing EM, and the resources I had, this was my best realistic option, but it was also not ideal. The second notable weakness, which was more controllable, was determining what exactly constitutes an EM song. My criteria involved iterating through every song and selecting those whose artist contained a tag that fell inside a list of predetermined EM genres. However, this strategy is not always effective, since some artists have only a small selection of EM songs and have produced much more music involving rock or other non-EM genres. To prevent these songs from appearing in the dataset, I would need to load another dataset, from a group called Last.fm, which contains user-generated tags at the song level. Another, more addressable, weakness in my experiment was graphically analyzing the timbre categories. While the average chord changes were easy to interpret on the graphs for each cluster and had easy semantic interpretations, the timbre categories were never formally defined. That is, while I knew the Bayes Information Criterion was lowest when there were 46 categories, I did not associate each timbre category with a sound. Mauch's study addressed this issue by randomly selecting songs with sounds that fell in each timbre category and asking users to listen to the sounds and classify what they heard. Implementing this system would be an additional way of ensuring that the clusters formed were nontrivial: I could not only eyeball the measurements on each graph for timbre, as I did in this thesis, but also use them to confirm the sounds I observed for each cluster. Finally, while my feature selection involved careful preprocessing, based on other studies, that normalized measurements between all songs, there are additional ways I could have improved the feature set. For example, one study looks at more advanced ways to isolate specific timbre segments in a song, identify repeating patterns, and compare songs to each other in terms of the similarity of their timbres [15]. More advanced methods like these would allow me to more quantitatively analyze how successfully the Dirichlet Process clusters songs into distinct categories.

4.2 Future Work

Future work in this area, quantitatively analyzing EM metadata to determine what constitutes different genres and novel artists, would involve tighter definitions, procedures, evaluations of whether clustering was effective, and musical scrutiny. All of the weaknesses mentioned in the previous section, barring perhaps the songs available in the Million Song Dataset, can be addressed with extensions and modifications to the code base I created. Addressing the greater issue of building an effective corpus of music data for the MSD, and constantly updating it, might be accomplished by soliciting such data from an organization like Spotify, but such an endeavor is very ambitious and beyond the scope of any individual or small-group research project without extensive funding and influence. Once these problems are resolved, and the dataset, the songs accessed from the dataset, and methods for comparing songs to each other are in place, the next steps would be to further analyze the results. How do the most unique artists for their time compare to the most popular artists? Is there considerable overlap? How long does it take for a style to grow in popularity, if it even does? And lastly, how can these findings be used to compose new genres of music and envision who and what will become popular in the future? All of these questions may require supplementary information sources, with respect to the popularity of songs and artists for example, and many of these additional pieces of information can be found on the website of the MSD.

4.3 Closing Remarks

While this thesis is an ambitious endeavor and can be improved in many respects, it does show that the methods implemented yield nontrivial results and could serve as a foundation for future quantitative analysis of electronic music. As data analytics grows even more, and groups such as Spotify amass greater amounts of information and deeper insights on that information, this relatively new field of study will hopefully grow. EM is a dynamic, energizing, and incredibly expressive type of music, and understanding it from a quantitative perspective pays respect to what has, up until now, been mostly analyzed from a curious outsider's perspective: qualitatively described, but not examined as thoroughly from a mathematical angle.


Appendix A

Code

A.1 Pulling Data from the Million Song Dataset

from __future__ import division
import os
import sys
import re
import time
import glob
import numpy as np
import hdf5_getters  # not on adroit

# prevents output from showing ellipses when printed
np.set_printoptions(threshold=np.nan)

'''This code computes the frequency of chord changes in each electronic song
and runs the dirichlet process on it.'''

# path separators and string quotes in this listing were lost in extraction
# and have been reconstructed
basedir = '/scratch/network/mssilver/mssilver/msd_data_full/data/' + str(sys.argv[1])
ext = '.h5'

target_genres = ['house', 'techno', 'drum and bass', 'drum n bass',
    "drum'n'bass", 'drumnbass', "drum 'n' bass", 'jungle', 'breakbeat',
    'trance', 'dubstep', 'trap', 'downtempo', 'industrial', 'synthpop',
    'idm', 'idm - intelligent dance music', '8-bit', 'ambient',
    'dance and electronica', 'electronic']

# relevant metadata for all EM songs found in the MSD
all_song_data = {}
pitch_segs_data = []
count = 0
start_time = time.time()

for root, dirs, files in os.walk(basedir):
    files = glob.glob(os.path.join(root, '*' + ext))
    for f in files:
        h5 = hdf5_getters.open_h5_file_read(f)
        # if year unknown, throw out sample
        if hdf5_getters.get_year(h5) == 0:
            h5.close()
            continue
        if any(tag in str(hdf5_getters.get_artist_mbtags(h5)) for tag in target_genres):
            print 'found electronic music song at {0} seconds'.format(time.time() - start_time)
            count += 1
            print('song count: {0}'.format(count + 1))
            h5_subdict = dict()
            h5_subdict['title'] = hdf5_getters.get_title(h5).item()
            h5_subdict['artist_name'] = hdf5_getters.get_artist_name(h5).item()
            h5_subdict['year'] = hdf5_getters.get_year(h5).item()
            h5_subdict['duration'] = hdf5_getters.get_duration(h5).item()
            h5_subdict['timbre'] = hdf5_getters.get_segments_timbre(h5).tolist()
            h5_subdict['pitches'] = hdf5_getters.get_segments_pitches(h5).tolist()
            track_id = hdf5_getters.get_track_id(h5).item()
            all_song_data[track_id] = h5_subdict
            print('Song {0} finished processing. Total time elapsed: {1} seconds'.format(count, str(time.time() - start_time)))
        h5.close()

all_song_data_sorted = dict(sorted(all_song_data.items(), key=lambda k: k[1]['year']))
# strip path separators from the subdirectory argument for the output name
sortedpitchdata = '/scratch/network/mssilver/mssilver/msd_data/raw_' + re.sub('/', '', sys.argv[1]) + '.txt'
with open(sortedpitchdata, 'w') as text_file:
    text_file.write(str(all_song_data_sorted))

A.2 Calculating Most Likely Chords and Timbre Categories

from __future__ import division
import os
import sys
import re
import time
import json
import glob
import hdf5_getters  # not on adroit
import sklearn.mixture
import msd_utils  # not on adroit
import math
import numpy as np
import collections
import ast

# prevents output from showing ellipses when printed
np.set_printoptions(threshold=np.nan)

# column-wise mean of a list of lists
def mean(a):
    return sum(a) / len(a)

'''This code computes the frequency of chord changes in each electronic song
and runs the dirichlet process on it.'''

basedir = '/scratch/network/mssilver/mssilver/'
input_file = basedir + 'msd_data/raw_' + str(sys.argv[1]) + '.txt'
output_file = basedir + 'msd_data/preprocessed_' + str(sys.argv[1]) + '.txt'

json_contents = open(input_file, 'r').read()

all_song_data = []
time_start = time.time()
count = 0
# the exact regex was lost in extraction; it matches one song dict at a time
for json_object_str in re.finditer(r"\{'title'.*?\}\}", json_contents):
    json_object_str = str(json_object_str.group(0))
    json_object = ast.literal_eval(json_object_str)
    json_object_new = {}

    json_object_new['title'] = json_object['title']
    json_object_new['artist_name'] = json_object['artist_name']
    json_object_new['year'] = json_object['year']
    json_object_new['duration'] = json_object['duration']

    segments_pitches_old = json_object['pitches']
    segments_timbre_old = json_object['timbre']
    segments_pitches_old_smoothed = []
    segments_timbre_old_smoothed = []
    chord_changes = [0 for i in range(0, 192)]
    smoothing_factor = 5
    for i in range(0, int(math.floor(len(segments_pitches_old)) / smoothing_factor)):
        segments = segments_pitches_old[(smoothing_factor * i):(smoothing_factor * i + smoothing_factor)]
        # calculate mean frequency of each note over a block of 5 time segments
        segments_mean = map(mean, zip(*segments))
        segments_pitches_old_smoothed.append(segments_mean)
    most_likely_chords = [msd_utils.find_most_likely_chord(seg) for seg in segments_pitches_old_smoothed]
    print 'found most likely chords at {0} seconds'.format(time.time() - time_start)
    # calculate chord changes
    for i in range(0, len(most_likely_chords) - 1):
        c1 = most_likely_chords[i]
        c2 = most_likely_chords[i + 1]
        if c1[1] == c2[1]:
            note_shift = 0
        elif c1[1] < c2[1]:
            note_shift = c2[1] - c1[1]
        else:
            note_shift = 12 - c1[1] + c2[1]
        key_shift = 4 * (c1[0] - 1) + c2[0]
        # convert note_shift (0 through 11) and key_shift (1 to 16)
        # to one of 192 categories for a chord shift
        chord_shift = 12 * (key_shift - 1) + note_shift
        chord_changes[chord_shift] += 1
    json_object_new['chord_changes'] = [c / json_object['duration'] for c in chord_changes]
    print 'calculated chord changes at {0} seconds'.format(time.time() - time_start)

    for i in range(0, int(math.floor(len(segments_timbre_old)) / smoothing_factor)):
        segments = segments_timbre_old[(smoothing_factor * i):(smoothing_factor * i + smoothing_factor)]
        # calculate mean of each timbre dimension over a block of 5 time segments
        segments_mean = map(mean, zip(*segments))
        segments_timbre_old_smoothed.append(segments_mean)
    print 'found most likely timbre categories at {0} seconds'.format(time.time() - time_start)
    timbre_cats = [msd_utils.find_most_likely_timbre_category(seg) for seg in segments_timbre_old_smoothed]
    timbre_cat_counts = [timbre_cats.count(i) for i in xrange(0, 30)]
    json_object_new['timbre_cat_counts'] = [t / json_object['duration'] for t in timbre_cat_counts]
    all_song_data.append(json_object_new)
    count += 1

print 'preprocessing finished, writing results to file at time {0}'.format(time.time() - time_start)
with open(output_file, 'w') as f:
    f.write(str(all_song_data))

print 'file merging complete at time {0}'.format(time.time() - time_start)

A.3 Code to Compute Timbre Categories

from __future__ import division
import os
import sys
import re
import time
import json
import glob
import hdf5_getters  # not on adroit
import sklearn.mixture
import msd_utils  # not on adroit
import math
import numpy as np
import collections
from string import ascii_uppercase
import ast
import matplotlib.pyplot as plt
import operator
from collections import defaultdict
import random

timbre_all = []
N = 20  # number of samples to get from each year
year_counts = {1956: 2, 1965: 4, 1968: 3, 1969: 5, 1970: 23, 1971: 25,
    1972: 26, 1973: 37, 1974: 35, 1975: 29, 1976: 28, 1977: 64, 1978: 77,
    1979: 111, 1980: 131, 1981: 171, 1982: 199, 1983: 272, 1984: 190,
    1985: 189, 1986: 200, 1987: 224, 1988: 205, 1989: 272, 1990: 358,
    1991: 348, 1992: 538, 1993: 610, 1994: 658, 1995: 764, 1996: 809,
    1997: 930, 1998: 872, 1999: 983, 2000: 1031, 2001: 1230, 2002: 1323,
    2003: 1563, 2004: 1508, 2005: 1995, 2006: 1892, 2007: 2175, 2008: 1950,
    2009: 1782, 2010: 742}

time_start = time.time()
year_count = defaultdict(int)
orig_dir = '/scratch/network/mssilver/mssilver/'
orig_dir = ''
# the exact regex was lost in extraction; it matches one song dict at a time
json_pattern = re.compile(r"\{'title'.*?\}\}", re.DOTALL)
N = 20  # number of songs to sample from each year
k = 20  # number of frames to select from each song
for l1 in ascii_uppercase:
    for l2 in ascii_uppercase:
        edm_textfile = orig_dir + 'msd_data/raw_' + l1 + l2 + '.txt'
        json_contents = open(edm_textfile, 'r').read()
        for json_object_str in re.findall(json_pattern, json_contents):
            json_object = ast.literal_eval(json_object_str)
            year = int(json_object['year'])
            prob = 1.0 if 1.0 * N / year_counts[year] > 1.0 else 1.0 * N / year_counts[year]
            if random.random() < prob:
                print 'getting timbre frames for song in directory {0}, {1} seconds after start of program'.format(edm_textfile, time.time() - time_start)
                duration = float(json_object['duration'])
                timbre = [[t / duration for t in l] for l in json_object['timbre']]
                try:
                    indices = random.sample(xrange(0, len(timbre)), k)
                except:
                    indices = xrange(0, len(timbre))
                timbre_frames = [timbre[i] for i in indices]
                appended_timbre = [timbre_all.append(l) for l in timbre_frames]
        print 'finished file {0}, {1} seconds after start of program'.format(edm_textfile, time.time() - time_start)

with open('timbre_frames_all.txt', 'w') as f:
    f.write(str(timbre_all))

A.4 Helper Methods for Calculations

import os
import re
import json
import glob
import hdf5_getters
import time
import numpy as np

''' some static data used in conjunction with the helper methods '''

# each 12-element vector corresponds to the 12 pitches, starting with C
# natural and going up to B natural
CHORD_TEMPLATE_MAJOR = [
    [1,0,0,0,1,0,0,1,0,0,0,0], [0,1,0,0,0,1,0,0,1,0,0,0],
    [0,0,1,0,0,0,1,0,0,1,0,0], [0,0,0,1,0,0,0,1,0,0,1,0],
    [0,0,0,0,1,0,0,0,1,0,0,1], [1,0,0,0,0,1,0,0,0,1,0,0],
    [0,1,0,0,0,0,1,0,0,0,1,0], [0,0,1,0,0,0,0,1,0,0,0,1],
    [1,0,0,1,0,0,0,0,1,0,0,0], [0,1,0,0,1,0,0,0,0,1,0,0],
    [0,0,1,0,0,1,0,0,0,0,1,0], [0,0,0,1,0,0,1,0,0,0,0,1]]
CHORD_TEMPLATE_MINOR = [
    [1,0,0,1,0,0,0,1,0,0,0,0], [0,1,0,0,1,0,0,0,1,0,0,0],
    [0,0,1,0,0,1,0,0,0,1,0,0], [0,0,0,1,0,0,1,0,0,0,1,0],
    [0,0,0,0,1,0,0,1,0,0,0,1], [1,0,0,0,0,1,0,0,1,0,0,0],
    [0,1,0,0,0,0,1,0,0,1,0,0], [0,0,1,0,0,0,0,1,0,0,1,0],
    [0,0,0,1,0,0,0,0,1,0,0,1], [1,0,0,0,1,0,0,0,0,1,0,0],
    [0,1,0,0,0,1,0,0,0,0,1,0], [0,0,1,0,0,0,1,0,0,0,0,1]]
CHORD_TEMPLATE_DOM7 = [
    [1,0,0,0,1,0,0,1,0,0,1,0], [0,1,0,0,0,1,0,0,1,0,0,1],
    [1,0,1,0,0,0,1,0,0,1,0,0], [0,1,0,1,0,0,0,1,0,0,1,0],
    [0,0,1,0,1,0,0,0,1,0,0,1], [1,0,0,1,0,1,0,0,0,1,0,0],
    [0,1,0,0,1,0,1,0,0,0,1,0], [0,0,1,0,0,1,0,1,0,0,0,1],
    [1,0,0,1,0,0,1,0,1,0,0,0], [0,1,0,0,1,0,0,1,0,1,0,0],
    [0,0,1,0,0,1,0,0,1,0,1,0], [0,0,0,1,0,0,1,0,0,1,0,1]]
CHORD_TEMPLATE_MIN7 = [
    [1,0,0,1,0,0,0,1,0,0,1,0], [0,1,0,0,1,0,0,0,1,0,0,1],
    [1,0,1,0,0,1,0,0,0,1,0,0], [0,1,0,1,0,0,1,0,0,0,1,0],
    [0,0,1,0,1,0,0,1,0,0,0,1], [1,0,0,1,0,1,0,0,1,0,0,0],
    [0,1,0,0,1,0,1,0,0,1,0,0], [0,0,1,0,0,1,0,1,0,0,1,0],
    [0,0,0,1,0,0,1,0,1,0,0,1], [1,0,0,0,1,0,0,1,0,1,0,0],
    [0,1,0,0,0,1,0,0,1,0,1,0], [0,0,1,0,0,0,1,0,0,1,0,1]]

CHORD_TEMPLATE_MAJOR_means = [np.mean(chord) for chord in CHORD_TEMPLATE_MAJOR]
CHORD_TEMPLATE_MINOR_means = [np.mean(chord) for chord in CHORD_TEMPLATE_MINOR]
CHORD_TEMPLATE_DOM7_means = [np.mean(chord) for chord in CHORD_TEMPLATE_DOM7]
CHORD_TEMPLATE_MIN7_means = [np.mean(chord) for chord in CHORD_TEMPLATE_MIN7]

CHORD_TEMPLATE_MAJOR_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_MAJOR]
CHORD_TEMPLATE_MINOR_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_MINOR]
CHORD_TEMPLATE_DOM7_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_DOM7]
CHORD_TEMPLATE_MIN7_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_MIN7]

# mean timbre vectors of the 46 timbre clusters found by the GMM
TIMBRE_CLUSTERS = [
    [1.38679881e-01, 3.95702571e-02, 2.65410235e-02, 7.38301998e-03,
     -1.75014636e-02, -5.51147732e-02, 8.71851698e-03, -1.17595855e-02,
     1.07227900e-02, 8.75951680e-03, 5.40391877e-03, 6.17638908e-03],
    [3.14344510e+00, 1.17405599e-01, 4.08053561e+00, -1.77934450e+00,
     2.93367968e+00, -1.35597928e+00, -1.55129489e+00, 7.75743158e-01,
     6.42796685e-01, 1.40794256e-01, 3.37716831e-01, -3.27103815e-01],
    [3.56548165e-01, 2.73288705e+00, 1.94355982e+00, 1.06892477e+00,
     9.89739475e-01, -8.97330631e-02, 8.73234495e-01, -2.00747009e-03,
     3.44488367e-01, 9.93117800e-02, -2.43471766e-01, -1.90521726e-01],
    [4.22442037e-01, 4.14115783e-01, 1.43926557e-01, -1.16143322e-01,
     -5.95186216e-02, -2.36927188e-01, -6.83151409e-02, 9.86816882e-02,
     2.43219098e-02, 6.93558977e-02, 6.80121418e-03, 3.97485360e-02],
    [1.94727799e-01, -1.39027782e+00, -2.39875671e-01, -2.84583677e-01,
     1.92334219e-01, -2.83421048e-01, 2.15787541e-01, 1.14840341e-01,
     -2.15631833e-01, -4.09496877e-02, -6.90838017e-03, -7.24394810e-03],
    [1.96565167e-01, 4.98702717e-02, -3.43697282e-01, 2.54170701e-01,
     1.12441266e-02, 1.54740401e-01, -4.70447408e-02, 8.10868802e-02,
     3.03736697e-03, 1.43974944e-03, -2.75044913e-02, 1.48634678e-02],
    [2.21364497e-01, -2.96205105e-01, 1.57754028e-01, -5.57641279e-02,
     -9.25625566e-02, -6.15316168e-02, -1.38139882e-01, -5.54936599e-02,
     1.66886836e-01, 6.46238260e-02, 1.24093863e-02, -2.09274345e-02],
    [2.12823455e-01, -9.32652720e-02, -4.39611467e-01, -2.02814479e-01,
     4.98638770e-02, -1.26572488e-01, -1.11181799e-01, 3.25075635e-02,
     2.01416694e-02, -5.69216463e-02, 2.61922912e-02, 8.30817468e-02],
    [1.62304042e-01, -7.34813956e-03, -2.02552550e-01, 1.80106705e-01,
     -5.72110826e-02, -9.17148244e-02, -6.20429191e-03, -6.08892354e-02,
     1.02883628e-02, 3.84878478e-02, -8.72920419e-03, 2.37291230e-02],
    [1.69023095e-01, 6.81311168e-02, -3.71039856e-02, -2.13139780e-02,
     -4.18752028e-03, 1.36407740e-01, 2.58515825e-02, -4.10328777e-04,
     2.93149920e-02, -1.97874734e-02, 2.01177066e-02, 4.29260690e-03],
    [4.16829358e-01, -1.28384095e+00, 8.86081556e-01, 9.13717416e-02,
     -3.19420208e-01, -1.82003637e-01, -3.19865507e-02, -1.71517045e-02,
     3.47472066e-02, -3.53047665e-02, 5.58354602e-02, -5.06222122e-02],
    [3.83948137e-01, 1.06020034e-01, 4.01191058e-01, 1.49470482e-01,
     -9.58422411e-02, -4.94473336e-02, 2.27589858e-02, -5.67352733e-02,
     3.84666644e-02, -2.15828055e-02, -1.67817151e-02, 1.15426241e-01],
    [9.07946444e-01, 3.26120397e+00, 2.98472002e+00, -1.42615404e-01,
     1.29886103e+00, -4.53380431e-01, 1.54008478e-01, -3.55297093e-02,
     -2.95809181e-01, 1.57037690e-01, -7.29692046e-02, 1.15180285e-01],
    [1.60870896e+00, -2.32038235e+00, -7.96211044e-01, 1.55058968e+00,
     -2.19377663e+00, 5.01030526e-01, -1.71767279e+00, -1.36642470e+00,
     -2.42837527e-01, -4.14275615e-01, -7.33148530e-01, -4.56676578e-01],
    [6.42870687e-01, 1.34486839e+00, 2.16026845e-01, -2.13180345e-01,
     3.10866747e-01, -3.97754955e-01, -3.54439151e-01, -5.95938041e-04,
     4.95054274e-03, 4.67013422e-02, -1.80823854e-02, 1.25808320e-01],
    [1.16780496e+00, 2.28141229e+00, -3.29418720e+00, -1.54239912e+00,
     2.12372153e-01, 2.51116768e+00, 1.84273560e+00, -4.06183916e-01,
     1.19175125e+00, -9.24407446e-01, 6.85444429e-01, -6.38729005e-01],
    [2.39097414e-01, -1.13382447e-02, 3.06327342e-01, 4.68182987e-03,
     -1.03107607e-01, -3.17661969e-02, 3.46533705e-02, 1.46440386e-02,
     6.88291154e-02, 1.72580481e-02, -6.23970238e-03, -6.52822380e-03],
    [1.74850329e-01, -1.86077411e-01, 2.69285838e-01, 5.22452803e-02,
     -3.71708289e-02, -6.42874319e-02, -5.01920042e-03, -1.14565540e-02,
     -2.61300268e-03, -6.94872458e-03, 1.20157063e-02, 2.01341977e-02],
    [1.93220674e-01, 1.62738332e-01, 1.72794061e-02, 7.89933755e-02,
     1.58494767e-01, 9.04541006e-04, -3.33177052e-02, -1.42411500e-01,
     -1.90471155e-02, -2.41622739e-02, -2.57382438e-02, 2.84895062e-02],
    [3.31179197e+00, -1.56765268e-01, 4.42446188e+00, 2.05496297e+00,
     5.07031622e+00, -3.52663849e-02, -5.68337901e+00, -1.17825301e+00,
     5.41756637e-01, -3.15541339e-02, -1.58404846e+00, 7.37887234e-01],
    [2.36033237e-01, -5.01380019e-01, -7.01568834e-02, -2.14474169e-01,
     5.58739133e-01, -3.45340886e-01, 2.36469930e-01, -2.51770230e-02,
     -4.41670143e-01, -1.73364633e-01, 9.92353986e-03, 1.01775476e-01],
    [3.13672832e+00, 1.55128891e+00, 4.60139512e+00, 9.82477544e-01,
     -3.87108002e-01, -1.34239667e+00, -3.00065797e+00, -4.41556909e-01,
     -7.77546208e-01, -6.59017029e-01, -1.42596356e-01, -9.78935498e-01],
    [8.50714148e-01, 2.28658856e-01, -3.65260753e+00, 2.70626948e+00,
     -1.90441544e-01, 5.66625676e+00, 1.77531510e+00, 2.39978921e+00,
     1.10965660e+00, 1.58484130e+00, -1.51579214e-02, 8.64324026e-01],
    [1.14302559e+00, 1.18602811e+00, -3.88130412e+00, 8.69833825e-01,
     -8.23003310e-01, -4.23867795e-01, 8.56022598e-01, -1.08015106e+00,
     1.74840192e-01, -1.35493558e-02, -1.17012561e+00, 1.68572940e-01],
    [3.54117814e+00, 6.12714769e-01, 7.67585243e+00, 2.50391333e+00,
     1.81374399e+00, -1.46363231e+00, -1.74027236e+00, -5.72924078e-01,
     -1.20787368e+00, -4.13954661e-01, -4.62561948e-01, 6.78297871e-01],
    [8.31843044e-01, 4.41635485e-01, 7.00724425e-02, -4.72159900e-02,
     3.08326493e-01, -4.47009822e-01, 3.27806057e-01, 6.52370380e-01,
     3.28490360e-01, 1.28628172e-01, -7.78065861e-02, 6.91343399e-02],
    [4.90082031e-01, -9.53180204e-01, 1.76970476e-01, 1.57256960e-01,
     -5.26196238e-02, -3.19264458e-01, 3.91808304e-01, 2.19368239e-01,
     -2.06483291e-01, -6.25044005e-02, -1.05547224e-01, 3.18934196e-01],
    [1.49899454e+00, -4.30708817e-01, 2.43770498e+00, 7.03149621e-01,
     -2.28827845e+00, 2.70195855e+00, -4.71484280e+00, -1.18700075e+00,
     -1.77431396e+00, -2.23190236e+00, 8.20855264e-01, -2.35859902e-01],
    [1.20322544e-01, -3.66300816e-01, -1.25699953e-01, -1.21914056e-01,
     6.93277338e-02, -1.31034684e-01, -1.54955924e-03, 2.48094288e-02,
     -3.09576314e-02, -1.66369415e-03, 1.48904987e-04, -1.42151992e-02],
    [6.52394765e-01, -6.81024464e-01, 6.36868117e-01, 3.04950208e-01,
     2.62178992e-01, -3.20457080e-01, -1.98576098e-01, -3.02173163e-01,
     2.04399765e-01, 4.44513847e-02, -9.50111498e-03, -1.14198739e-02],
    [2.06762180e-01, -2.08101829e-01, 2.61977630e-01, -1.71672300e-01,
     5.61794250e-02, 2.13660185e-01, 3.90259585e-02, 4.78176392e-02,
     1.72812607e-02, 3.44052067e-02, 6.26899067e-03, 2.48544728e-02],
    [7.39717363e-01, 4.37786285e+00, 2.54995502e+00, 1.13151212e+00,
     -3.58509503e-01, 2.20806129e-01, -2.20500355e-01, -7.22409824e-02,
     -2.70534083e-01, 1.07942098e-03, 2.70174668e-01, 1.87279353e-01],
    [1.25593809e+00, 6.71054880e-02, 8.70352571e-01, -4.32607959e+00,
     2.30652217e+00, 5.47476105e+00, -6.11052479e-01, 1.07955720e+00,
     -2.16225471e+00, -7.95770149e-01, -7.31804973e-01, 9.68935954e-01],
    [1.17233757e-01, -1.23897829e-01, -4.88625265e-01, 1.42036530e-01,
     -7.23286756e-02, -6.99808763e-02, -1.17525019e-02, 5.70221674e-02,
     -7.67796123e-03, 4.17505873e-02, -2.33375716e-02, 1.94121001e-02],
    [1.67511025e+00, -2.75436700e+00, 1.45345593e+00, 1.32408871e+00,
     -1.66172505e+00, 1.00560074e+00, -8.82308160e-01, -5.95708043e-01,
     -7.27283590e-01, -1.03975499e+00, -1.86653334e-02, 1.39449745e+00],
    [3.20587677e+00, -2.84451104e+00, 8.54849957e+00, -4.44001235e-01,
     1.04202144e+00, 7.35333682e-01, -2.48763292e+00, 7.38931361e-01,
     -1.74185596e+00, -1.07581842e+00, 2.05759299e-01, -8.20483513e-01],
    [3.31279737e+00, -5.08655734e-01, 6.61530870e+00, 1.16518280e+00,
     4.74499155e+00, -2.31536191e+00, -1.34016130e+00, -7.15381712e-01,
     2.78890594e+00, 2.04189275e+00, -3.80003033e-01, 1.16034914e+00],
    [1.79522019e+00, -8.13534697e-02, 4.37167420e-01, 2.26517020e+00,
     8.85377295e-01, 1.07481514e+00, -7.25322296e-01, -2.19309506e+00,
     -7.59468916e-01, -1.37191387e+00, 2.60097913e-01, 9.34596450e-01],
    [3.50400906e-01, 8.17891485e-01, -8.63487084e-01, -7.31760701e-01,
     9.70320805e-02, -3.60023996e-01, -2.91753495e-01, -8.03073817e-02,
     6.65930095e-02, 1.60093340e-01, -1.29158086e-01, -5.18806100e-02],
    [2.25922929e-01, 2.78461593e-01, 5.39661393e-02, -2.37662670e-02,
     -2.70343295e-02, -1.23485570e-01, 2.31027499e-03, 5.87465112e-05,
     1.86127188e-02, 2.83074747e-02, -1.87198676e-04, 1.24761782e-02],
    [4.53615634e-01, 3.18976020e+00, -8.35029351e-01, 7.84124578e+00,
     -4.43906795e-01, -1.78945492e+00, -1.14521031e+00, 1.00044304e+00,
     -4.04084981e-01, -4.86030348e-01, 1.05412721e-01, 5.63666445e-02],
    [3.93714086e-01, -3.07226477e-01, -4.87366619e-01, -4.57481697e-01,
     -2.91133171e-04, -2.39881719e-01, -2.15591352e-01, -1.21332941e-01,
     1.42245002e-01, 5.02984582e-02, -8.05878851e-03, 1.95534173e-01],
    [1.86913010e-01, -1.61000977e-01, 5.95612425e-01, 1.87804293e-01,
     2.22064227e-01, -1.09008289e-01, 7.83845058e-02, 5.15228647e-02,
     -8.18113578e-02, -2.37860551e-02, 3.41013800e-03, 3.64680417e-02],
    [3.32919314e+00, -2.14341251e+00, 7.20913997e+00, 1.76143734e+00,
     1.64091808e+00, -2.66887649e+00, -9.26748006e-01, -2.78599285e-01,
     -7.39434005e-01, -3.87363085e-01, 8.00557250e-01, 1.15628886e+00],
    [4.76496444e-01, -1.19334793e-01, 3.09037235e-01, -3.45545294e-01,
     1.30114716e-01, 5.06895559e-01, 2.12176840e-01, -4.14296750e-03,
     4.52439064e-02, -1.62163990e-02, 6.93683152e-02, -5.77607592e-03],
    [3.00019324e-01, 5.43432074e-02, -7.72732930e-01, 1.47263806e+00,
     -2.79012581e-02, -2.47864869e-01, -2.10011388e-01, 2.78202425e-01,
     6.16957205e-02, -1.66924986e-01, -1.80102286e-01, -3.78872162e-03]]

TIMBRE_MEANS = [np.mean(t) for t in TIMBRE_CLUSTERS]
TIMBRE_STDEVS = [np.std(t) for t in TIMBRE_CLUSTERS]

'''Helper methods to process raw MSD data.'''

def normalize_pitches(h5):
    # transpose every pitch segment so all songs share a common key
    key = int(hdf5_getters.get_key(h5))
    segments_pitches = hdf5_getters.get_segments_pitches(h5)
    segments_pitches_new = [transpose_by_key(pitch_seg, key)
                            for pitch_seg in segments_pitches]
    return segments_pitches_new

def transpose_by_key(pitch_seg, key):
    # rotate the 12 pitch classes by the song's key (mod 12)
    pitch_seg_new = []
    for i in range(0, 12):
        idx = (i + key) % 12
        pitch_seg_new.append(pitch_seg[idx])
    return pitch_seg_new

'''Given a time segment with distributions of the 12 pitches, find the most
likely chord played.'''
def find_most_likely_chord(pitch_vector):
    rho_max = 0.0
    # index each chord as (chord type, root index)
    most_likely_chord = (1, 1)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_MAJOR,
            CHORD_TEMPLATE_MAJOR_means, CHORD_TEMPLATE_MAJOR_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean) * (pitch_vector[i] - np.mean(pitch_vector)) \
                   / ((stdev + 0.01) * (np.std(pitch_vector) + 0.01))
        if (abs(rho) > abs(rho_max)):
            rho_max = rho
            most_likely_chord = (1, idx)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_MINOR,
            CHORD_TEMPLATE_MINOR_means, CHORD_TEMPLATE_MINOR_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean) * (pitch_vector[i] - np.mean(pitch_vector)) \
                   / ((stdev + 0.01) * (np.std(pitch_vector) + 0.01))
        if (abs(rho) > abs(rho_max)):
            rho_max = rho
            most_likely_chord = (2, idx)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_DOM7,
            CHORD_TEMPLATE_DOM7_means, CHORD_TEMPLATE_DOM7_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean) * (pitch_vector[i] - np.mean(pitch_vector)) \
                   / ((stdev + 0.01) * (np.std(pitch_vector) + 0.01))
        if (abs(rho) > abs(rho_max)):
            rho_max = rho
            most_likely_chord = (3, idx)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_MIN7,
            CHORD_TEMPLATE_MIN7_means, CHORD_TEMPLATE_MIN7_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean) * (pitch_vector[i] - np.mean(pitch_vector)) \
                   / ((stdev + 0.01) * (np.std(pitch_vector) + 0.01))
        if (abs(rho) > abs(rho_max)):
            rho_max = rho
            most_likely_chord = (4, idx)
    return most_likely_chord

def find_most_likely_timbre_category(timbre_vector):
    most_likely_timbre_cat = 0
    rho_max = 0.0
    for idx, (seg, mean, stdev) in enumerate(zip(TIMBRE_CLUSTERS, TIMBRE_MEANS,
            TIMBRE_STDEVS)):
        rho = 0.0
        for i in range(0, 12):
            # center the observed timbre vector on its own mean, matching the
            # correlation formula in Section 2.3.2 (the original listing
            # centered it on np.mean(seg), which appears to be a typo)
            rho += (seg[i] - mean) * (timbre_vector[i] - np.mean(timbre_vector)) \
                   / ((stdev + 0.01) * (np.std(timbre_vector) + 0.01))
        if (abs(rho) > abs(rho_max)):
            rho_max = rho
            most_likely_timbre_cat = idx
    return most_likely_timbre_cat
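To close this appendix, here is a minimal usage sketch tying the helpers above together on a single MSD track; the .h5 path is hypothetical, and numpy and hdf5_getters are assumed to be imported as in the earlier listings.

h5 = hdf5_getters.open_h5_file_read('data/TRAXLZU12903D05F94.h5')
pitches = normalize_pitches(h5)                      # key-normalized N x 12 chroma
timbres = hdf5_getters.get_segments_timbre(h5)       # N x 12 timbre frames
print(find_most_likely_chord(pitches[0]))            # -> (chord type, root index)
print(find_most_likely_timbre_category(timbres[0]))  # -> timbre cluster index
h5.close()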

Bibliography

[1] Marck Bailey. Mark J. Butler publishes scholarly work on dance music. http://www.music.northwestern.edu/about/news/2012/mark-j-butler-publishes-scholarly-work-on-dance-music.html, March 2012.

[2] Kenneth Taylor. Ishkur's guide to EDM. http://techno.org/electronic-music-guide/.

[3] Deal further strengthens Spotify's music discovery expertise. http://the.echonest.com/pressreleases/spotify-acquires-echo-nest/, March 2014.

[4] Josh Constine. Inside the Spotify-Echo Nest skunkworks. http://techcrunch.com/2014/10/19/the-sonic-mad-scientists/, October 2014.

[5] The future of music genres is here. http://blog.echonest.com/post/73516217273/the-future-of-music-genres-is-here, January 2014.

[6] About the Music Genome Project. http://www.pandora.com/about/mgp.

[7] Joan Serrà, Álvaro Corral, Marián Boguñá, Martín Haro, and Josep Ll. Arcos. Measuring the evolution of contemporary western popular music. Scientific Reports, 2, July 2012.

[8] Matthias Mauch, Robert M. MacCallum, Mark Levy, and Armand M. Leroi. The evolution of popular music: USA 1960–2010. Royal Society Open Science, 2(5), 2015.

[9] Thierry Bertin-Mahieux, Daniel P. W. Ellis, Brian Whitman, and Paul Lamere. The Million Song Dataset. In Proceedings of the 12th International Conference on Music Information Retrieval (ISMIR 2011), 2011.

[10] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay. Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12:2825–2830, 2011.

[11] Lars Buitinck, Gilles Louppe, Mathieu Blondel, Fabian Pedregosa, Andreas Mueller, Olivier Grisel, Vlad Niculae, Peter Prettenhofer, Alexandre Gramfort, Jaques Grobler, Robert Layton, Jake VanderPlas, Arnaud Joly, Brian Holt, and Gaël Varoquaux. API design for machine learning software: experiences from the scikit-learn project. In ECML PKDD Workshop: Languages for Data Mining and Machine Learning, pages 108–122, 2013.

[12] Edwin Chen. Infinite mixture models with nonparametric Bayes and the Dirichlet process. http://blog.echen.me/2012/03/20/infinite-mixture-models-with-nonparametric-bayes-and-the-dirichlet-process/, March 2012.

[13] Graham Massey. Roland TR-808: The drum machine that changed music. http://www.bbc.com/news/entertainment-arts-26682781, March 2014.

[14] Jean-Michel Jarre and Agencja Artystyczna MTJ. Jean Michel Jarre. Disques Dreyfus, 1999.

[15] François Pachet, Jean-Julien Aucouturier, and Mark Sandler. The way it sounds: timbre models for analysis and retrieval of music signals. IEEE Transactions on Multimedia, 7(6):1028–1035, December 2005.


hierarchy of music genres using hierarchical clustering. Additionally, the study takes a crack at determining whether a particular band, the Beatles, was musically groundbreaking for its time or merely playing off sounds that other bands had already used.

Figure 1.2: Data processing pipeline for Mauch's study, illustrated with a segment of Queen's Bohemian Rhapsody (1975)

While both Measuring the Evolution of Contemporary Western Popular Music and Mauch's study created abstractions of pitch and timbre, Mauch's study is more appealing with respect to my goal because its end results align more closely with mine. Additionally, the data processing pipeline offers several layers of abstraction, and depending on my progress, I would be able to achieve at least one of those levels. As shown in Figure 1.2, each segment of a raw audio file is first broken down into its 12 timbre MFCCs and pitch components. Next, the study constructs "lexicons," or a dictionary of pitch and timbre terms against which all songs can be compared. For pitch, the original data is in an N-by-12 matrix, where N is the number of time segments in the song and 12 the number of notes found in an octave of pitches. Each time segment contains the relative strengths of each of the 12 pitches.

However, musical sounds are not merely a collection of pitches but, more precisely, chords. Furthermore, the similarity of two songs is not determined by the absolute pitches of their chords, but rather by the progression of chords in the song, all relative to each other. For example, if all the notes in a song are transposed by one step, the song will sound different in terms of absolute pitch, but it will still be recognized as the original because all of the relative movements from each chord to the next are the same. This phenomenon is captured in the pitch data by finding the most likely chord played at each time segment, then counting the change to the next chord at each time step and generating a table of chord change frequencies for each song. Constructing the timbre lexicon is more complicated, since there is no easy analogue to chords with which to compare songs. Mauch's study utilizes a Gaussian Mixture Model (GMM), iterating over k=1 to k=N clusters (where N is a large number), running the GMM on each prior assumption of k clusters, and computing the Bayesian Information Criterion (BIC) for each model. The lowest of the N BIC values is found, and that value of k is selected. That model contains k different timbre clusters, and each cluster contains the mean timbre value for each of the 12 timbre components.

For my research, I decided that the pitch and timbre lexicons would be the most realistic level of abstraction I could obtain. Mauch's study adds an additional layer to pitch and timbre by identifying the most common patterns of chord changes and the most common timbre rhythms and creating more general tags from these combined terms, such as "stepwise changes indicating modal harmony" for a pitch topic and "oh, rounded, mellow" for a timbral topic. There were two problems with using this final layer of abstraction for my study. First, attaching semantic interpretations to the pitch and timbral lexicons is a difficult task. For timbre, I would need to listen to sound samples containing all of the different timbral categories I identified and attach user interpretations to them. For the chords, not only would I have to perform the same analysis as on timbre, but I would also have to pay careful attention to identifying which chords correspond to common sound progressions in popular music, a task that I am not qualified for and did not have the resources, within this thesis, to seek out. Second, this final layer of abstraction was not necessary for the end goal of my paper. In fact, consolidating my pitch and timbre lexicons into simpler phrases would run the risk of pigeonholing my analysis and preventing me from discovering more nuanced patterns in my final results. Therefore, I decided to focus on pitch and timbral lexicon construction as the furthest levels of abstraction when processing songs for my thesis. Mathematical details on how I constructed the pitch and timbral lexicons can be found in the Mathematical Modeling section of this paper.

1.3 The Dataset

In order to successfully execute my thesis, I need access to an extensive database of music. Until recently, acquiring a substantial corpus of music data was a difficult and costly task. It is illegal to download music audio files from video and music-sharing sites such as YouTube, Spotify, and Pandora. Some platforms, such as iTunes, offer 90-second previews of songs, but segments of songs, usually segments that showcase the chorus, are not a reliable way to capture the entire essence of a song. Even if I were to legally download entire audio files for free, I would run into additional issues. Obtaining a high-quality corpus of song data would be challenging: scripts that crawl music-sharing platforms may not capture all of the music I am looking for. And once I had the audio files, I would have to apply audio processing techniques to extract the relevant information from the songs.

Fortunately, there is an easy solution to the music data acquisition problem. The Million Song Dataset (MSD) is a collection of metadata for one million music tracks dating up to 2011. Various organizations, such as The Echo Nest, Musicbrainz, 7digital, and Last.fm, have contributed different pieces of metadata. Each song is represented as a Hierarchical Data Format file (HDF5), which can be loaded as a JSON object. The fields encompass topical features, such as the song title, artist, and release date, as well as lower-level features such as the loudness, starting beat times, pitches, and timbre of several segments of the song [9]. While the MSD is the largest free and open-source music metadata dataset I could find, there is no guarantee that it adequately covers the entire spectrum of EM artists and songs. This quality limitation is important to consider throughout the study. A quick look through the songs, including the subset of data I worked with for this report, showed that there were several well-known artists and songs from the EM scene. Therefore, while the MSD may not contain all desired songs for this project, it contains an adequate number of relevant songs to produce some meaningful results. Additionally, the groundwork for modeling the similarities between songs and identifying groundbreaking ones is the same regardless of the songs included, and the following methodologies can be implemented on any similarly formatted dataset, including one with songs that might currently be missing from the MSD.

Chapter 2

Mathematical Modeling

2.1 Determining Novelty of Songs

Finding a logical and implementable mathematical model was, and continues to be, an important aspect of my research. My problem, how to mathematically determine which songs were unique for their time, requires an algorithm in which each song is introduced in chronological order, either joining an existing category or starting a new category based on its musical similarity to the songs already introduced. Clustering algorithms like k-means or Gaussian Mixture Models (GMMs), which optimize the partitioning of a dataset into a predetermined number of clusters, assume a fixed number of clusters. While this process would work if we knew exactly how many genres of EM existed, if we guess wrong, our end results may contain clusters that are wrongly grouped together or separated. It is much better to apply a clustering algorithm that does not make any assumptions about this number.

One particularly promising process that addresses the issue of the number of clusters is a family of algorithms known as Dirichlet Processes (DPs). DPs are useful for this particular application because (1) they assign clusters to a dataset with only an upper bound on the number of clusters, and (2) by sorting the songs in chronological order before running the algorithm and keeping track of which songs are categorized under each cluster, we can observe the earliest songs in each cluster and consequently infer which songs were responsible for creating new clusters. The DP is controlled by a parameter α, called the concentration parameter. The expected number of clusters grows with the value of α, so the higher the value of α, the more likely new clusters are to be formed [10]. Regardless of the value of α, as the number of data points introduced increases, the probability of a new group being formed decreases. That is, a "rich get richer" policy is in place, and existing clusters tend to grow in size. Tuning the value of the parameter α is an important part of the study, since it determines the flexibility given to forming a new cluster. If the value of α is too small, then the criteria for forming clusters will be too strict, and data that should be in different clusters will be assigned to the same cluster. On the other hand, if α is too large, the algorithm will be too sensitive and assign similar songs to different clusters.

The implementation of the DP was achieved using scikit-learn's library and API for the Dirichlet Process Gaussian Mixture Model (DPGMM). The DPGMM is the formal name of the Dirichlet Process model used to cluster the data. More specifically, scikit-learn's implementation of the DPGMM uses the stick-breaking method, one of several equally valid methods to assign songs to clusters [11]. While the mathematical details for this algorithm can be found at the following citation [12], the most important aspects of the DPGMM are the arguments that the user can specify and tune. The first of these tunable parameters is the value α, which is the same parameter as the α discussed in the previous paragraph. As seen on the right side of Figure 2.1, properly tuning α is key to obtaining meaningful clusters. The center image has α set to 0.01, which is too small and results in all of the data being placed in one cluster. On the other hand, the bottom-right image has the same data set and α set to 100, which does a better job of clustering. On a related note, the figure also demonstrates the effectiveness of the DPGMM over the GMM. On the left side, the dataset clearly contains 2 clusters, but the GMM in the top-left image assumes 5 clusters as a prior and consequently clusters the data incorrectly, while the DPGMM manages to limit the data to 2 clusters.

The second argument that the user inputs for the DPGMM is the data that will be clustered. The scikit-learn implementation takes the data in the format of a nested list (N lists, each of length m), where N is the number of data points and m the number of features. While the format of the data structure is relatively straightforward, choosing which numbers should be in the data was a challenge I faced. Selecting the relevant features of each song to be used in the algorithm will be expounded upon in the next section, "Feature Selection."

The last argument that a user inputs for the scikit-learn DPGMM implementation is an argument indicating the upper bound for the number of clusters. The Dirichlet Process then determines the best number of clusters for the data between 1 and the upper bound. Since the DPGMM is flexible enough to find the best value, I set an arbitrary upper bound of 50 clusters and focused on tuning α to modify the number of clusters formed. A minimal sketch of this invocation appears below.
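The following sketch shows how the clustering step can be invoked. It assumes the scikit-learn release available at the time of writing, which exposes sklearn.mixture.DPGMM (newer releases replace it with BayesianGaussianMixture), and X is a placeholder for the per-song feature matrix described in Chapter 3.

# A minimal sketch of the DPGMM invocation; X is a placeholder feature matrix.
import numpy as np
from sklearn.mixture import DPGMM

X = np.random.rand(1000, 238)      # placeholder: N songs by 238 features

dpgmm = DPGMM(n_components=50,     # upper bound on the number of clusters
              alpha=0.1,           # concentration parameter
              covariance_type='diag',
              n_iter=100)
dpgmm.fit(X)
labels = dpgmm.predict(X)          # cluster assignment for each song
print(len(set(labels)))            # number of clusters actually used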

2.2 Feature Selection

Figure 2.1: scikit-learn example of GMM vs. DPGMM and the tuning of α

One of the most difficult aspects of the Dirichlet method is choosing the features to be used for clustering. In other words, when we organize the songs into clusters, we need to ensure that each cluster is distinct in a way that is statistically and intuitively logical. In the Million Song Dataset [9], each song is represented as a JSON object containing several fields. These fields are candidate features for the Dirichlet algorithm. Below is an example song, "Never Gonna Give You Up" by Rick Astley, and its corresponding features:

artist_mbid: db92a151-1ac2-438b-bc43-b82e149ddd50 (the musicbrainz.org ID for this artist)
artist_mbtags: shape = (4,) (this artist received 4 tags on musicbrainz.org)
artist_mbtags_count: shape = (4,) (raw tag count of the 4 tags this artist received on musicbrainz.org)
artist_name: Rick Astley (artist name)
artist_playmeid: 1338 (the ID of this artist on the service playme.com)
artist_terms: shape = (12,) (this artist has 12 terms (tags) from The Echo Nest)
artist_terms_freq: shape = (12,) (frequency of the 12 terms from The Echo Nest (number between 0 and 1))
artist_terms_weight: shape = (12,) (weight of the 12 terms from The Echo Nest (number between 0 and 1))
audio_md5: bf53f8113508a466cd2d3fda18b06368 (hash code of the audio used for the analysis by The Echo Nest)
bars_confidence: shape = (99,) (confidence value (between 0 and 1) associated with each bar by The Echo Nest)
bars_start: shape = (99,) (start time of each bar according to The Echo Nest; this song has 99 bars)
beats_confidence: shape = (397,) (confidence value (between 0 and 1) associated with each beat by The Echo Nest)
beats_start: shape = (397,) (start time of each beat according to The Echo Nest; this song has 397 beats)
danceability: 0.0 (danceability measure of this song according to The Echo Nest (between 0 and 1; 0 => not analyzed))
duration: 211.69587 (duration of the track in seconds)
end_of_fade_in: 0.139 (time of the end of the fade-in at the beginning of the song, according to The Echo Nest)
energy: 0.0 (energy measure (not in the signal processing sense) according to The Echo Nest (between 0 and 1; 0 => not analyzed))
key: 1 (estimation of the key the song is in by The Echo Nest)
key_confidence: 0.324 (confidence of the key estimation)
loudness: -7.75 (general loudness of the track)
mode: 1 (estimation of the mode the song is in by The Echo Nest)
mode_confidence: 0.434 (confidence of the mode estimation)
release: Big Tunes - Back 2 The 80s (album name from which the track was taken; some tracks can come from many albums; we give only one)
release_7digitalid: 786795 (the ID of the release (album) on the service 7digital.com)
sections_confidence: shape = (10,) (confidence value (between 0 and 1) associated with each section by The Echo Nest)
sections_start: shape = (10,) (start time of each section according to The Echo Nest; this song has 10 sections)
segments_confidence: shape = (935,) (confidence value (between 0 and 1) associated with each segment by The Echo Nest)
segments_loudness_max: shape = (935,) (max loudness during each segment)
segments_loudness_max_time: shape = (935,) (time of the max loudness during each segment)
segments_loudness_start: shape = (935,) (loudness at the beginning of each segment)
segments_pitches: shape = (935, 12) (chroma features for each segment (normalized so max is 1))
segments_start: shape = (935,) (start time of each segment (musical event or onset) according to The Echo Nest; this song has 935 segments)
segments_timbre: shape = (935, 12) (MFCC-like features for each segment)
similar_artists: shape = (100,) (a list of 100 artists (their Echo Nest IDs) similar to Rick Astley, according to The Echo Nest)
song_hotttnesss: 0.864248830588 (according to The Echo Nest, when downloaded (in December 2010), this song had a 'hotttnesss' of 0.8 (on a scale of 0 to 1))
song_id: SOCWJDB12A58A776AF (The Echo Nest song ID; note that a song can be associated with many tracks (with very slight audio differences))
start_of_fade_out: 198.536 (start time of the fade-out, in seconds, at the end of the song, according to The Echo Nest)
tatums_confidence: shape = (794,) (confidence value (between 0 and 1) associated with each tatum by The Echo Nest)
tatums_start: shape = (794,) (start time of each tatum according to The Echo Nest; this song has 794 tatums)
tempo: 113.359 (tempo in BPM according to The Echo Nest)
time_signature: 4 (time signature of the song according to The Echo Nest, i.e., usual number of beats per bar)
time_signature_confidence: 0.634 (confidence of the time signature estimation)
title: Never Gonna Give You Up (song title)
track_7digitalid: 8707738 (the ID of this song on the service 7digital.com)
track_id: TRAXLZU12903D05F94 (The Echo Nest ID of this particular track, on which the analysis was done)
year: 1987 (year when this song was released, according to musicbrainz.org)

When choosing features, my main goal was to use features that would most likely yield meaningful results yet also be simple and make sense to the average person. The definition of "meaningful" results is subjective, as every music listener will have his or her own opinions about what constitutes different types of music, but some common features most people tend to differentiate songs by are pitch, rhythm, and the types of instruments used. The following specific fields provided in each song object fall under these three terms:

Pitch

• segments_pitches: a matrix of values indicating the strength of each pitch (or note) at each discernible time interval

Rhythm

• beats_start: a vector of values indicating the start time of each beat

• time_signature: the time signature of the song

• tempo: the speed of the song in Beats Per Minute (BPM)

Instruments

• segments_timbre: a matrix of values indicating the distribution of MFCC-like features (different types of tones) for each segment

The segments_pitches feature is a clear candidate as a differentiating factor between songs, since it reveals the patterns of notes that occur. Additionally, other research papers that quantitatively examine songs, like Mauch's, look at pitch and employ a procedure that allows all songs to be compared with the same metric. Likewise, timbre is intuitively a reliable differentiating feature, since it reveals the prevalence of different tones, that is, sounds that sound different despite having the same pitch. Therefore, segments_timbre is another feature that is considered for each song.

Finally, we look at the candidate features for rhythm. At first glance, all of these features appear to be useful, as they indicate the rhythm of a song in one way or another. However, none of these features is as useful as the pitch and timbre features. While tempo is one factor in differentiating genres of EDM, and of music in general, tempo alone is not a driving force of musical innovation. Certain genres of EDM, like drum 'n' bass and happycore, stand out for having very fast tempos, but the tempo is supplemented with a sound unique to the genre. Conceiving new arrangements of pitches, combining instruments in new ways, and inventing new types of sounds are novel; speeding up or slowing down existing sounds is not. Including tempo as a feature could actually add noise to the model, since many genres overlap in their tempos. And finally, tempo is measured indirectly when the pitch and timbre features are normalized for each song: everything is measured in units of "per second," so faster songs will have higher quantities of pitch and timbre features each second. Time signature can be dismissed from the candidate features for the same reason as tempo: many genres share the same time signature, and including it in the feature set would only add more noise. beats_start looks like a more promising feature since, like segments_pitches and segments_timbre, it consists of a vector of values. However, difficulties arise when we begin to think about how exactly we can utilize this information. Since each song varies in length, we need a way to compare songs of different durations on the same level. One approach could be to perform basic statistics on the distance between each beat, for example calculating the mean and standard deviation of this distance. However, the normalized pitch and timbre information already captures this data. Another possibility is detecting certain patterns of beats, which could differentiate the syncopated dubstep or glitch beats from the steady pulse of electro-house. But once again, every beat is accompanied by a sound with a specific timbre and pitch, so this feature would not add any significantly new information.

2.3 Collecting Data and Preprocessing Selected Features

2.3.1 Collecting the Data

Upon deciding which features I wanted to use in my research, I first needed to collect all of the electronic songs in the Million Song Dataset. The easiest reliable way to achieve this was to iterate through each song in the database and save the information for the songs where any of the artist genre tags in artist_mbtags matched an electronic music genre. While this measure was not fully accurate, because it looks at the genre of the artist rather than of the song, specific genre information for each song was not as easily accessible, so this indicator was nearly as good a substitute. To generate a list of the genres that electronic songs would fall under, I manually searched through a subset of the MSD to find all genres that seemed to be related to electronic music. In the case of genres that are sometimes, but not always, electronic in nature, such as disco or pop, I erred on the side of caution and did not include them in the list of electronic genres. In those cases, false positives, such as primarily rock songs that happen to have the disco label attached to the artist, could inadvertently be included in the dataset. The final list of genres is as follows (a sketch of the corresponding filtering script appears after the list):

target_genres = ['house', 'techno', 'drum and bass', 'drum n bass',
                 "drum'n'bass", 'drumnbass', "drum 'n' bass", 'jungle',
                 'breakbeat', 'trance', 'dubstep', 'trap', 'downtempo',
                 'industrial', 'synthpop', 'idm',
                 'idm - intelligent dance music', '8-bit', 'ambient',
                 'dance and electronica', 'electronic']
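As an illustration of the collection step, the following is a minimal sketch of the tag-based filter, assuming the MSD's hdf5_getters helper module and a local directory tree of .h5 song files (the directory layout shown is hypothetical):

import glob
import os
import hdf5_getters

def collect_electronic_song_paths(msd_root):
    '''Return paths of songs whose artist carries an electronic genre tag.'''
    matches = []
    for path in glob.glob(os.path.join(msd_root, '*', '*', '*', '*.h5')):
        h5 = hdf5_getters.open_h5_file_read(path)
        try:
            tags = [(t.decode('utf-8') if isinstance(t, bytes) else t).lower()
                    for t in hdf5_getters.get_artist_mbtags(h5)]
            if any(tag in target_genres for tag in tags):
                matches.append(path)
        finally:
            h5.close()
    return matches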

2.3.2 Pitch Preprocessing

A study conducted by music researcher Matthias Mauch [8] analyzes pitch in a musically informed manner. The study first takes the raw sound data and converts it into a distribution of each pitch, where 0 is no detection of the pitch and 1 the strongest amount. Then it computes the most likely chord by comparing the 4 most common types of chords in popular music (major, minor, dominant 7, and minor 7) to the observed chord. The most common chords are represented as "template chords" and contain 0's and 1's, where the 1's represent the notes played in the chord. For example, using the note C as the first index, the C major chord is represented as

CT_{CM} = (1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0)

For a given chroma frame c observed in the song, Spearman's rho coefficient is computed against every template chord:

\rho_{CT,c} = \sum_{i=1}^{12} \frac{(CT_i - \overline{CT})(c_i - \bar{c})}{\sigma_{CT}\,\sigma_c}

where \overline{CT} is the mean of the values in the template chord, \sigma_{CT} is the standard deviation of those values, and the operations on c are analogous. Note that the summation is over each of the 12 pitch classes. The chord template with the highest value of ρ is selected as the chord for the time frame. After this is performed for each time frame, the values are smoothed, and then the change between adjacent chords is observed. The reasoning behind this step is that by measuring the relative distance between chords, rather than the chords themselves, all songs can be compared in the same manner even though they may have different key signatures. Finally, the study takes the types of chord changes and classifies them under 8 possible categories called "H-topics." These topics are more abstracted versions of the chord changes that make more sense to a human, such as "changes involving dominant 7th chords."
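Although Mauch's paper supplies the template chords themselves, the sketch below shows how the 12 major-chord templates (and, analogously, the other three chord families) can be generated by rotating the C major template through all 12 roots. The variable names mirror those used in the appendix code, but the construction shown here is my own illustration.

import numpy as np

C_MAJOR = [1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0]   # C, E, G with C as index 0

# one template per root note, obtained by rotating the C major template
CHORD_TEMPLATE_MAJOR = [np.roll(C_MAJOR, root).tolist() for root in range(12)]
CHORD_TEMPLATE_MAJOR_means = [np.mean(c) for c in CHORD_TEMPLATE_MAJOR]
CHORD_TEMPLATE_MAJOR_stdevs = [np.std(c) for c in CHORD_TEMPLATE_MAJOR]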

In my preliminary implementation of this method on an electronic dance music corpus, I made a few modifications to Mauch's study. First, I smoothed out time frames before computing the most probable chords, rather than smoothing the most probable chords. I did this to save time and to reduce volatility in the chord measurements. Using Rick Astley's "Never Gonna Give You Up" as a reference, which contains 935 time frames and lasts 212 seconds, 5 time frames is slightly under 1 second and, in preliminary testing, appeared to be a good interval for each time block. Second, as mentioned in the literature section, I did not abstract the chord changes into H-topics. This decision also stemmed from time constraints, since deriving semantic chord meaning from EDM songs would require careful research into the types of harmonies and sounds common in that genre of music. Below is a high-level visualization of the pitch metadata found in a sample song, "Firestarter" by The Prodigy, and of how I converted the metadata into a chord change vector that I could then feed into the Dirichlet Process algorithm.

[Figure: pitch-processing pipeline for "Firestarter" by The Prodigy. The raw pitch data is an N-by-12 matrix, where N is the number of time frames in the song and 12 the number of pitch classes; the figure shows the pitch distributions of the first 5 time frames. The distributions are averaged over every block of 5 time frames, and the most likely chord for each block is calculated using Spearman's rho (here F# major, (0,1,0,0,0,0,1,0,0,0,1,0)). For each pair of adjacent chords (here F# major to G# major, a major-to-major change with step size 2, shift code 6), the corresponding count in a table of chord change frequencies is incremented: chord_changes[6] += 1, out of 192 possible chord changes. The end result is a 192-element vector in which chord_changes[i] is the number of times the chord change with code i occurred in the song, e.g. chord_changes = [14, 0, 3, 0, 1, 0, 1, 0, 0, 0, 1, 0, 11, ...].]

A final step I took to normalize the chord change data was to divide the counts by the length of the song, so that each song's number of chord changes was measured per second, as in the sketch below.
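The following sketch summarizes this assembly step. The mapping of (previous chord, next chord) pairs onto the 192 chord-change codes is my own illustrative encoding (4 chord types x 4 chord types x 12 relative root movements = 192); the text above fixes only the size of the table, not a particular code assignment.

def chord_change_vector(chords, duration):
    '''chords: list of (chord_type, root_index) pairs, chord_type in 1..4;
    returns 192 chord-change frequencies, normalized per second.'''
    counts = [0] * 192
    for (t1, r1), (t2, r2) in zip(chords, chords[1:]):
        code = ((t1 - 1) * 4 + (t2 - 1)) * 12 + ((r2 - r1) % 12)
        counts[code] += 1
    return [c / float(duration) for c in counts]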

2.3.3 Timbre Preprocessing

For timbre, I also used Mauch's model to find a meaningful way to compare timbre uniformly across all songs [8]. After collecting all song metadata, I took a random sample of 20 songs from each year, starting at 1970. The reason I forced the sampling to 20 randomly sampled songs from each year, rather than taking a random sample of songs from all years at once, was to prevent bias towards any type of sound. As seen in Figure 2.2, there are significantly more songs from 2000-2011 than from before 2000. The mean year is x̄ = 2001.052, the median year is 2003, and the standard deviation of the years is σ = 7.060. A "random sample" over all songs would almost certainly include a disproportionate number of more recent songs. In order not to miss sounds that may be more prevalent in older songs, I required a set number of songs from each year. Next, from each randomly selected song, I selected 20 random timbre frames, in order to prevent any biases in data collection within each song. In total, there were 42 x 20 x 20 = 16,800 timbre frames collected. Next, I clustered the timbre frames using a Gaussian Mixture Model (GMM), varying the number of clusters from 10 to 100 and selecting the number of clusters with the lowest Bayesian Information Criterion (BIC), a statistical measure commonly used to select the best-fitting model. The BIC was minimized at 46 timbre clusters. I then re-ran the GMM with 46 clusters and saved the mean values of each of the 12 timbre segments for each cluster formed. A sketch of this model-selection step appears below.
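This is a minimal sketch of the model-selection loop; it uses the modern GaussianMixture API as a stand-in for the GMM class that was current when this work was done, and frames is a placeholder for the 16,800-by-12 array of sampled timbre frames.

import numpy as np
from sklearn.mixture import GaussianMixture

frames = np.random.rand(16800, 12)    # placeholder for sampled timbre frames

bics = {}
for k in range(10, 101):
    gmm = GaussianMixture(n_components=k, random_state=0).fit(frames)
    bics[k] = gmm.bic(frames)         # Bayesian Information Criterion

best_k = min(bics, key=bics.get)      # minimized at k = 46 in my run
best_gmm = GaussianMixture(n_components=best_k, random_state=0).fit(frames)
timbre_cluster_means = best_gmm.means_  # 12-dim mean vector per cluster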

In the same way that every song had the same 192 chord changes whose frequencies could be compared between songs, each song now had the same 46 timbre clusters but different frequencies for each. When reading in the metadata from each song, I calculated the most likely timbre cluster that each timbre frame belonged to and kept a frequency count of all of the possible timbre clusters observed in the song. Finally, as with the pitch data, I divided all observed counts by the duration of the song in order to normalize each song's timbre counts.

Figure 2.2: Number of Electronic Music Songs in the Million Song Dataset from Each Year

Chapter 3

Results

3.1 Methodology

After the pitch and timbre data was processed, I ran the Dirichlet Process on the data. For each song, I concatenated the 192-element chord change frequency list and the 46-element timbre category frequency list, giving each song a total of 238 features. However, there is a problem with this setup: the pitch data will inherently dominate the clustering process, since it contains almost 3 times as many features as the timbre data. While there is no built-in function in scikit-learn's DPGMM process to give different weights to each feature, I considered another possibility to remedy this discrepancy: duplicating the timbre vector a certain number of times and concatenating that to the feature set of each song. While this strategy runs the risk of corrupting the feature set and turning it into something that does not accurately represent each song, it is important to keep in mind that, even without duplicating the timbre vector, the feature set consists of two separate feature sets concatenated to each other. Therefore, timbre duplication appears to be a reasonable strategy to weight pitch and timbre more evenly; a sketch of the resulting feature assembly appears below.
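The sketch below summarizes the per-song feature assembly. The number of timbre copies (4 here) is illustrative, since the text above leaves the duplication count open, and the constant scale factor anticipates the k = 10 scaling motivated in the next paragraph.

import numpy as np

def song_features(chord_change_counts, timbre_counts, duration,
                  timbre_copies=4, scale=10.0):
    '''Concatenate per-second chord-change and (duplicated) timbre
    frequencies, then scale so that alpha can be tuned in a sane range.'''
    pitch = np.asarray(chord_change_counts, dtype=float) / duration
    timbre = np.asarray(timbre_counts, dtype=float) / duration
    return scale * np.concatenate([pitch] + [timbre] * timbre_copies)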

After this modification, I tweaked a few more parameters before obtaining my final results. Dividing the pitch and timbre frequencies by the duration of the song normalized every song to frequency per second, but it also had the undesired effect of making the data too small. Timbre and pitch frequencies per second were almost always less than 1.0 and many times hovered as low as 0.002 for nonzero values. Because all of the values were very close to each other, using common values of α in the range of 0.1 to 1000-2000 was insufficient to push the songs into different clusters; as a result, every song fell into the same cluster. Increasing the value of α by several orders of magnitude, to well over 10 million, fixed the problem, but this solution presented two problems. First, tuning α to experiment with different ways to cluster the music would be problematic, since I would have to work with an enormous range of possible values for α. Second, pushing α to such high values is not appropriate for the Dirichlet Process. Extremely high values of α indicate a Dirichlet Process that will try to disperse the data into different clusters, but a value of α that high is, in principle, always assigning each new song to a new cluster. On the other hand, varying α between 0.1 and 1000, for example, presents a much wider range of flexibility when assigning clusters. While clustering may be possible by varying α an extreme amount with the data as it currently is, we would be using the Dirichlet Process in a way it should mathematically not be used. Therefore, multiplying all of the data by a constant value, so that we can work in the appropriate range of α, is the ideal approach. After some experimentation, I found that k=10 was an appropriate scaling factor.

After initial runs of the Dirichlet Process, I found that there was a slight issue with some of the earlier songs. Since I had only artist genre tags, not song-specific tags, I chose songs based on whether any of the tags associated with the artist fell under any electronic music genre, including the generic term 'electronic.' There were some bands, mostly older ones from the 1960s and 1970s, like Electric Light Orchestra, which had some electronic music but mostly featured rock, funk, disco, or another genre. Given that these artists featured mostly non-electronic songs, I decided to exclude them from my study and generated a blacklist of these artists. While it was infeasible to look through every single song and determine whether it was electronic or not, I was able to look over the earliest songs in each cluster. These songs were the most important to verify as electronic, because early non-electronic songs could end up forming new clusters and inadvertently create clusters with non-electronic sounds that I was not looking for.

The goal of this thesis is to identify the different groups into which EM songs cluster and to identify the most unique artists and genres. While the second task is very simple, because it requires only looking at the earliest songs in each cluster, the first is difficult to gauge the effectiveness of. While I can look at the average chord change and timbre category frequencies in each cluster, as well as other metadata, attaching semantic interpretations to what the music actually sounds like and determining whether the music is clustered properly is a very subjective process. For this reason, I ran the Dirichlet Process on the feature set with values of α = (0.05, 0.1, 0.2) and compared the clustering in each case, examining similarities and differences between the clusters formed in each scenario in the Discussion section. For each value of α, I set the upper limit of components, or clusters allowed, to 50. The values of α I used resulted in 9, 14, and 19 clusters formed.

3.2 Findings

3.2.1 α = 0.05

When I set α to 0.05, the Dirichlet Process split the songs into 9 clusters. Below are the distributions of the years of the songs in each cluster (note that the Dirichlet Process does not number the clusters exactly sequentially, so cluster numbers 5, 7, and 10 are skipped).

Figure 3.1: Song year distributions for α = 0.05

For each value of α, I also calculated the average frequency of each chord change category and timbre category for each cluster and plotted the results. The green lines correspond to timbre and the blue lines to pitch.

Figure 3.2: Timbre and pitch distributions for α = 0.05

A table listing each cluster formed, the number of songs in that cluster, and descriptions of the pitch, timbre, and rhythmic qualities characteristic of songs in that cluster is shown below.

Cluster   Song Count   Characteristic Sounds
0         6481         Minimalist; industrial space sounds; dissonant chords
1         5482         Soft, New Age, ethereal
2         2405         Defined sounds; electronic and non-electronic
                       instruments played in standard rock rhythms
3         360          Very dense and complex synths; slightly darker tone
4         4550         Heavily distorted rock and synthesizer
6         2854         Faster-paced 80s synth rock; acid house
8         798          Aggressive beats; dense house music
9         1464         Ambient house; trancelike; strong beats; mysterious
                       tone
11        1597         Melancholy tones; new wave rock in the 80s, then,
                       starting in the 90s, downtempo, trip-hop, nu-metal

Table 3.1: Song cluster descriptions for α = 0.05

3.2.2 α = 0.1

A total of 14 clusters were formed. (16 were formed, but 2 clusters contained only one song each; I listened to both of these songs, and they did not sound unique, so I discarded them from the clusters.) Again, the song year distributions, timbre and pitch distributions, and cluster descriptions are shown below.

Figure 3.3: Song year distributions for α = 0.1


Figure 3.4: Timbre and pitch distributions for α = 0.1

Cluster   Song Count   Characteristic Sounds
0         1339         Instrumental and disco with 80s synth
1         2109         Simultaneous quarter-note and sixteenth-note rhythms
2         4048         Upbeat, chill; simultaneous quarter-note and
                       eighth-note rhythms
3         1353         Strong repetitive beats; ambient
4         2446         Strong simultaneous beat and synths; synths defined
                       but with echo
5         2672         Calm, New Age
6         542          Hi-hat cymbals; dissonant chord progressions
7         2725         Aggressive punk and alternative rock
9         1647         Latin; rhythmic emphasis on the first and third beats
11        835          Standard medium-fast rock instruments/chords
16        1152         Orchestral, especially violins
18        40           "Martian alien" sounds; no vocals
20        1590         Alternating strong kick and strong high-pitched clap
28        528          Roland TR-like beats; kick and clap stand out but
                       fuzzy

Table 3.2: Song cluster descriptions for α = 0.1

3.2.3 α = 0.2

With α set to 0.2, a total of 22 clusters were formed. 3 of the clusters consisted of 1 song each, none of which were particularly unique-sounding, so I discarded them, leaving a total of 19 significant clusters. Again, the song year distributions, timbre and pitch distributions, and cluster descriptions are shown below.

Figure 3.5: Song year distributions for α = 0.2


Figure 3.6: Timbre and pitch distributions for α = 0.2

Cluster   Song Count   Characteristic Sounds
0         4075         Nostalgic and sad-sounding synths and string
                       instruments
1         2068         Intense, sad, cavernous (mix of industrial metal and
                       ambient)
2         1546         Jazz/funk tones
3         1691         Orchestral with heavy 80s synths; atmospheric
4         343          Arpeggios
5         304          Electro ambient
6         2405         Alien synths; eerie
7         1264         Punchy kicks and claps; 80s/90s tilt
8         1561         Medium tempo; 4/4 time signature; synths with intense
                       guitar
9         1796         Disco rhythms and instruments
10        2158         Standard rock with few (if any) synths added on
12        791          Cavernous, minimalist ambient (non-electronic
                       instruments)
14        765          Downtempo; classic guitar riffs; fewer synths
16        865          Classic acid house sounds and beats
17        682          Heavy Roland TR sounds
22        14           Fast ambient; classic orchestral
23        578          Acid house with funk tones
30        31           Very repetitive rhythms; one or two tones
34        88           Very dense sound (strong vocals and synths)

Table 3.3: Song cluster descriptions for α = 0.2

3.3 Analysis

Each of the different values of α revealed different insights about the Dirichlet Process applied to the Million Song Dataset, mainly which artists and songs were unique and how traditional groupings of EM genres could be thought of differently. Not surprisingly, the distributions of the years of songs in most of the clusters were skewed to the left, because the distribution of all of the EM songs in the Million Song Dataset is left-skewed (see Figure 2.2). However, some of the distributions vary significantly for individual clusters, and these differences provide important insights into which types of music styles were popular at certain points in time and how unique the earliest artists and songs in the clusters were. For example, for α = 0.1, Cluster 28's musical style (with sounds characteristic of the Roland TR-808 and TR-909, two programmable drum machines and synthesizers that became extremely popular in 1980s and 90s dance tracks [13]) coincides with when the instruments were first manufactured, in 1980. Not surprisingly, this cluster contained mostly songs from the 80s and 90s and declined slightly in the 2000s. However, there were a few songs in that cluster that came out before 1980. While these songs did not clearly use the Roland TR machines, they may have contained similar sounds that predated the machines and were truly novel.

First, looking at α = 0.05, we see that all of the clusters contain a significant number of songs, although clusters 3 and 8 are notably smaller. Cluster 3 has a heavier left tail, indicating a larger number of songs from the 70s, 80s, and 90s. Inside the cluster, the genres of music varied significantly from a traditional music lens. That is, the cluster contained some songs with nearly all traditional rock instruments, others with purely synths, and others somewhere in between, all of which would normally be classified as different EM genres. Under the Dirichlet Process, however, these songs were lumped together by the common theme of dense, melodic sounds (as opposed to minimalistic, repetitive, or dissonant ones). The most prominent artists among the earlier songs are Ashra and Jon Hassell, who composed several melodic songs combining traditional instruments with synthesizers for a modern feel. The other small cluster, number 8, has a more normal year distribution relative to the entire MSD distribution and also consists of denser beats. Another artist, Cabaret Voltaire, leads this cluster. Cluster 9 sticks out significantly because it contains virtually no songs before 1990 but then increases rapidly in popularity. This cluster contains songs with hypnotically repetitive rhythms, strong and ethereal synths, and an equally strong drum-like beat. Given the emergence of trance in the 1990s, and the fact that house music in the 1980s contained more minimalistic synths than house music in the 1990s, this distribution of years makes sense. Looking at the earliest artists in this cluster, one that accurately predates the later music in the cluster is Jean-Michel Jarre. A French composer and pioneer of ambient and electronic music [14], Jarre has one song, Les Chants Magnétiques IV, that contains very sharp and modulated synths along with a repetitive hi-hat cymbal rhythm and more drawn-out, ethereal synths. While the song sounds ambient at its normal speed, playing it at 1.5 times normal speed resulted in a thumping, fast-paced 16th-note rhythm that, combined with the ethereal synths and their chord progressions, sounded very similar to trance music. In fact, I found that, stylistically, trance music was comparable to house and ambient music increased in speed. Trance was a term not used extensively until the early 1990s, but ambient and house music were already mainstream by the 1980s, so it would make sense that trance evolved in this manner. However, this insight could serve as an argument that trance is not an innovative genre in and of itself, but rather a clever combination of two older genres.

Lastly, we look at the timbre category and chord change distributions for each cluster. In theory, these clusters should have significantly different peaks of chord changes and timbre categories, reflecting different pitch arrangements and instruments in each cluster. The type 0 chord change corresponds to major → major with no note change; type 60, to minor → minor with no note change; type 120, to dominant 7th major → dominant 7th major with no note change; and type 180, to dominant 7th minor → dominant 7th minor with no note change. It makes sense that type 0, 60, 120, and 180 chord changes are frequently observed, because their prevalence implies that adjacent chords in a song remain in the same key for the majority of the song. The timbre categories, on the other hand, are more difficult to interpret intuitively. Mauch's study addresses this issue by sampling the songs and sounds that are closest to each timbre category, then playing the sounds and attaching user-based interpretations based on several listeners [8]. While this strategy worked in Mauch's study, given the time and resources at my disposal, it was not practical in mine. I ended up comparing my subjective summaries of each cluster against the charts to see whether certain peaks in the timbre categories corresponded to specific tones. Strangely, for α = 0.05, the timbre and chord change data are very similar for each cluster. This problem does not occur for α = 0.1 or 0.2, where the graphs vary significantly and correspond to some of the observed differences in the music. In summary, below are the most influential artists I found in the clusters formed and the types of music they created that were novel for their time:

• Jean-Michel Jarre: ambient and house music; complicated synthesizer arrangements

• Cabaret Voltaire: orchestral electronic music

• Paul Horn: new age

• Brian Eno: ambient music

• Manuel Göttsching (Ashra): synth-heavy ambient music

• Killing Joke: industrial metal

• John Foxx: minimalist and dark electronic music

• Fad Gadget: house and industrial music

While these conclusions were formed mainly from the data used in the MSD and the resulting clusters, I checked outside sources and biographies of these artists to see whether they were groundbreaking contributors to electronic music. Some research revealed that these artists were indeed groundbreaking for their time, so my findings are consistent with existing literature. The difference, however, between existing accounts and mine is that, from a quantitatively computed perspective, I found some new connections (like observing that one of Jarre's works, when sped up, sounded very similar to trance music).

For larger values of α, it is worth not only looking at interesting phenomena in the clusters formed for that specific value, but also comparing the clusters formed to those of other values of α. Since we are increasing the value of α, more clusters will be formed, and the distinctions between each cluster will be more nuanced. With α = 0.1, the Dirichlet Process formed 16 clusters. 2 of these clusters consisted of only one song each, and upon listening, neither of these songs sounded particularly unique, so I threw those two clusters out and analyzed the remaining 14. Comparing these clusters to the ones formed with α = 0.05, I found that some of the clusters mapped over nicely, while others were more difficult to interpret. For example, cluster 3_0.1 (cluster 3 when α = 0.1) contained a similar number of songs, and a similar distribution of release years, to cluster 9_0.05. Both contain virtually no songs before the 1990s and then steadily rise in popularity through the 2000s. Both clusters also contain similar types of music: house beats and ethereal synths reminiscent of ambient or trance music. However, when I looked at the earliest artists in cluster 3_0.1, they were different from the earliest artists in cluster 9_0.05. One particular artist, Bill Nelson, stood out for having a particularly novel song, "Birds of Tin," for the year it was released (1980). This song features a sharp and twangy synth beat that, when sped up, sounded like minimalist acid house music. While the α = 0.05 group differentiated mostly on general moods and classes of instruments (like rock vs. non-electronic vs. electronic), the α = 0.1 group picked up more nuanced instrumentation and mood differences. For example, cluster 16_0.1 contained songs that featured orchestral string instruments, especially violin. The songs themselves varied significantly according to traditional genres, from Brian Eno arrangements with classical orchestra to a remix of a song by Linkin Park, a nu-metal band, which contained violin interludes. This clustering raises an interesting point: music that sounds very different based on traditional genres could be grouped together on certain instruments or sounds. Another cluster, 28_0.1, features '90s sounds characteristic of the Roland TR-909 drum machine (which would explain why the cluster's songs increase drastically starting in the 1990s and steadily decline through the 2000s). Yet another cluster, 6_0.1, contains a particularly heavy left tail, indicating a style more popular in the 1980s, and its characteristic sound, hi-hat cymbals, is also a specialized one. This specialization does not match up particularly strongly with the clusters when α = 0.05. That is, a single cluster with α = 0.05 does not easily map to one or more clusters in the α = 0.1 run, although many of the clusters appear to share characteristics based on the qualitative descriptions in the tables. The timbre/chord change charts for each cluster appeared to at least somewhat corroborate the general characteristics I attached to each cluster. For example, the last timbre category is significantly pronounced for clusters 5 and 18, and especially so for 18. Cluster 18 was vocal-free, ethereal, space-synth sounds, so it would make sense that cluster 5, which was mainly calm New World, also contained vocal-free, ethereal, and space-y sounds. It was also interesting to note that certain clusters, like 28, contained one timbre category that completely dominated all the others. Many songs in this cluster were marked by strong and repetitive beats reminiscent of the Roland TR synth drum machine, which matches the graph. Likewise, clusters 3, 7, 9, and 20, which appear to contain the same peak timbre category, were noted for containing strong and repetitive beats. For this value of α, I added the following artists and their contributions to the general list of novel artists:

• Bill Nelson: minimalist house music
• Vangelis: orchestral compositions with electronic notes
• Rick Wakeman: rock compositions with spacy-sounding synths
• Kraftwerk: synth-based pop music

Finally, we look at α = 0.2. With this parameter value, the Dirichlet Process resulted in 22 clusters formed. 3 of these clusters contained only one song each, and upon listening to each of these songs, I determined they were not particularly unique and discarded them, for a total of 19 remaining clusters. Unlike the previous two values of α, where the clusters were relatively easy to subjectively differentiate, this one was quite difficult. Slightly more than half of the clusters, 10 out of 19, contained under 1,000 songs, and 3 contained under 100. Some of the clusters were easily mapped to clusters in the other two α values, like cluster 17_0.2, which contains Roland TR drum machine sounds and is comparable to cluster 28_0.1. However, many of the other classifications seemed more dubious. Not only did the songs within each cluster often seem to vary significantly, but the differences between many clusters appeared nearly indistinguishable. The chord change and timbre charts also support the difficulty in distinguishing different clusters. The y-axes for all of the years are quite small, implying that many of the timbre values averaged out because the songs were quite different in each cluster. Essentially, the observations are quite noisy and do not have features that stand out as saliently as cluster 28_0.1, for example. The only exceptions to these numbers were clusters 30 and 34, but there were so few songs in each of these clusters that they represent only a small amount of the dataset. Therefore, I concluded that the Dirichlet Process with α = 0.2 did an insufficient job of clustering the songs. Overall, the clusters formed when α = 0.1 were the most meaningful in terms of picking up nuanced moods and instruments without splitting hairs and resulting in clusters a minimally trained ear could not differentiate. From this analysis, the most appropriate genre classifications of the electronic music from the MSD are the clusters described in the table where α = 0.1, and the most novel artists, along with their contributions, are summarized in the findings where α = 0.05 and α = 0.1.

Chapter 4

Conclusion

In this chapter, I first address weaknesses in my experiment and strategies for resolving them; I then offer potential paths for researchers to build upon my experiment, and close with final remarks on this thesis.

4.1 Design Flaws in Experiment

While I made every effort possible to ensure the integrity of this experiment, there were various complicating factors, some beyond my control and others within my control but unrealistic to address given the time and resources I had. The largest issue was the dataset I was working with. While the MSD contained roughly 23,000 electronic music songs according to my classifications, these songs did not come close to all of the electronic music that was available. From looking through the tracks, I did see many important artists, meaning that there was some credibility to the dataset. However, there were several other artists I was surprised to see missing, and the artists included contained only a limited number of popular songs. Some traditionally defined genres, like dubstep, were missing entirely from the dataset, and the most recent songs came from the year 2010, which meant that the past 5 years of rapid expansion in EM were not accounted for. Building a sufficient corpus of EM data is very difficult, arguably more so than for other genres, because songs may be remixed by multiple artists, further blurring the line between original content and modifications. For this reason, I considered my thesis to be a proof of concept. Although the data I used may not be ideal, I was able to show that the Dirichlet Process could be used with some amount of success to cluster songs based on their metadata.

With respect to how I implemented the Dirichlet Process and constructed the features, my methodology could have been more extensive with additional time and resources. Interpreting the sounds in each song and establishing common threads is a difficult task, and unlike Pandora, which used trained music theory experts to analyze each song, I relied on my own ears and anecdotal knowledge of EM. Given the lack of formal literature quantitatively analyzing EM and the resources I had, this was my best realistic option, but it was also not ideal. The second notable weakness, which was more controllable, was determining what exactly constitutes an EM song. My criteria involved iterating through every song and selecting those whose artist contained a tag that fell inside a list of predetermined EM genres. However, this strategy is not always effective, since some artists contain only a small selection of EM songs and have produced much more music involving rock or other non-EM genres. To prevent these songs from appearing in the dataset, I would need to load another dataset, from a group called Last.fm, which contains user-generated tags at the song level.
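As a rough sketch of how such song-level filtering might work (the tag dictionary below is hypothetical, standing in for the per-track tags that the Last.fm dataset provides):

def is_em_song(track_id, song_level_tags, em_genres):
    # keep a song only if one of its own tags, not just its artist's,
    # matches a predetermined EM genre
    return any(tag.lower() in em_genres for tag in song_level_tags.get(track_id, []))

em_genres = set(['house', 'techno', 'trance', 'ambient'])  # abbreviated list
song_level_tags = {'TRAXLZU12903D05F94': ['pop', '80s']}   # hypothetical tags
print(is_em_song('TRAXLZU12903D05F94', song_level_tags, em_genres))  # False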

Another, more addressable, weakness in my experiment was graphically analyzing the timbre categories. While the average chord changes were easy to interpret on the graphs for each cluster, and had easy semantic interpretations, the timbre categories were never formally defined. That is, while I knew the Bayes Information Criterion was lowest when there were 46 categories, I did not associate each timbre category with a sound. Mauch's study addressed this issue by randomly selecting songs with sounds that fell in each timbre category and asking users to listen to the sounds and classify what they heard. Implementing this system would be an additional way of ensuring that the clusters formed for each song were nontrivial: I could not only eyeball the measurements on each graph for timbre, like I did in this thesis, but also use them to confirm the sounds I observed for each cluster. Finally, while my feature selection contained careful preprocessing, based on other studies, that normalized measurements between all songs, there are additional ways I could have improved the feature set. For example, one study looks at more advanced ways to isolate specific timbre segments in a song, identify repeating patterns, and compare songs to each other in terms of the similarity of their timbres [15]. More advanced methods like these would allow me to more quantitatively analyze how successful the Dirichlet Process is at effectively clustering songs into distinct categories.

4.2 Future Work

Future work in this area, quantitatively analyzing EM metadata to determine what constitutes different genres and novel artists, would involve tighter definitions, procedures, evaluations of whether clustering was effective, and musical scrutiny. All of the weaknesses mentioned in the previous section, barring perhaps the songs available in the Million Song Dataset, can be addressed with extensions and modifications to the code base I created. The greater issue of building an effective corpus of music data for the MSD, and constantly updating it, might be addressed by soliciting such data from an organization like Spotify, but such an endeavor is very ambitious and beyond the scope of any individual or small-group research project without extensive funding and influence. Once these problems are resolved, and the dataset, the songs accessed from the dataset, and methods for comparing songs to each other are in place, the next steps would be to further analyze the results. How do the most unique artists for their time compare to the most popular artists? Is there considerable overlap? How long does it take for a style to grow in popularity, if it even does? And lastly, how can these findings be used to compose new genres of music and envision who and what will become popular in the future? All of these questions may require supplementary information sources, with respect to the popularity of songs and artists, for example, and many of these additional pieces of information can be found on the website of the MSD.

4.3 Closing Remarks

While this thesis is an ambitious endeavor and can be improved in many respects, it does show that the methods implemented yield nontrivial results and could serve as a foundation for future quantitative analysis of electronic music. As data analytics grows even more, and groups such as Spotify amass greater amounts of information and deeper insights on that information, this relatively new field of study will hopefully grow. EM is a dynamic, energizing, and incredibly expressive type of music, and understanding it from a quantitative perspective pays respect to what has, up until now, been mostly analyzed from a curious outsider's perspective: qualitatively described, but not examined as thoroughly from a mathematical angle.

Appendix A

Code

A.1 Pulling Data from the Million Song Dataset

from __future__ import division
import os
import re
import sys
import time
import glob
import numpy as np
import hdf5_getters  # not on adroit

# prevents output from showing ellipses when printed
np.set_printoptions(threshold=np.nan)

'''This code computes the frequency of chord changes in each electronic song
and runs the dirichlet process on it.'''

basedir = '/scratch/network/mssilver/mssilver/msd_data_full/data/' + str(sys.argv[1])
ext = '.h5'

target_genres = ['house', 'techno', 'drum and bass', 'drum n bass', "drum'n'bass",
                 'drumnbass', "drum 'n' bass", 'jungle', 'breakbeat', 'trance',
                 'dubstep', 'trap', 'downtempo', 'industrial', 'synthpop', 'idm',
                 'idm - intelligent dance music', '8-bit', 'ambient',
                 'dance and electronica', 'electronic']

# relevant metadata for all EM songs found in the MSD
all_song_data = {}
pitch_segs_data = []
count = 0
start_time = time.time()

for root, dirs, files in os.walk(basedir):
    files = glob.glob(os.path.join(root, '*' + ext))
    for f in files:
        h5 = hdf5_getters.open_h5_file_read(f)
        # if year unknown, throw out sample
        if hdf5_getters.get_year(h5) == 0:
            h5.close()
            continue
        if any(tag in str(hdf5_getters.get_artist_mbtags(h5)) for tag in target_genres):
            print 'found electronic music song at {0} seconds'.format(time.time() - start_time)
            count += 1
            print('song count: {0}'.format(count + 1))
            h5_subdict = dict()
            h5_subdict['title'] = hdf5_getters.get_title(h5).item()
            h5_subdict['artist_name'] = hdf5_getters.get_artist_name(h5).item()
            h5_subdict['year'] = hdf5_getters.get_year(h5).item()
            h5_subdict['duration'] = hdf5_getters.get_duration(h5).item()
            h5_subdict['timbre'] = hdf5_getters.get_segments_timbre(h5).tolist()
            h5_subdict['pitches'] = hdf5_getters.get_segments_pitches(h5).tolist()
            track_id = hdf5_getters.get_track_id(h5).item()
            all_song_data[track_id] = h5_subdict
            print('Song {0} finished processing. Total time elapsed: {1} seconds'.format(count, str(time.time() - start_time)))
        h5.close()

all_song_data_sorted = dict(sorted(all_song_data.items(), key=lambda k: k[1]['year']))
sortedpitchdata = '/scratch/network/mssilver/mssilver/msd_data/raw_' + re.sub('/', '', sys.argv[1]) + '.txt'
with open(sortedpitchdata, 'w') as text_file:
    text_file.write(str(all_song_data_sorted))

A.2 Calculating Most Likely Chords and Timbre Categories

from __future__ import division
import os
import sys
import re
import time
import json
import glob
import hdf5_getters  # not on adroit
import sklearn.mixture
import msd_utils  # not on adroit
import math
import numpy as np
import collections
import ast

# prevents output from showing ellipses when printed
np.set_printoptions(threshold=np.nan)

# column-wise mean of list of lists
def mean(a):
    return sum(a) / len(a)

'''This code computes the frequency of chord changes in each electronic song
and runs the dirichlet process on it.'''

basedir = '/scratch/network/mssilver/mssilver/'
input_file = basedir + 'msd_data/raw_' + str(sys.argv[1]) + '.txt'
output_file = basedir + 'msd_data/preprocessed_' + str(sys.argv[1]) + '.txt'

json_contents = open(input_file, 'r').read()

all_song_data = []
time_start = time.time()
count = 0
for json_object_str in re.finditer(r"\{'title'.*?\}", json_contents):
    json_object_str = str(json_object_str.group(0))
    json_object = ast.literal_eval(json_object_str)
    json_object_new = {}

    json_object_new['title'] = json_object['title']
    json_object_new['artist_name'] = json_object['artist_name']
    json_object_new['year'] = json_object['year']
    json_object_new['duration'] = json_object['duration']

    segments_pitches_old = json_object['pitches']
    segments_timbre_old = json_object['timbre']
    segments_pitches_old_smoothed = []
    segments_timbre_old_smoothed = []
    chord_changes = [0 for i in range(0, 192)]
    smoothing_factor = 5
    for i in range(0, int(math.floor(len(segments_pitches_old) / smoothing_factor))):
        segments = segments_pitches_old[(smoothing_factor*i):(smoothing_factor*i + smoothing_factor)]
        # calculate mean frequency of each note over a block of 5 time segments
        segments_mean = map(mean, zip(*segments))
        segments_pitches_old_smoothed.append(segments_mean)
    most_likely_chords = [msd_utils.find_most_likely_chord(seg) for seg in segments_pitches_old_smoothed]
    print 'found most likely chords at {0} seconds'.format(time.time() - time_start)
    # calculate chord changes
    for i in range(0, len(most_likely_chords) - 1):
        c1 = most_likely_chords[i]
        c2 = most_likely_chords[i+1]
        if (c1[1] == c2[1]):
            note_shift = 0
        elif (c1[1] < c2[1]):
            note_shift = c2[1] - c1[1]
        else:
            note_shift = 12 - c1[1] + c2[1]
        key_shift = 4*(c1[0] - 1) + c2[0]
        # convert note_shift (0 through 11) and key_shift (1 to 16)
        # to one of 192 categories for a chord shift
        chord_shift = 12*(key_shift - 1) + note_shift
        chord_changes[chord_shift] += 1
    json_object_new['chord_changes'] = [c / json_object['duration'] for c in chord_changes]
    print 'calculated chord changes at {0} seconds'.format(time.time() - time_start)

    for i in range(0, int(math.floor(len(segments_timbre_old) / smoothing_factor))):
        segments = segments_timbre_old[(smoothing_factor*i):(smoothing_factor*i + smoothing_factor)]
        # calculate mean frequency of each note over a block of 5 time segments
        segments_mean = map(mean, zip(*segments))
        segments_timbre_old_smoothed.append(segments_mean)
    print 'found most likely timbre categories at {0} seconds'.format(time.time() - time_start)
    timbre_cats = [msd_utils.find_most_likely_timbre_category(seg) for seg in segments_timbre_old_smoothed]
    timbre_cat_counts = [timbre_cats.count(i) for i in xrange(0, 30)]
    json_object_new['timbre_cat_counts'] = [t / json_object['duration'] for t in timbre_cat_counts]
    all_song_data.append(json_object_new)
    count += 1

print 'preprocessing finished, writing results to file at time {0}'.format(time.time() - time_start)
with open(output_file, 'w') as f:
    f.write(str(all_song_data))

print 'file merging complete at time {0}'.format(time.time() - time_start)

A.3 Code to Compute Timbre Categories

from __future__ import division
import os
import sys
import re
import time
import json
import glob
import hdf5_getters  # not on adroit
import sklearn.mixture
import msd_utils  # not on adroit
import math
import numpy as np
import collections
from string import ascii_uppercase
import ast
import matplotlib.pyplot as plt
import operator
from collections import defaultdict
import random

timbre_all = []
N = 20  # number of samples to get from each year
year_counts = dict({1956: 2, 1965: 4, 1968: 3, 1969: 5, 1970: 23, 1971: 25,
                    1972: 26, 1973: 37, 1974: 35, 1975: 29, 1976: 28, 1977: 64,
                    1978: 77, 1979: 111, 1980: 131, 1981: 171, 1982: 199, 1983: 272,
                    1984: 190, 1985: 189, 1986: 200, 1987: 224, 1988: 205, 1989: 272,
                    1990: 358, 1991: 348, 1992: 538, 1993: 610, 1994: 658, 1995: 764,
                    1996: 809, 1997: 930, 1998: 872, 1999: 983, 2000: 1031, 2001: 1230,
                    2002: 1323, 2003: 1563, 2004: 1508, 2005: 1995, 2006: 1892,
                    2007: 2175, 2008: 1950, 2009: 1782, 2010: 742})

time_start = time.time()
year_count = defaultdict(int)
orig_dir = '/scratch/network/mssilver/mssilver/'
orig_dir = ''
json_pattern = re.compile(r"\{'title'.*?\}", re.DOTALL)
N = 20  # number of songs to sample from each year
k = 20  # number of frames to select from each song
for l1 in ascii_uppercase:
    for l2 in ascii_uppercase:
        edm_textfile = orig_dir + 'msd_data/raw_' + l1 + l2 + '.txt'
        json_contents = open(edm_textfile, 'r').read()
        for json_object_str in re.findall(json_pattern, json_contents):
            json_object = ast.literal_eval(json_object_str)
            year = int(json_object['year'])
            prob = 1.0 if 1.0*N/year_counts[year] > 1.0 else 1.0*N/year_counts[year]
            if random.random() < prob:
                print 'getting timbre frames for song in directory {0}, {1} seconds after start of program'.format(edm_textfile, time.time() - time_start)
                duration = float(json_object['duration'])
                timbre = [[t/duration for t in l] for l in json_object['timbre']]
                try:
                    indices = random.sample(xrange(0, len(timbre)), k)
                except:
                    indices = xrange(0, len(timbre))
                timbre_frames = [timbre[i] for i in indices]
                appended_timbre = [timbre_all.append(l) for l in timbre_frames]
        print 'finished file {0}, {1} seconds after start of program'.format(edm_textfile, time.time() - time_start)

with(open('timbre_frames_all.txt', 'w')) as f:
    f.write(str(timbre_all))

A.4 Helper Methods for Calculations

import os
import re
import json
import glob
import hdf5_getters
import time
import numpy as np

''' some static data used in conjunction with the helper methods '''

# each 12-element vector corresponds to the 12 pitches, starting with
# C natural and going up to B natural

CHORD_TEMPLATE_MAJOR = [[1,0,0,0,1,0,0,1,0,0,0,0], [0,1,0,0,0,1,0,0,1,0,0,0],
                        [0,0,1,0,0,0,1,0,0,1,0,0], [0,0,0,1,0,0,0,1,0,0,1,0],
                        [0,0,0,0,1,0,0,0,1,0,0,1], [1,0,0,0,0,1,0,0,0,1,0,0],
                        [0,1,0,0,0,0,1,0,0,0,1,0], [0,0,1,0,0,0,0,1,0,0,0,1],
                        [1,0,0,1,0,0,0,0,1,0,0,0], [0,1,0,0,1,0,0,0,0,1,0,0],
                        [0,0,1,0,0,1,0,0,0,0,1,0], [0,0,0,1,0,0,1,0,0,0,0,1]]
CHORD_TEMPLATE_MINOR = [[1,0,0,1,0,0,0,1,0,0,0,0], [0,1,0,0,1,0,0,0,1,0,0,0],
                        [0,0,1,0,0,1,0,0,0,1,0,0], [0,0,0,1,0,0,1,0,0,0,1,0],
                        [0,0,0,0,1,0,0,1,0,0,0,1], [1,0,0,0,0,1,0,0,1,0,0,0],
                        [0,1,0,0,0,0,1,0,0,1,0,0], [0,0,1,0,0,0,0,1,0,0,1,0],
                        [0,0,0,1,0,0,0,0,1,0,0,1], [1,0,0,0,1,0,0,0,0,1,0,0],
                        [0,1,0,0,0,1,0,0,0,0,1,0], [0,0,1,0,0,0,1,0,0,0,0,1]]
CHORD_TEMPLATE_DOM7 = [[1,0,0,0,1,0,0,1,0,0,1,0], [0,1,0,0,0,1,0,0,1,0,0,1],
                       [1,0,1,0,0,0,1,0,0,1,0,0], [0,1,0,1,0,0,0,1,0,0,1,0],
                       [0,0,1,0,1,0,0,0,1,0,0,1], [1,0,0,1,0,1,0,0,0,1,0,0],
                       [0,1,0,0,1,0,1,0,0,0,1,0], [0,0,1,0,0,1,0,1,0,0,0,1],
                       [1,0,0,1,0,0,1,0,1,0,0,0], [0,1,0,0,1,0,0,1,0,1,0,0],
                       [0,0,1,0,0,1,0,0,1,0,1,0], [0,0,0,1,0,0,1,0,0,1,0,1]]
CHORD_TEMPLATE_MIN7 = [[1,0,0,1,0,0,0,1,0,0,1,0], [0,1,0,0,1,0,0,0,1,0,0,1],
                       [1,0,1,0,0,1,0,0,0,1,0,0], [0,1,0,1,0,0,1,0,0,0,1,0],
                       [0,0,1,0,1,0,0,1,0,0,0,1], [1,0,0,1,0,1,0,0,1,0,0,0],
                       [0,1,0,0,1,0,1,0,0,1,0,0], [0,0,1,0,0,1,0,1,0,0,1,0],
                       [0,0,0,1,0,0,1,0,1,0,0,1], [1,0,0,0,1,0,0,1,0,1,0,0],
                       [0,1,0,0,0,1,0,0,1,0,1,0], [0,0,1,0,0,0,1,0,0,1,0,1]]

CHORD_TEMPLATE_MAJOR_means = [np.mean(chord) for chord in CHORD_TEMPLATE_MAJOR]
CHORD_TEMPLATE_MINOR_means = [np.mean(chord) for chord in CHORD_TEMPLATE_MINOR]
CHORD_TEMPLATE_DOM7_means = [np.mean(chord) for chord in CHORD_TEMPLATE_DOM7]
CHORD_TEMPLATE_MIN7_means = [np.mean(chord) for chord in CHORD_TEMPLATE_MIN7]

CHORD_TEMPLATE_MAJOR_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_MAJOR]
CHORD_TEMPLATE_MINOR_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_MINOR]
CHORD_TEMPLATE_DOM7_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_DOM7]
CHORD_TEMPLATE_MIN7_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_MIN7]

TIMBRE_CLUSTERS = [
    [1.38679881e-01, 3.95702571e-02, 2.65410235e-02, 7.38301998e-03, -1.75014636e-02, -5.51147732e-02,
     8.71851698e-03, -1.17595855e-02, 1.07227900e-02, 8.75951680e-03, 5.40391877e-03, 6.17638908e-03],
    [3.14344510e+00, 1.17405599e-01, 4.08053561e+00, -1.77934450e+00, 2.93367968e+00, -1.35597928e+00,
     -1.55129489e+00, 7.75743158e-01, 6.42796685e-01, 1.40794256e-01, 3.37716831e-01, -3.27103815e-01],
    [3.56548165e-01, 2.73288705e+00, 1.94355982e+00, 1.06892477e+00, 9.89739475e-01, -8.97330631e-02,
     8.73234495e-01, -2.00747009e-03, 3.44488367e-01, 9.93117800e-02, -2.43471766e-01, -1.90521726e-01],
    [4.22442037e-01, 4.14115783e-01, 1.43926557e-01, -1.16143322e-01, -5.95186216e-02, -2.36927188e-01,
     -6.83151409e-02, 9.86816882e-02, 2.43219098e-02, 6.93558977e-02, 6.80121418e-03, 3.97485360e-02],
    [1.94727799e-01, -1.39027782e+00, -2.39875671e-01, -2.84583677e-01, 1.92334219e-01, -2.83421048e-01,
     2.15787541e-01, 1.14840341e-01, -2.15631833e-01, -4.09496877e-02, -6.90838017e-03, -7.24394810e-03],
    [1.96565167e-01, 4.98702717e-02, -3.43697282e-01, 2.54170701e-01, 1.12441266e-02, 1.54740401e-01,
     -4.70447408e-02, 8.10868802e-02, 3.03736697e-03, 1.43974944e-03, -2.75044913e-02, 1.48634678e-02],
    [2.21364497e-01, -2.96205105e-01, 1.57754028e-01, -5.57641279e-02, -9.25625566e-02, -6.15316168e-02,
     -1.38139882e-01, -5.54936599e-02, 1.66886836e-01, 6.46238260e-02, 1.24093863e-02, -2.09274345e-02],
    [2.12823455e-01, -9.32652720e-02, -4.39611467e-01, -2.02814479e-01, 4.98638770e-02, -1.26572488e-01,
     -1.11181799e-01, 3.25075635e-02, 2.01416694e-02, -5.69216463e-02, 2.61922912e-02, 8.30817468e-02],
    [1.62304042e-01, -7.34813956e-03, -2.02552550e-01, 1.80106705e-01, -5.72110826e-02, -9.17148244e-02,
     -6.20429191e-03, -6.08892354e-02, 1.02883628e-02, 3.84878478e-02, -8.72920419e-03, 2.37291230e-02],
    [1.69023095e-01, 6.81311168e-02, -3.71039856e-02, -2.13139780e-02, -4.18752028e-03, 1.36407740e-01,
     2.58515825e-02, -4.10328777e-04, 2.93149920e-02, -1.97874734e-02, 2.01177066e-02, 4.29260690e-03],
    [4.16829358e-01, -1.28384095e+00, 8.86081556e-01, 9.13717416e-02, -3.19420208e-01, -1.82003637e-01,
     -3.19865507e-02, -1.71517045e-02, 3.47472066e-02, -3.53047665e-02, 5.58354602e-02, -5.06222122e-02],
    [3.83948137e-01, 1.06020034e-01, 4.01191058e-01, 1.49470482e-01, -9.58422411e-02, -4.94473336e-02,
     2.27589858e-02, -5.67352733e-02, 3.84666644e-02, -2.15828055e-02, -1.67817151e-02, 1.15426241e-01],
    [9.07946444e-01, 3.26120397e+00, 2.98472002e+00, -1.42615404e-01, 1.29886103e+00, -4.53380431e-01,
     1.54008478e-01, -3.55297093e-02, -2.95809181e-01, 1.57037690e-01, -7.29692046e-02, 1.15180285e-01],
    [1.60870896e+00, -2.32038235e+00, -7.96211044e-01, 1.55058968e+00, -2.19377663e+00, 5.01030526e-01,
     -1.71767279e+00, -1.36642470e+00, -2.42837527e-01, -4.14275615e-01, -7.33148530e-01, -4.56676578e-01],
    [6.42870687e-01, 1.34486839e+00, 2.16026845e-01, -2.13180345e-01, 3.10866747e-01, -3.97754955e-01,
     -3.54439151e-01, -5.95938041e-04, 4.95054274e-03, 4.67013422e-02, -1.80823854e-02, 1.25808320e-01],
    [1.16780496e+00, 2.28141229e+00, -3.29418720e+00, -1.54239912e+00, 2.12372153e-01, 2.51116768e+00,
     1.84273560e+00, -4.06183916e-01, 1.19175125e+00, -9.24407446e-01, 6.85444429e-01, -6.38729005e-01],
    [2.39097414e-01, -1.13382447e-02, 3.06327342e-01, 4.68182987e-03, -1.03107607e-01, -3.17661969e-02,
     3.46533705e-02, 1.46440386e-02, 6.88291154e-02, 1.72580481e-02, -6.23970238e-03, -6.52822380e-03],
    [1.74850329e-01, -1.86077411e-01, 2.69285838e-01, 5.22452803e-02, -3.71708289e-02, -6.42874319e-02,
     -5.01920042e-03, -1.14565540e-02, -2.61300268e-03, -6.94872458e-03, 1.20157063e-02, 2.01341977e-02],
    [1.93220674e-01, 1.62738332e-01, 1.72794061e-02, 7.89933755e-02, 1.58494767e-01, 9.04541006e-04,
     -3.33177052e-02, -1.42411500e-01, -1.90471155e-02, -2.41622739e-02, -2.57382438e-02, 2.84895062e-02],
    [3.31179197e+00, -1.56765268e-01, 4.42446188e+00, 2.05496297e+00, 5.07031622e+00, -3.52663849e-02,
     -5.68337901e+00, -1.17825301e+00, 5.41756637e-01, -3.15541339e-02, -1.58404846e+00, 7.37887234e-01],
    [2.36033237e-01, -5.01380019e-01, -7.01568834e-02, -2.14474169e-01, 5.58739133e-01, -3.45340886e-01,
     2.36469930e-01, -2.51770230e-02, -4.41670143e-01, -1.73364633e-01, 9.92353986e-03, 1.01775476e-01],
    [3.13672832e+00, 1.55128891e+00, 4.60139512e+00, 9.82477544e-01, -3.87108002e-01, -1.34239667e+00,
     -3.00065797e+00, -4.41556909e-01, -7.77546208e-01, -6.59017029e-01, -1.42596356e-01, -9.78935498e-01],
    [8.50714148e-01, 2.28658856e-01, -3.65260753e+00, 2.70626948e+00, -1.90441544e-01, 5.66625676e+00,
     1.77531510e+00, 2.39978921e+00, 1.10965660e+00, 1.58484130e+00, -1.51579214e-02, 8.64324026e-01],
    [1.14302559e+00, 1.18602811e+00, -3.88130412e+00, 8.69833825e-01, -8.23003310e-01, -4.23867795e-01,
     8.56022598e-01, -1.08015106e+00, 1.74840192e-01, -1.35493558e-02, -1.17012561e+00, 1.68572940e-01],
    [3.54117814e+00, 6.12714769e-01, 7.67585243e+00, 2.50391333e+00, 1.81374399e+00, -1.46363231e+00,
     -1.74027236e+00, -5.72924078e-01, -1.20787368e+00, -4.13954661e-01, -4.62561948e-01, 6.78297871e-01],
    [8.31843044e-01, 4.41635485e-01, 7.00724425e-02, -4.72159900e-02, 3.08326493e-01, -4.47009822e-01,
     3.27806057e-01, 6.52370380e-01, 3.28490360e-01, 1.28628172e-01, -7.78065861e-02, 6.91343399e-02],
    [4.90082031e-01, -9.53180204e-01, 1.76970476e-01, 1.57256960e-01, -5.26196238e-02, -3.19264458e-01,
     3.91808304e-01, 2.19368239e-01, -2.06483291e-01, -6.25044005e-02, -1.05547224e-01, 3.18934196e-01],
    [1.49899454e+00, -4.30708817e-01, 2.43770498e+00, 7.03149621e-01, -2.28827845e+00, 2.70195855e+00,
     -4.71484280e+00, -1.18700075e+00, -1.77431396e+00, -2.23190236e+00, 8.20855264e-01, -2.35859902e-01],
    [1.20322544e-01, -3.66300816e-01, -1.25699953e-01, -1.21914056e-01, 6.93277338e-02, -1.31034684e-01,
     -1.54955924e-03, 2.48094288e-02, -3.09576314e-02, -1.66369415e-03, 1.48904987e-04, -1.42151992e-02],
    [6.52394765e-01, -6.81024464e-01, 6.36868117e-01, 3.04950208e-01, 2.62178992e-01, -3.20457080e-01,
     -1.98576098e-01, -3.02173163e-01, 2.04399765e-01, 4.44513847e-02, -9.50111498e-03, -1.14198739e-02],
    [2.06762180e-01, -2.08101829e-01, 2.61977630e-01, -1.71672300e-01, 5.61794250e-02, 2.13660185e-01,
     3.90259585e-02, 4.78176392e-02, 1.72812607e-02, 3.44052067e-02, 6.26899067e-03, 2.48544728e-02],
    [7.39717363e-01, 4.37786285e+00, 2.54995502e+00, 1.13151212e+00, -3.58509503e-01, 2.20806129e-01,
     -2.20500355e-01, -7.22409824e-03, -2.70534083e-01, 1.07942098e-03, 2.70174668e-01, 1.87279353e-01],
    [1.25593809e+00, 6.71054880e-02, 8.70352571e-01, -4.32607959e+00, 2.30652217e+00, 5.47476105e+00,
     -6.11052479e-01, 1.07955720e+00, -2.16225471e+00, -7.95770149e-01, -7.31804973e-01, 9.68935954e-01],
    [1.17233757e-01, -1.23897829e-01, -4.88625265e-01, 1.42036530e-01, -7.23286756e-02, -6.99808763e-02,
     -1.17525019e-02, 5.70221674e-02, -7.67796123e-03, 4.17505873e-02, -2.33375716e-02, 1.94121001e-02],
    [1.67511025e+00, -2.75436700e+00, 1.45345593e+00, 1.32408871e+00, -1.66172505e+00, 1.00560074e+00,
     -8.82308160e-01, -5.95708043e-01, -7.27283590e-01, -1.03975499e+00, -1.86653334e-02, 1.39449745e+00],
    [3.20587677e+00, -2.84451104e+00, 8.54849957e+00, -4.44001235e-01, 1.04202144e+00, 7.35333682e-01,
     -2.48763292e+00, 7.38931361e-01, -1.74185596e+00, -1.07581842e+00, 2.05759299e-01, -8.20483513e-01],
    [3.31279737e+00, -5.08655734e-01, 6.61530870e+00, 1.16518280e+00, 4.74499155e+00, -2.31536191e+00,
     -1.34016130e+00, -7.15381712e-01, 2.78890594e+00, 2.04189275e+00, -3.80003033e-01, 1.16034914e+00],
    [1.79522019e+00, -8.13534697e-02, 4.37167420e-01, 2.26517020e+00, 8.85377295e-01, 1.07481514e+00,
     -7.25322296e-01, -2.19309506e+00, -7.59468916e-01, -1.37191387e+00, 2.60097913e-01, 9.34596450e-01],
    [3.50400906e-01, 8.17891485e-01, -8.63487084e-01, -7.31760701e-01, 9.70320805e-02, -3.60023996e-01,
     -2.91753495e-01, -8.03073817e-02, 6.65930095e-02, 1.60093340e-01, -1.29158086e-01, -5.18806100e-02],
    [2.25922929e-01, 2.78461593e-01, 5.39661393e-02, -2.37662670e-02, -2.70343295e-02, -1.23485570e-01,
     2.31027499e-03, 5.87465112e-05, 1.86127188e-02, 2.83074747e-02, -1.87198676e-04, 1.24761782e-02],
    [4.53615634e-01, 3.18976020e+00, -8.35029351e-01, 7.84124578e+00, -4.43906795e-01, -1.78945492e+00,
     -1.14521031e+00, 1.00044304e+00, -4.04084981e-01, -4.86030348e-01, 1.05412721e-01, 5.63666445e-02],
    [3.93714086e-01, -3.07226477e-01, -4.87366619e-01, -4.57481697e-01, -2.91133171e-04, -2.39881719e-01,
     -2.15591352e-01, -1.21332941e-01, 1.42245002e-01, 5.02984582e-02, -8.05878851e-03, 1.95534173e-01],
    [1.86913010e-01, -1.61000977e-01, 5.95612425e-01, 1.87804293e-01, 2.22064227e-01, -1.09008289e-01,
     7.83845058e-02, 5.15228647e-02, -8.18113578e-02, -2.37860551e-02, 3.41013800e-03, 3.64680417e-02],
    [3.32919314e+00, -2.14341251e+00, 7.20913997e+00, 1.76143734e+00, 1.64091808e+00, -2.66887649e+00,
     -9.26748006e-01, -2.78599285e-01, -7.39434005e-01, -3.87363085e-01, 8.00557250e-01, 1.15628886e+00],
    [4.76496444e-01, -1.19334793e-01, 3.09037235e-01, -3.45545294e-01, 1.30114716e-01, 5.06895559e-01,
     2.12176840e-01, -4.14296750e-03, 4.52439064e-02, -1.62163990e-02, 6.93683152e-03, -5.77607592e-03],
    [3.00019324e-01, 5.43432074e-02, -7.72732930e-01, 1.47263806e+00, -2.79012581e-02, -2.47864869e-01,
     -2.10011388e-01, 2.78202425e-01, 6.16957205e-02, -1.66924986e-01, -1.80102286e-01, -3.78872162e-03]]

TIMBRE_MEANS = [np.mean(t) for t in TIMBRE_CLUSTERS]
TIMBRE_STDEVS = [np.std(t) for t in TIMBRE_CLUSTERS]

'''helper methods to process raw msd data'''

def normalize_pitches(h5):
    key = int(hdf5_getters.get_key(h5))
    segments_pitches = hdf5_getters.get_segments_pitches(h5)
    segments_pitches_new = [transpose_by_key(pitch_seg, key) for pitch_seg in segments_pitches]
    return segments_pitches_new

def transpose_by_key(pitch_seg, key):
    pitch_seg_new = []
    for i in range(0, 12):
        idx = (i + key) % 12
        pitch_seg_new.append(pitch_seg[idx])
    return pitch_seg_new

''' given a time segment with distributions of the 12 pitches, find the most
likely chord played '''
def find_most_likely_chord(pitch_vector):
    rho_max = 0.0
    # index each chord
    most_likely_chord = (1, 1)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_MAJOR, CHORD_TEMPLATE_MAJOR_means, CHORD_TEMPLATE_MAJOR_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean)*(pitch_vector[i] - np.mean(pitch_vector))/((stdev + 0.01)*(np.std(pitch_vector) + 0.01))
        if (abs(rho) > abs(rho_max)):
            rho_max = rho
            most_likely_chord = (1, idx)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_MINOR, CHORD_TEMPLATE_MINOR_means, CHORD_TEMPLATE_MINOR_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean)*(pitch_vector[i] - np.mean(pitch_vector))/((stdev + 0.01)*(np.std(pitch_vector) + 0.01))
        if (abs(rho) > abs(rho_max)):
            rho_max = rho
            most_likely_chord = (2, idx)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_DOM7, CHORD_TEMPLATE_DOM7_means, CHORD_TEMPLATE_DOM7_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean)*(pitch_vector[i] - np.mean(pitch_vector))/((stdev + 0.01)*(np.std(pitch_vector) + 0.01))
        if (abs(rho) > abs(rho_max)):
            rho_max = rho
            most_likely_chord = (3, idx)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_MIN7, CHORD_TEMPLATE_MIN7_means, CHORD_TEMPLATE_MIN7_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean)*(pitch_vector[i] - np.mean(pitch_vector))/((stdev + 0.01)*(np.std(pitch_vector) + 0.01))
        if (abs(rho) > abs(rho_max)):
            rho_max = rho
            most_likely_chord = (4, idx)
    return most_likely_chord

def find_most_likely_timbre_category(timbre_vector):
    most_likely_timbre_cat = 0
    rho_max = 0.0
    for idx, (seg, mean, stdev) in enumerate(zip(TIMBRE_CLUSTERS, TIMBRE_MEANS, TIMBRE_STDEVS)):
        rho = 0.0
        for i in range(0, 12):
            rho += (seg[i] - mean)*(timbre_vector[i] - np.mean(seg))/((stdev + 0.01)*(np.std(timbre_vector) + 0.01))
        if (abs(rho) > abs(rho_max)):
            rho_max = rho
            most_likely_timbre_cat = idx
    return most_likely_timbre_cat

Bibliography

[1] Marck Bailey. Mark J. Butler publishes scholarly work on dance music. http://www.music.northwestern.edu/about/news/2012/mark-j-butler-publishes-scholarly-work-on-dance-music.html, March 2012.

[2] Kenneth Taylor. Ishkur's guide to EDM. http://techno.org/electronic-music-guide/.

[3] Deal further strengthens Spotify's music discovery expertise. http://the.echonest.com/pressreleases/spotify-acquires-echo-nest/, March 2014.

[4] Josh Constine. Inside the Spotify - Echo Nest skunkworks. http://techcrunch.com/2014/10/19/the-sonic-mad-scientists/, October 2014.

[5] The future of music genres is here. http://blog.echonest.com/post/73516217273/the-future-of-music-genres-is-here, January 2014.

[6] About the Music Genome Project. http://www.pandora.com/about/mgp.

[7] Joan Serrà, Álvaro Corral, Marián Boguñá, Martín Haro, and Josep Ll. Arcos. Measuring the evolution of contemporary western popular music. Sci. Rep., 2, July 2012.

[8] Matthias Mauch, Robert M. MacCallum, Mark Levy, and Armand M. Leroi. The evolution of popular music: USA 1960–2010. Royal Society Open Science, 2(5), 2015.

[9] Thierry Bertin-Mahieux, Daniel P. W. Ellis, Brian Whitman, and Paul Lamere. The Million Song Dataset. In Proceedings of the 12th International Conference on Music Information Retrieval (ISMIR 2011), 2011.

[10] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay. Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12:2825–2830, 2011.

[11] Lars Buitinck, Gilles Louppe, Mathieu Blondel, Fabian Pedregosa, Andreas Mueller, Olivier Grisel, Vlad Niculae, Peter Prettenhofer, Alexandre Gramfort, Jaques Grobler, Robert Layton, Jake VanderPlas, Arnaud Joly, Brian Holt, and Gaël Varoquaux. API design for machine learning software: experiences from the scikit-learn project. In ECML PKDD Workshop: Languages for Data Mining and Machine Learning, pages 108–122, 2013.

[12] Edwin Chen. Infinite mixture models with nonparametric Bayes and the Dirichlet process. http://blog.echen.me/2012/03/20/infinite-mixture-models-with-nonparametric-bayes-and-the-dirichlet-process/, March 2012.

[13] Graham Massey. Roland TR-808: The drum machine that changed music. http://www.bbc.com/news/entertainment-arts-26682781, March 2014.

[14] Jean-Michel Jarre and Agencja Artystyczna MTJ. Jean Michel Jarre. Disques Dreyfus, 1999.

[15] Francois Pachet, Jean-Julien Aucouturier, and Mark Sandler. The way it sounds: Timbre models for analysis and retrieval of music signals. IEEE Transactions on Multimedia, 7(6):1028–1035, December 2005.


and, depending on my progress, I would be able to achieve at least one of the levels of abstraction. As shown in Figure 1.2, each segment of a raw audio file is first broken down into its 12 timbre MFCCs and pitch components. Next, the study constructs "lexicons," or a dictionary of pitch and timbre terms to which all songs can be compared. For pitch, the original data is in an N-by-12 matrix, where N is the number of time segments in the song and 12 the number of notes found in an octave of pitches. Each time segment contains the relative strengths of each of the 12 pitches. However, musical sounds are not merely a collection of pitches but, more precisely, chords. Furthermore, the similarity of two songs is not determined by the absolute pitches of their chords, but rather by the progression of chords in the song, all relative to each other. For example, if all the notes in a song are transposed by one step, the song will sound different in terms of absolute pitch, but it will still be recognized as the original, because all of the relative movements from each chord to the next are the same. This phenomenon is captured in the pitch data by finding the most likely chord played at each time segment, then counting the change to the next chord at each time step and generating a table of chord change frequencies for each song.
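As an illustration, such a table can be built in a few lines; this is only a sketch, assuming each song has already been reduced to a list of most-likely chords (one per time segment) and a duration in seconds:

from collections import Counter

def chord_change_frequencies(most_likely_chords, duration):
    # count each (chord, next chord) transition, then normalize by the
    # song's length so songs of different durations are comparable
    counts = Counter(zip(most_likely_chords[:-1], most_likely_chords[1:]))
    return dict((change, n / float(duration)) for change, n in counts.items())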

Constructing the timbre lexicon is more complicated, since there is no easy analogue, like chords for pitches, with which to compare songs. Mauch's study utilizes a Gaussian Mixture Model (GMM) by iterating over k = 1 to k = N clusters, where N is a large number, running the GMM on each prior assumption of k clusters and computing the Bayes Information Criterion (BIC) for each model. The lowest of the N BIC values is found, and that value of k is selected. That model contains k different timbre clusters, and each cluster contains the mean timbre value for each of the 12 timbre components.
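A minimal sketch of this BIC-based selection is shown below; it uses scikit-learn's current GaussianMixture API rather than the exact code from the study, and timbre_frames is an assumed (n_samples, 12) array of timbre vectors:

import numpy as np
from sklearn.mixture import GaussianMixture

def best_gmm_by_bic(timbre_frames, max_k=60):
    best_gmm, best_bic = None, np.inf
    for k in range(1, max_k + 1):
        gmm = GaussianMixture(n_components=k).fit(timbre_frames)
        bic = gmm.bic(timbre_frames)  # lower BIC = better fit/complexity tradeoff
        if bic < best_bic:
            best_gmm, best_bic = gmm, bic
    return best_gmm  # best_gmm.means_ holds the timbre-cluster centroids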

For my research, I decided that the pitch and timbre lexicons would be the most realistic level of abstraction I could obtain. Mauch's study adds an additional layer to pitch and timbre by identifying the most common patterns of chord changes and most common timbre rhythms, and creating more general tags from these combined terms, such as "stepwise changes indicating modal harmony" for a pitch topic and "oh, rounded, mellow" for a timbral topic. There were two problems with using this final layer of abstraction for my study. First, attaching semantic interpretations to the pitch and timbral lexicons is a difficult task. For timbre, I would need to listen to sound samples containing all of the different timbral categories I identified and attach user interpretations to them. For the chords, not only would I have to perform the same analysis as on timbre, but also take careful attention to identify which chords correspond to common sound progressions in popular music, a task that I am not qualified for and did not have the resources, for this thesis, to seek out. Second, this final layer of abstraction was not necessary for the end goal of my paper. In fact, consolidating my pitch and timbre lexicons into simpler phrases would run the risk of pigeonholing my analysis and preventing me from discovering more nuanced patterns in my final results. Therefore, I decided to focus on pitch and timbral lexicon construction as the furthest levels of abstraction when processing songs for my thesis. Mathematical details on how I constructed the pitch and timbral lexicons can be found in the Mathematical Modeling section of this paper.

1.3 The Dataset

In order to successfully execute my thesis, I need access to an extensive database of music. Until recently, acquiring a substantial corpus of music data was a difficult and costly task. It is illegal to download music audio files from video- and music-sharing sites such as YouTube, Spotify, and Pandora. Some platforms, such as iTunes, offer 90-second previews of songs, but using only segments of songs, and usually segments that showcase the chorus of the song, is not a reliable way to capture the entire essence of a song. Even if I were to legally download entire audio files for free, I would run into additional issues. Obtaining a high-quality corpus of song data would be challenging: writing scripts that crawl music-sharing platforms may not capture all of the music I am looking for. And once I had the audio files, I would have to perform audio processing techniques to extract the relevant information from the songs.

Fortunately, there is an easy solution to the music data acquisition problem. The Million Song Dataset (MSD) is a collection of metadata for one million music tracks dating up to 2011. Various organizations, such as The Echo Nest, Musicbrainz, 7digital, and Last.fm, have contributed different pieces of metadata. Each song is represented as a Hierarchical Data Format (HDF5) file, which can be loaded as a JSON object. The fields encompass topical features, such as the song title, artist, and release date, as well as lower-level features, such as the loudness, starting beat times, pitches, and timbre of several segments of the song [9].
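For illustration, individual fields can be read with the hdf5_getters helper module distributed with the MSD (the same module used in Appendix A); the file path below is hypothetical:

import hdf5_getters

h5 = hdf5_getters.open_h5_file_read('TRAXLZU12903D05F94.h5')  # hypothetical path
title = hdf5_getters.get_title(h5)             # song title
year = hdf5_getters.get_year(h5)               # release year (0 if unknown)
timbre = hdf5_getters.get_segments_timbre(h5)  # (n_segments, 12) array
h5.close()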

While the MSD is the largest free and open-source music metadata dataset I could find, there is no guarantee that it adequately covers the entire spectrum of EM artists and songs. This quality limitation is important to consider throughout the study. A quick look through the songs, including the subset of data I worked with for this report, showed that there were several well-known artists and songs from the EM scene. Therefore, while the MSD may not contain all desired songs for this project, it contains an adequate number of relevant songs to produce some meaningful results. Additionally, the groundwork for modeling the similarities between songs and identifying groundbreaking ones is the same regardless of the songs included, and the following methodologies can be implemented on any similarly formatted dataset, including one with songs that might currently be missing from the MSD.

Chapter 2

Mathematical Modeling

2.1 Determining Novelty of Songs

Finding a logical and implementable mathematical model was, and continues to be, an important aspect of my research. My problem, how to mathematically determine which songs were unique for their time, requires an algorithm in which each song is introduced in chronological order, either joining an existing category or starting a new category based on its musical similarity to songs already introduced. Clustering algorithms like k-means or Gaussian Mixture Models (GMMs), which optimize the partitioning of a dataset into a predetermined number of clusters, assume a fixed number of clusters. While this process would work if we knew exactly how many genres of EM existed, if we guess wrong, our end results may end up with clusters that are wrongly grouped together or separated. It is much better to apply a clustering algorithm that does not make any assumptions about this number.

One particularly promising process that addresses the issue of the number of clusters is a family of algorithms known as Dirichlet Processes (DPs). DPs are useful for this particular application because (1) they assign clusters to a dataset with only an upper bound on the number of clusters, and (2) by sorting the songs in chronological order before running the algorithm and keeping track of which songs are categorized under each cluster, we can observe the earliest songs in each cluster and, consequently, infer which songs were responsible for creating new clusters. The DP is controlled by a parameter α, known as the concentration parameter. The expected number of clusters formed is directly proportional to the value of α, so the higher the value of α, the more likely new clusters will be formed [10]. Regardless of the value of α, as the number of data points introduced increases, the probability of a new group being formed decreases. That is, a "rich get richer" policy is in place, and existing clusters tend to grow in size. Tweaking the value of the tunable parameter α is an important part of the study, since it determines the flexibility given to forming a new cluster. If the value of α is too small, then the criteria for forming clusters will be too strict, and data that should be in different clusters will be assigned to the same cluster. On the other hand, if α is too large, the algorithm will be too sensitive and assign similar songs to different clusters.

The implementation of the DP was achieved using scikit-learn's library and API for the Dirichlet Process Gaussian Mixture Model (DPGMM). The DPGMM is the formal name of the Dirichlet Process model used to cluster the data. More specifically, scikit-learn's implementation of the DPGMM uses the stick-breaking method, one of several equally valid methods to assign songs to clusters [11]. While the mathematical details for this algorithm can be found at the following citation [12], the most important aspects of the DPGMM are the arguments that the user can specify and tune. The first of these tunable parameters is the value α, which is the same parameter as the α discussed in the previous paragraph. As seen in Figure 2.1, on the right side, properly tuning α is key to obtaining meaningful clusters. The center image has α set to 0.01, which is too small and results in all of the data being formed under one cluster. On the other hand, the bottom-right image has the same data set and α set to 100, which does a better job of clustering. On a related note, the figure also demonstrates the effectiveness of the DPGMM over the GMM. On the left side, clearly the dataset contains 2 clusters, but the GMM in the top-left image assumes 5 clusters as a prior and consequently clusters the data incorrectly, while the DPGMM manages to limit the data to 2 clusters.
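To give a flavor of the stick-breaking construction, here is a small sketch of the underlying idea (not scikit-learn's internal code): mixture weights are generated by repeatedly breaking off Beta-distributed fractions of a remaining unit-length "stick," and smaller values of α concentrate the mass on fewer clusters.

import numpy as np

def stick_breaking_weights(alpha, n_max):
    betas = np.random.beta(1.0, alpha, size=n_max)  # fraction broken off each step
    remaining = np.concatenate(([1.0], np.cumprod(1.0 - betas[:-1])))
    return betas * remaining                        # weights sum to (nearly) 1

print(stick_breaking_weights(0.1, 50).round(3))   # mass concentrated on a few clusters
print(stick_breaking_weights(10.0, 50).round(3))  # mass spread over many clusters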

The second argument that the user inputs for the DPGMM is the data that will be clustered. The scikit-learn implementation takes the data in the format of a nested list (N lists, each of length m), where N is the number of data points and m the number of features. While the format of the data structure is relatively straightforward, choosing which numbers should be in the data was a challenge I faced. Selecting the relevant features of each song to be used in the algorithm will be expounded upon in the next section, "Feature Selection."

The last argument that a user inputs for the scikit-learn DPGMM implementation is an argument indicating the upper bound for the number of clusters. The Dirichlet Process then determines the best number of clusters for the data between 1 and the upper bound. Since the DPGMM is flexible enough to find the best value, I set an arbitrary upper bound of 50 clusters and focused more on the tuning of α to modify the number of clusters formed.
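Putting the three arguments together, a minimal sketch of the fitting call might look as follows; the feature matrix X and its file are hypothetical, and sklearn.mixture.DPGMM is the class name in the scikit-learn versions contemporary with this thesis (it was removed in later releases):

import numpy as np
from sklearn.mixture import DPGMM

# X: one row per song, one column per feature (e.g. normalized
# chord-change and timbre-category frequencies)
X = np.loadtxt('song_features.txt')  # hypothetical feature file

model = DPGMM(n_components=50,  # upper bound on the number of clusters
              alpha=0.1,        # concentration parameter
              n_iter=100)
model.fit(X)
labels = model.predict(X)       # cluster assignment for each song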

2.2 Feature Selection

Figure 2.1: scikit-learn example of GMM vs. DPGMM and tuning of α

One of the most difficult aspects of the Dirichlet method is choosing the features to be used for clustering. In other words, when we organize the songs into clusters, we need to ensure that each cluster is distinct in a way that is statistically and intuitively logical. In the Million Song Dataset [9], each song is represented as a JSON object containing several fields. These fields are candidate features to be used in the Dirichlet algorithm. Below is an example song, "Never Gonna Give You Up" by Rick Astley, and the corresponding features:

artist_mbid: db92a151-1ac2-438b-bc43-b82e149ddd50 (the musicbrainz.org ID for this artist is db9...)
artist_mbtags: shape = (4,) (this artist received 4 tags on musicbrainz.org)
artist_mbtags_count: shape = (4,) (raw tag count of the 4 tags this artist received on musicbrainz.org)
artist_name: Rick Astley (artist name)
artist_playmeid: 1338 (the ID of that artist on the service playme.com)
artist_terms: shape = (12,) (this artist has 12 terms (tags) from The Echo Nest)
artist_terms_freq: shape = (12,) (frequency of the 12 terms from The Echo Nest (number between 0 and 1))
artist_terms_weight: shape = (12,) (weight of the 12 terms from The Echo Nest (number between 0 and 1))
audio_md5: bf53f8113508a466cd2d3fda18b06368 (hash code of the audio used for the analysis by The Echo Nest)
bars_confidence: shape = (99,) (confidence value (between 0 and 1) associated with each bar by The Echo Nest)
bars_start: shape = (99,) (start time of each bar according to The Echo Nest; this song has 99 bars)
beats_confidence: shape = (397,) (confidence value (between 0 and 1) associated with each beat by The Echo Nest)
beats_start: shape = (397,) (start time of each beat according to The Echo Nest; this song has 397 beats)
danceability: 0.0 (danceability measure of this song according to The Echo Nest (between 0 and 1; 0 => not analyzed))
duration: 211.69587 (duration of the track in seconds)
end_of_fade_in: 0.139 (time of the end of the fade-in at the beginning of the song, according to The Echo Nest)
energy: 0.0 (energy measure (not in the signal processing sense) according to The Echo Nest (between 0 and 1; 0 => not analyzed))
key: 1 (estimation of the key the song is in by The Echo Nest)
key_confidence: 0.324 (confidence of the key estimation)
loudness: -7.75 (general loudness of the track)
mode: 1 (estimation of the mode the song is in by The Echo Nest)
mode_confidence: 0.434 (confidence of the mode estimation)
release: Big Tunes - Back 2 The 80s (album name from which the track was taken; some songs/tracks can come from many albums, we give only one)
release_7digitalid: 786795 (the ID of the release (album) on the service 7digital.com)
sections_confidence: shape = (10,) (confidence value (between 0 and 1) associated with each section by The Echo Nest)
sections_start: shape = (10,) (start time of each section according to The Echo Nest; this song has 10 sections)
segments_confidence: shape = (935,) (confidence value (between 0 and 1) associated with each segment by The Echo Nest)
segments_loudness_max: shape = (935,) (max loudness during each segment)
segments_loudness_max_time: shape = (935,) (time of the max loudness during each segment)
segments_loudness_start: shape = (935,) (loudness at the beginning of each segment)
segments_pitches: shape = (935, 12) (chroma features for each segment (normalized so max is 1))
segments_start: shape = (935,) (start time of each segment (musical event, or onset) according to The Echo Nest; this song has 935 segments)
segments_timbre: shape = (935, 12) (MFCC-like features for each segment)
similar_artists: shape = (100,) (a list of 100 artists (their Echo Nest IDs) similar to Rick Astley, according to The Echo Nest)
song_hotttnesss: 0.864248830588 (according to The Echo Nest, when downloaded (in December 2010), this song had a 'hotttnesss' of 0.8 (on a scale of 0 to 1))
song_id: SOCWJDB12A58A776AF (The Echo Nest song ID; note that a song can be associated with many tracks (with very slight audio differences))
start_of_fade_out: 198.536 (start time of the fade-out, in seconds, at the end of the song, according to The Echo Nest)
tatums_confidence: shape = (794,) (confidence value (between 0 and 1) associated with each tatum by The Echo Nest)
tatums_start: shape = (794,) (start time of each tatum according to The Echo Nest; this song has 794 tatums)
tempo: 113.359 (tempo in BPM according to The Echo Nest)
time_signature: 4 (time signature of the song according to The Echo Nest, i.e. usual number of beats per bar)
time_signature_confidence: 0.634 (confidence of the time signature estimation)
title: Never Gonna Give You Up (song title)
track_7digitalid: 8707738 (the ID of this song on the service 7digital.com)
track_id: TRAXLZU12903D05F94 (The Echo Nest ID of this particular track, on which the analysis was done)
year: 1987 (year when this song was released, according to musicbrainz.org)

When choosing features, my main goal was to use features that would most likely yield meaningful results, yet also be simple and make sense to the average person. The definition of "meaningful" results is arbitrary, as every music listener will have his or her own opinion as to what constitutes different types of music, but some common features most people tend to differentiate songs by are pitch, rhythm, and the types of instruments used. The following specific fields provided in each song object fall under these three terms:

Pitch

• segments_pitches: a matrix of values indicating the strength of each pitch (or note) at each discernible time interval

Rhythm

• beats_start: a vector of values indicating the start time of each beat
• time_signature: the time signature of the song
• tempo: the speed of the song in Beats Per Minute (BPM)

Instruments

• segments_timbre: a matrix of values indicating the distribution of MFCC-like features (different types of tones) for each segment

The segments_pitches feature is a clear candidate as a differentiating factor for songs, since it reveals the patterns of notes that occur. Additionally, other research papers that quantitatively examine songs, like Mauch's, look at pitch and employ a procedure that allows all songs to be compared with the same metric. Likewise, timbre is intuitively a reliable differentiating feature, since it reveals the amounts of different tones, that is, sounds that sound different despite having the same pitch. Therefore, segments_timbre is another feature that is considered in each song.

Finally, we look at the candidate features for rhythm. At first glance, all of these features appear to be useful, as they indicate the rhythm of a song in one way or another. However, none of these features are as useful as the pitch and timbre features. While tempo is one factor in differentiating genres of EDM, and music in general, tempo alone is not a driving force of musical innovation. Certain genres of EDM, like drum 'n' bass and happycore, stand out for having very fast tempos, but the tempo is supplemented with a sound unique to the genre. Conceiving new arrangements of pitches, combining instruments in new ways, and inventing new types of sounds are novel; merely speeding up or slowing down existing sounds is not. Including tempo as a feature could actually add noise to the model, since many genres overlap in their tempos. And finally, tempo is measured indirectly when the pitch and timbre features are normalized for each song: everything is measured in units of "per second," so faster songs will have higher quantities of pitch and timbre features each second. Time signature can be dismissed from the candidate features for the same reason as tempo: many genres share the same time signature, and including it in the feature set would only add more noise. beats_start looks like a more promising feature since, like segments_pitches and segments_timbre, it consists of a vector of values. However, difficulties arise when we begin to think about how exactly we can utilize this information. Since each song varies in length, we need a way to compare songs of different durations on the same level. One approach could be to perform basic statistics on the distance between each beat, for example calculating the mean and standard deviation of this distance. However, the normalized pitch and timbre information already captures this data. Another possibility is detecting certain patterns of beats, which could differentiate the syncopated dubstep or glitch beats from the steady pulse of electro-house. But once again, every beat is accompanied by a sound with a specific timbre and pitch, so this feature would not add any significantly new information.
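To make the "per second" argument above concrete, here is a tiny sketch (the counts and durations are made up):

    def per_second(counts, duration_seconds):
        # normalize raw event counts to rates so songs of different lengths compare fairly
        return [c / float(duration_seconds) for c in counts]

    # the same 300 events are twice as dense in a song half as long
    print(per_second([300], 200.0))  # [1.5]
    print(per_second([300], 400.0))  # [0.75]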

2.3 Collecting Data and Preprocessing Selected Features

2.3.1 Collecting the Data

Upon deciding the features I wanted to use in my research, I first needed to collect all of the electronic songs in the Million Song Dataset. The easiest reliable way to achieve this was to iterate through each song in the database and save the information for the songs where any of the artist genre tags in artist_mbtags matched an electronic music genre. While this measure was not fully accurate, because it looks at the genre of the artist rather than the song, specific genre information for each song was not as easily accessible, so this indicator was nearly as good a substitute. To generate a list of the genres that electronic songs would fall under, I manually searched through a subset of the MSD to find all genres that seemed to be related to electronic music. In the case of genres that were sometimes, but not always, electronic in nature, such as disco or pop, I erred on the side of caution and did not include them in the list of electronic genres. In these cases, false positives, such as primarily rock songs that happen to have the disco label attached to the artist, could inadvertently be included in the dataset. The final list of genres is as follows, with the membership test itself sketched just after the list:

    target_genres = ['house', 'techno', 'drum and bass', 'drum n bass',
        "drum'n'bass", 'drumnbass', "drum 'n' bass", 'jungle', 'breakbeat',
        'trance', 'dubstep', 'trap', 'downtempo', 'industrial', 'synthpop',
        'idm', 'idm - intelligent dance music', '8-bit', 'ambient',
        'dance and electronica', 'electronic']
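The membership test itself is a one-line predicate over the artist's tags; this sketch mirrors the filtering loop in Appendix A.1:

    import hdf5_getters

    def is_electronic(h5):
        # flatten the artist's MusicBrainz tags to a string and look for any target genre
        tags = str(hdf5_getters.get_artist_mbtags(h5))
        return any(genre in tags for genre in target_genres)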

2.3.2 Pitch Preprocessing

A study conducted by music researcher Matthias Mauch [8] analyzes pitch in a musically informed manner. The study first takes the raw sound data and converts it into a distribution over each pitch, where 0 is no detection of the pitch and 1 is the strongest amount. Then it computes the most likely chord by comparing the 4 most common types of chords in popular music (major, minor, dominant 7, and minor 7) to the observed chord. The most common chords are represented as "template chords" and contain 0's and 1's, where the 1's represent the notes played in the chord. For example, using the note C as the first index, the C major chord is represented as

CT_{CM} = (1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0)

For a given chroma frame c observed in the song, the Spearman's rho coefficient is computed against every template chord CT:

\rho_{CT,c} = \sum_{i=1}^{12} \frac{(CT_i - \overline{CT})(c_i - \bar{c})}{\sigma_{CT}\,\sigma_c}

where \overline{CT} is the mean of the values in the template chord, σ_CT is the standard deviation of the values in the chord, and the operations on c are analogous. Note that the summation is over each of the 12 pitch classes. The chord template with the highest value of ρ is selected as the chord for the time frame. After this is performed for each time frame, the values are smoothed, and then the change between adjacent chords is observed. The reasoning behind this step is that, by measuring the relative distance between chords rather than the chords themselves, all songs can be compared in the same manner even though they may have different key signatures. Finally, the study takes the types of chord changes and classifies them under 8 possible categories called "H-topics." These topics are more abstracted versions of the chord changes that make more sense to a human, such as "changes involving dominant 7th chords."
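As an illustration of the template-matching step, the short sketch below scores one chroma frame against the C major template using the correlation above (a miniature version of find_most_likely_chord in Appendix A.4, which additionally pads the denominator by 0.01 to avoid division by zero; the chroma values here are made up):

    import numpy as np

    CT_C_MAJOR = [1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0]  # notes C, E, G

    def rho(template, chroma):
        # correlation between a 12-element chord template and a chroma frame
        t_mean, c_mean = np.mean(template), np.mean(chroma)
        t_std, c_std = np.std(template), np.std(chroma)
        return sum((t - t_mean) * (c - c_mean) for t, c in zip(template, chroma)) / (t_std * c_std)

    # a made-up chroma frame with energy concentrated on C, E, and G
    frame = [0.9, 0.1, 0.05, 0.1, 0.8, 0.1, 0.05, 0.85, 0.1, 0.05, 0.1, 0.05]
    print(rho(CT_C_MAJOR, frame))  # large positive value: C major is a strong match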

In my preliminary implementation of this method on an electronic dance music corpus, I made a few modifications to Mauch's study. First, I smoothed out time frames before computing the most probable chords, rather than smoothing the most probable chords. I did this to save time and to reduce volatility in the chord measurements. Using Rick Astley's "Never Gonna Give You Up" as a reference, which contains 935 time frames and lasts 212 seconds, 5 time frames is slightly under 1 second and, for preliminary testing, appeared to be a good interval for each time block. Second, as mentioned in the literature section, I did not abstract the chord changes into H-topics. This decision also stemmed from time constraints, since deriving semantic chord meaning from EDM songs would require careful research into the types of harmonies and sounds common in that genre of music. Below, I have included a high-level visualization of the pitch metadata found in a sample song, "Firestarter" by The Prodigy, and how I converted the metadata into a chord change vector that I could then feed into the Dirichlet Process algorithm.

[Figure: from pitch metadata to a chord change vector, illustrated on "Firestarter" by The Prodigy. The original figure walks through the pipeline in steps: (1) start with the raw pitch data, an N×12 matrix, where N is the number of time frames in the song and 12 is the number of pitch classes (the figure shows the first 5 time frames of the song); (2) average the distribution of pitches over every block of 5 time frames; (3) calculate the most likely chord for each block using Spearman's rho (in the example, F♯ major, template (0,1,0,0,0,0,1,0,0,0,1,0)); (4) for each pair of adjacent chords (in the example, F♯ major → G major, a major-to-major transition recorded as chord shift code 6), increment its count in a table of chord change frequencies with 192 possible chord changes. The output is a final 192-element vector in which chord_changes[i] is the number of times the chord change with code i occurred in the song.]

A final step I took to normalize the chord change data was to divide the counts by the length of the song, so that each song's number of chord changes was measured per second.
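Because the chord-change encoding is compact, it is worth showing directly; the sketch below mirrors the arithmetic used in Appendix A.2, where a chord is a (type, root) pair with types 1-4 standing for major, minor, dominant 7, and minor 7, and roots 0-11 standing for C through B:

    def chord_shift_code(c1, c2):
        # semitone distance from the first root up to the second (wrapping around the octave)
        if c1[1] == c2[1]:
            note_shift = 0
        elif c1[1] < c2[1]:
            note_shift = c2[1] - c1[1]
        else:
            note_shift = 12 - c1[1] + c2[1]
        key_shift = 4 * (c1[0] - 1) + c2[0]       # which of the 16 type-to-type transitions
        return 12 * (key_shift - 1) + note_shift  # 16 transitions x 12 shifts = 192 codes

    # example: C major (1, 0) to G major (1, 7) is major-to-major, up 7 semitones
    print(chord_shift_code((1, 0), (1, 7)))  # 7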

2.3.3 Timbre Preprocessing

For timbre, I also used Mauch's model to find a meaningful way to compare timbre uniformly across all songs [8]. After collecting all song metadata, I took a random sample of 20 songs from each year, starting at 1970. The reason I forced the sampling to 20 randomly sampled songs from each year, rather than taking a random sample of songs from all years at once, was to prevent bias towards any type of sounds. As seen in Figure 2.2, there are significantly more songs from 2000-2011 than from before 2000: the mean year is x̄ = 2001.052, the median year is 2003, and the standard deviation of the years is σ = 7.060. A "random sample" over all songs would almost certainly include a disproportionate number of more recent songs. In order not to miss out on sounds that may be more prevalent in older songs, I required a set number of songs from each year. Next, from each randomly selected song, I selected 20 random timbre frames, in order to prevent any biases in data collection within each song. In total, there were 42 × 20 × 20 = 16,800 timbre frames collected. Next, I clustered the timbre frames using a Gaussian Mixture Model (GMM), varying the number of clusters from 10 to 100 and selecting the number of clusters with the lowest Bayesian Information Criterion (BIC), a statistical measure commonly used to select the best-fitting model. The BIC was minimized at 46 timbre clusters. I then re-ran the GMM with 46 clusters and saved the mean values of each of the 12 timbre dimensions for each cluster formed. In the same way that every song had the same 192 chord changes whose frequencies could be compared between songs, each song now had the same 46 timbre clusters, with different frequencies in each song. When reading in the metadata for each song, I calculated the most likely timbre cluster that each timbre frame belonged to and kept a frequency count of all of the possible timbre clusters observed in the song. Finally, as with the pitch data, I divided all observed counts by the duration of the song in order to normalize each song's timbre counts.

Figure 2.2: Number of Electronic Music Songs in the Million Song Dataset from Each Year
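For concreteness, below is a sketch of the cluster-count selection, assuming the sampled frames were written by the script in Appendix A.3 as a stringified list of 12-element vectors. It uses scikit-learn's current GaussianMixture API, which postdates the thesis code (the library's older GMM class exposed the same bic() method):

    import ast
    import numpy as np
    from sklearn.mixture import GaussianMixture

    frames = np.array(ast.literal_eval(open('timbre_frames_all.txt').read()))  # shape (16800, 12)

    best_k, best_bic, best_model = None, np.inf, None
    for k in range(10, 101):
        gmm = GaussianMixture(n_components=k).fit(frames)
        bic = gmm.bic(frames)  # lower BIC = better fit-versus-complexity trade-off
        if bic < best_bic:
            best_k, best_bic, best_model = k, bic, gmm

    print('BIC minimized at {0} clusters'.format(best_k))  # 46 in this thesis
    cluster_means = best_model.means_  # one 12-dimensional mean per timbre cluster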


Chapter 3

Results

3.1 Methodology

After the pitch and timbre data were processed, I ran the Dirichlet Process on the data. For each song, I concatenated the 192-element chord change frequency list and the 46-element timbre category frequency list, giving each song a total of 238 features. However, there is a problem with this setup: the pitch data will inherently dominate the clustering process, since it contains almost 3 times as many features as timbre. While there is no built-in function in scikit-learn's DPGMM process to give different weights to each feature, I considered another possibility to remedy this discrepancy: duplicating the timbre vector a certain number of times and concatenating that to the feature set of each song. While this strategy runs the risk of corrupting the feature set and turning it into something that does not accurately represent each song, it is important to keep in mind that, even without duplicating the timbre vector, the feature set consists of two separate feature sets concatenated to each other. Therefore, timbre duplication appears to be a reasonable strategy to weight pitch and timbre more evenly.
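As a minimal sketch, the per-song feature assembly with timbre duplication looks like the following, assuming the per-song records produced by the preprocessing script in Appendix A.2 (each carrying per-second chord_changes and timbre_cat_counts lists); the duplication factor of 3 is illustrative only, since the text does not fix a single value:

    def build_features(song, dup=3):
        # concatenate the chord-change frequencies with dup copies of the timbre
        # frequencies to counteract the pitch sub-vector's larger dimensionality
        return list(song['chord_changes']) + list(song['timbre_cat_counts']) * dup

    features = [build_features(song) for song in all_song_data]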


After this modification, I tweaked a few more parameters before obtaining my final results. Dividing the pitch and timbre frequencies by the duration of the song normalized every song to frequency per second, but it also had the undesired effect of making the data too small. Timbre and pitch frequencies per second were almost always less than 1.0 and often hovered as low as 0.002 for nonzero values. Because all of the values were very close to each other, using common values of α in the range of 0.1 to 1000-2000 was insufficient to push the songs into different clusters. As a result, every song fell into the same cluster. Increasing the value of α by several orders of magnitude, to well over 10 million, fixed the problem, but this solution presented two problems. First, tuning α to experiment with different ways to cluster the music would be problematic, since I would have to work with an enormous range of possible values for α. Second, pushing α to such high values is not appropriate for the Dirichlet Process. Extremely high values of α indicate a Dirichlet Process that will try to disperse the data into different clusters, but a value of α that high is, in principle, always assigning each new song to a new cluster. On the other hand, varying α between 0.1 and 1000, for example, presents a much wider range of flexibility when assigning clusters. While clustering may be possible by varying α an extreme amount with the data as it currently is, we would be using the Dirichlet Process in a way it mathematically should not be used. Therefore, multiplying all of the data by a constant value, so that we can work in the appropriate range of α, is the ideal approach. After some experimentation, I found that k = 10 was an appropriate scaling factor.

After initial runs of the Dirichlet Process, I found that there was a slight issue with some of the earlier songs. Since I had only artist genre tags, not song-specific tags, I chose songs based on whether any of the tags associated with the artist fell under any electronic music genre, including the generic term 'electronic'. There were some bands, mostly older ones from the 1960s and 1970s, like Electric Light Orchestra, which had some electronic music but mostly featured rock, funk, disco, or another genre. Given that these artists featured mostly non-electronic songs, I decided to exclude them from my study and generate a blacklist of these artists. While it was infeasible to look through every single song and determine whether it was electronic or not, I was able to look over the earliest songs in each cluster. These songs were the most important to verify as electronic, because early non-electronic songs could end up forming new clusters and inadvertently create clusters with non-electronic sounds that I was not looking for.

The goal of this thesis is to identify different groups in which EM songs are clustered and to identify the most unique artists and genres. While the second task is very simple, because it requires looking at the earliest songs in each cluster, the first is difficult to gauge the effectiveness of. While I can look at the average chord change and timbre category frequencies in each cluster, as well as other metadata, attaching more semantic interpretations to what the music actually sounds like, and determining whether the music is clustered properly, is a very subjective process. For this reason, I ran the Dirichlet Process on the feature set with values of α = (0.05, 0.1, 0.2) and compared the clustering for each value, examining similarities and differences in the clusters formed in each scenario in the Discussion section. For each value of α, I set the upper limit of components, or clusters, allowed to 50. The values of α I used resulted in 9, 14, and 19 clusters formed (a sketch of these runs follows).
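Putting the pieces together, a sketch of the clustering runs is below, using the features list assembled earlier and the DPGMM class that scikit-learn provided at the time (it has since been removed; in current releases, BayesianGaussianMixture with a Dirichlet-process prior plays the same role):

    import numpy as np
    from sklearn.mixture import DPGMM  # scikit-learn <= 0.19

    X = np.asarray(features) * 10.0    # the k = 10 scaling discussed above

    for alpha in (0.05, 0.1, 0.2):
        model = DPGMM(n_components=50, alpha=alpha, n_iter=1000)  # cap of 50 clusters
        model.fit(X)
        labels = model.predict(X)
        print('alpha = {0}: {1} occupied clusters'.format(alpha, len(np.unique(labels))))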

3.2 Findings

3.2.1 α = 0.05

When I set α to 0.05, the Dirichlet Process split the songs into 9 clusters. Below are the distributions of the years of the songs in each cluster (note that the Dirichlet Process does not number the clusters exactly sequentially, so cluster numbers 5, 7, and 10 are skipped).

Figure 3.1: Song year distributions for α = 0.05

For each value of α, I also calculated the average frequency of each chord change category and timbre category for each cluster and plotted the results. The green lines correspond to timbre and the blue lines to pitch.

Figure 3.2: Timbre and pitch distributions for α = 0.05

A table of each cluster formed, the number of songs in that cluster, and descriptions of the pitch, timbre, and rhythmic qualities characteristic of songs in that cluster is shown below.

Cluster  Song Count  Characteristic Sounds
0        6481        Minimalist, industrial, space sounds, dissonant chords
1        5482        Soft, New Age, ethereal
2        2405        Defined sounds; electronic and non-electronic instruments played in standard rock rhythms
3        360         Very dense and complex synths, slightly darker tone
4        4550        Heavily distorted rock and synthesizer
6        2854        Faster-paced 80s synth rock, acid house
8        798         Aggressive beats, dense house music
9        1464        Ambient house, trancelike, strong beats, mysterious tone
11       1597        Melancholy tones; New Wave rock in the 80s, then, starting in the 90s, downtempo, trip-hop, nu-metal

Table 3.1: Song cluster descriptions for α = 0.05

3.2.2 α = 0.1

A total of 14 clusters were formed (16 were formed, but 2 clusters contained only one song each; I listened to both of these songs, and they did not sound unique, so I discarded them from the clusters). Again, the song year distributions, timbre and pitch distributions, and cluster descriptions are shown below.

Figure 3.3: Song year distributions for α = 0.1

Figure 3.4: Timbre and pitch distributions for α = 0.1

Cluster  Song Count  Characteristic Sounds
0        1339        Instrumental and disco with 80s synth
1        2109        Simultaneous quarter-note and sixteenth-note rhythms
2        4048        Upbeat, chill; simultaneous quarter-note and eighth-note rhythms
3        1353        Strong repetitive beats, ambient
4        2446        Strong simultaneous beat and synths; synths defined but echoing
5        2672        Calm, New Age
6        542         Hi-hat cymbals, dissonant chord progressions
7        2725        Aggressive punk and alternative rock
9        1647        Latin rhythmic emphasis on first and third beats
11       835         Standard medium-fast rock instruments/chords
16       1152        Orchestral, especially violins
18       40          "Martian alien" sounds, no vocals
20       1590        Alternating strong kick and strong high-pitched clap
28       528         Roland TR-like beats; kick and clap stand out but fuzzy

Table 3.2: Song cluster descriptions for α = 0.1

3.2.3 α = 0.2

With α set to 0.2, there were a total of 22 clusters formed; 3 of the clusters consisted of 1 song each, none of which were particularly unique-sounding, so I discarded them for a total of 19 significant clusters. Again, the song year distributions, timbre and pitch distributions, and cluster descriptions are shown below.

Figure 3.5: Song year distributions for α = 0.2

Figure 3.6: Timbre and pitch distributions for α = 0.2

Cluster  Song Count  Characteristic Sounds
0        4075        Nostalgic and sad-sounding synths and string instruments
1        2068        Intense, sad, cavernous (mix of industrial metal and ambient)
2        1546        Jazz/funk tones
3        1691        Orchestral with heavy 80s synths, atmospheric
4        343         Arpeggios
5        304         Electro, ambient
6        2405        Alien synths, eerie
7        1264        Punchy kicks and claps, 80s/90s tilt
8        1561        Medium tempo, 4/4 time signature, synths with intense guitar
9        1796        Disco rhythms and instruments
10       2158        Standard rock with few (if any) synths added on
12       791         Cavernous, minimalist, ambient (non-electronic instruments)
14       765         Downtempo, classic guitar riffs, fewer synths
16       865         Classic acid house sounds and beats
17       682         Heavy Roland TR sounds
22       14          Fast, ambient, classic orchestral
23       578         Acid house with funk tones
30       31          Very repetitive rhythms, one or two tones
34       88          Very dense sound (strong vocals and synths)

Table 3.3: Song cluster descriptions for α = 0.2

3.3 Analysis

Each of the different values of α revealed different insights about the Dirichlet Process applied to the Million Song Dataset, mainly which artists and songs were unique and how traditional groupings of EM genres could be thought of differently. Not surprisingly, the distributions of the years of songs in most of the clusters were skewed to the left, because the distribution of all of the EM songs in the Million Song Dataset is left-skewed (see Figure 2.2). However, some of the distributions vary significantly for individual clusters, and these differences provide important insights into which types of music styles were popular at certain points in time and how unique the earliest artists and songs in the clusters were. For example, for α = 0.1, cluster 28's musical style (with sounds characteristic of the Roland TR-808 and TR-909, two programmable drum machines and synthesizers that became extremely popular in 1980s and 90s dance tracks [13]) coincides with when the instruments were first manufactured, in 1980. Not surprisingly, this cluster contained mostly songs from the 80s and 90s and declined slightly in the 2000s. However, there were a few songs in that cluster that came out before 1980. While these songs did not clearly use the Roland TR machines, they may have contained similar sounds that predated the machines and were truly novel.

First, looking at α = 0.05, we see that all of the clusters contain a significant number of songs, although clusters 3 and 8 are notably smaller. Cluster 3 has a heavier left tail, indicating a larger number of songs from the 70s, 80s, and 90s. Inside the cluster, the genres of music varied significantly from a traditional music lens. That is, the cluster contained some songs with nearly all traditional rock instruments, others with purely synths, and others somewhere in between, all of which would normally be classified as different EM genres. However, under the Dirichlet Process, these songs were lumped together with the common theme of dense, melodic sounds (as opposed to minimalistic, repetitive, or dissonant ones). The most prominent artists from the earlier songs are Ashra and John Hassell, who composed several melodic songs combining traditional instruments with synthesizers for a modern feel. The other small cluster, number 8, has a more normal year distribution relative to the entire MSD distribution and also consists of denser beats. Another artist, Cabaret Voltaire, leads this cluster. Cluster 9 sticks out significantly because it contains virtually no songs before 1990 but then increases rapidly in popularity. This cluster contains songs with hypnotically repetitive rhythm, strong and ethereal synths, and an equally strong drum-like beat. Given the emergence of trance in the 1990s, and the fact that house music in the 1980s contained more minimalistic synths than house music in the 1990s, this distribution of years makes sense. Looking at the earliest artists in this cluster, one that accurately predates the later music in the cluster is Jean-Michel Jarre. A French composer and pioneer of ambient and electronic music [14], one of his songs, "Les Chants Magnétiques IV," contains very sharp and modulated synths, along with a repetitive hi-hat cymbal rhythm and more drawn-out, ethereal synths. While the song sounds ambient at its normal speed, playing the song at 1.5 times its normal speed produced a thumping, fast-paced 16th-note rhythm that, combined with the ethereal synths and their chord progressions, sounded very similar to trance music. In fact, I found that, stylistically, trance music was comparable to house and ambient music increased in speed. Trance was a term not used extensively until the early 1990s, but ambient and house music were already mainstream by the 1980s, so it would make sense that trance evolved in this manner. However, this insight could serve as an argument that trance is not an innovative genre in and of itself but is rather a clever combination of two older genres. Lastly, we look at the timbre category and chord change distributions for each cluster. In theory, these clusters should have significantly different peaks of chord changes and timbre categories, reflecting different pitch arrangements and instruments in each cluster. The type 0 chord change corresponds to major → major with no note change, type 60 to minor → minor with no note change, type 120 to dominant 7th major → dominant 7th major with no note change, and type 180 to dominant 7th minor → dominant 7th minor with no note change. It makes sense that type 0, 60, 120, and 180 chord changes are frequently observed, because it implies that adjacent chords in the song are remaining in the same key for the majority of the song.
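These landmark codes follow directly from the 192-code encoding sketched in Section 2.3.2; a small decoder (inverting the chord_shift_code arithmetic, with type names following the template order in Appendix A.4) makes them easy to verify:

    CHORD_TYPES = {1: 'major', 2: 'minor', 3: 'dominant 7', 4: 'minor 7'}

    def decode_chord_shift(code):
        # invert code = 12 * (key_shift - 1) + note_shift, key_shift = 4 * (c1 - 1) + c2
        note_shift = code % 12
        key_shift = code // 12 + 1
        from_type = (key_shift - 1) // 4 + 1
        to_type = (key_shift - 1) % 4 + 1
        return CHORD_TYPES[from_type], CHORD_TYPES[to_type], note_shift

    for code in (0, 60, 120, 180):
        print(code, decode_chord_shift(code))
    # 0 -> ('major', 'major', 0); 60 -> ('minor', 'minor', 0)
    # 120 -> ('dominant 7', 'dominant 7', 0); 180 -> ('minor 7', 'minor 7', 0)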

The timbre categories, on the other hand, are more difficult to interpret intuitively. Mauch's study addresses this issue by sampling songs and sounds that are the closest to each timbre category and then playing the sounds and attaching user-based interpretations based on several listeners [8]. While this strategy worked in Mauch's study, given the time and resources at my disposal, it was not practical in mine. I ended up comparing my subjective summaries of each cluster against the charts, to see whether certain peaks in the timbre categories corresponded to specific tones. Strangely, for α = 0.05, the timbre and chord change data are very similar for each cluster. This problem does not occur for α = 0.1 or 0.2, where the graphs vary significantly and correspond to some of the observed differences in the music. In summary, below are the most influential artists I found in the clusters formed, and the types of music they created that were novel for their time:

• Jean-Michel Jarre: ambient and house music, complicated synthesizer arrangements
• Cabaret Voltaire: orchestral electronic music
• Paul Horn: new age
• Brian Eno: ambient music
• Manuel Göttsching (Ashra): synth-heavy ambient music
• Killing Joke: industrial metal
• John Foxx: minimalist and dark electronic music
• Fad Gadget: house and industrial music

While these conclusions were formed mainly from the data in the MSD and the resulting clusters, I checked outside sources and biographies of these artists to see whether they were groundbreaking contributors to electronic music. Some research revealed that these artists were indeed groundbreaking for their time, so my findings are consistent with existing literature. The difference, however, between existing accounts and mine is that, from a quantitatively computed perspective, I found some new connections (like observing that one of Jarre's works, when sped up, sounded very similar to trance music).

For larger values of α, it is worth not only looking at interesting phenomena in the clusters formed for that specific value, but also comparing the clusters formed to those of other values of α. Since we are increasing the value of α, more clusters will be formed, and the distinctions between each cluster will be more nuanced. With α = 0.1, the Dirichlet Process formed 16 clusters; 2 of these clusters consisted of only one song each and, upon listening, neither of these songs sounded particularly unique, so I threw those two clusters out and analyzed the remaining 14. Comparing these clusters to the ones formed with α = 0.05, I found that some of the clusters mapped over nicely, while others were more difficult to interpret. For example, cluster 3_0.1 (cluster 3 when α = 0.1) contained a similar number of songs, and a similar distribution of release years, to cluster 9_0.05. Both contain virtually no songs before the 1990s and then steadily rise in popularity through the 2000s. Both clusters also contain similar types of music: house beats and ethereal synths reminiscent of ambient or trance music. However, when I looked at the earliest artists in cluster 3_0.1, they were different from the earliest artists in cluster 9_0.05. One particular artist, Bill Nelson, stood out for having a particularly novel song, "Birds of Tin," for the year it was released (1980). This song features a sharp and twangy synth beat that, when sped up, sounds like minimalist acid house music. While the α = 0.05 group differentiated mostly on general moods and classes of instruments (like rock vs. non-electronic vs. electronic), the α = 0.1 group picked up more nuanced instrumentation and mood differences. For example, cluster 16_0.1 contained songs that featured orchestral string instruments, especially violin. The songs themselves varied significantly according to traditional genres, from Brian Eno arrangements with classical orchestra to a remix of a song by Linkin Park, a nu-metal band, which contained violin interludes. This clustering raises an interesting point: music that sounds very different based on traditional genres can be grouped together based on certain instruments or sounds. Another cluster, 28_0.1, features 90s sounds characteristic of the Roland TR-909 drum machine (which would explain why the cluster's songs increase drastically starting in the 1990s and steadily decline through the 2000s). Yet another cluster, 6_0.1, has a particularly heavy left tail, indicating a style more popular in the 1980s, and its characteristic sound, hi-hat cymbals, is also a specialized instrument. This specialization does not match up particularly strongly with the clusters from α = 0.05. That is, a single cluster with α = 0.05 does not map easily to one or more clusters in the α = 0.1 run, although many of the clusters appear to share characteristics based on the qualitative descriptions in the tables. The timbre/chord change charts for each cluster appeared to at least somewhat corroborate the general characteristics I attached to each cluster. For example, the last timbre category is significantly pronounced for clusters 5 and 18, and especially so for 18. Cluster 18 was vocal-free, ethereal, space-synth sounds, so it would make sense that cluster 5, which was mainly calm New Age, also contained vocal-free, ethereal, space-y sounds.

It was also interesting to note that certain clusters, like 28, contained one timbre category that completely dominated all the others. Many songs in this cluster were marked by strong and repetitive beats reminiscent of the Roland TR drum machines, which matches the graph. Likewise, clusters 3, 7, 9, and 20, which appear to contain the same peak timbre category, were noted for containing strong and repetitive beats. For this run, I added the following artists and their contributions to the general list of novel artists:

• Bill Nelson: minimalist house music
• Vangelis: orchestral compositions with electronic notes
• Rick Wakeman: rock compositions with spacey-sounding synths
• Kraftwerk: synth-based pop music

Finally, we look at α = 0.2. With this parameter value, the Dirichlet Process resulted in 22 clusters; 3 of these clusters contained only one song each and, upon listening to each of these songs, I determined they were not particularly unique and discarded them, for a total of 19 remaining clusters. Unlike with the previous two values of α, where the clusters were relatively easy to subjectively differentiate, this run was quite difficult. Slightly more than half of the clusters, 10 out of 19, contained under 1,000 songs, and 3 contained under 100. Some of the clusters were easily mapped to clusters from the other two α values, like cluster 17_0.2, which contains Roland TR drum machine sounds and is comparable to cluster 28_0.1. However, many of the other classifications seemed more dubious. Not only did the songs within each cluster often vary significantly, but the differences between many clusters appeared nearly indistinguishable. The chord change and timbre charts also reflect the difficulty of distinguishing the clusters: the y-axis scales are quite small, implying that many of the timbre values averaged out because the songs in each cluster were quite different. Essentially, the observations are quite noisy and do not have features that stand out as saliently as those of cluster 28_0.1, for example. The only exceptions were clusters 30 and 34, but there were so few songs in each of these clusters that they represent only a small portion of the dataset. Therefore, I concluded that the Dirichlet Process with α = 0.2 did an insufficient job of adequately clustering the songs. Overall, the clusters formed when α = 0.1 were the most meaningful in terms of picking up nuanced moods and instruments without splitting hairs and producing clusters that a minimally trained ear could not differentiate. From this analysis, the most appropriate genre classifications of the electronic music in the MSD are the clusters described in the table for α = 0.1, and the most novel artists, along with their contributions, are summarized in the findings for α = 0.05 and α = 0.1.

Chapter 4

Conclusion

In this chapter, I first address weaknesses in my experiment and strategies to address those weaknesses; I then offer potential paths for researchers to build upon my experiment and close with final words regarding this thesis.

4.1 Design Flaws in Experiment

While I made every effort possible to ensure the integrity of this experiment, there were various complicating factors, some beyond my control and others within my control but unrealistic to address given the time and resources I had. The largest issue was the dataset I was working with. While the MSD contained roughly 23,000 electronic music songs according to my classifications, these songs did not come close to covering all of the electronic music available. From looking through the tracks, I did see many important artists, meaning that there was some credibility to the dataset. However, there were several other artists I was surprised to see missing, and the artists included were represented by only a limited number of popular songs. Some traditionally defined genres, like dubstep, were missing entirely from the dataset, and the most recent songs came from the year 2010, which meant that the past 5 years of rapid expansion in EM were not accounted for. Building a sufficient corpus of EM data is very difficult, arguably more so than for other genres, because songs may be remixed by multiple artists, further blurring the line between original content and modifications. For this reason, I considered my thesis to be a proof of concept. Although the data I used may not be ideal, I was able to show that the Dirichlet Process could be used with some amount of success to cluster songs based on their metadata.

With respect to how I implemented the Dirichlet Process and constructed the features, my methodology could have been more extensive given additional time and resources. Interpreting the sounds in each song and establishing common threads is a difficult task, and unlike Pandora, which used trained music theory experts to analyze each song, I relied on my own ears and anecdotal knowledge of EM. Given the lack of formal literature quantitatively analyzing EM, and the resources I had, this was my best realistic option, but it was also not ideal. The second notable weakness, which was more controllable, was determining what exactly constitutes an EM song. My criteria involved iterating through every song and selecting those whose artist carried a tag that fell inside a list of predetermined EM genres. However, this strategy is not always effective, since some artists have only a small selection of EM songs and have produced much more music in rock or other non-EM genres. To prevent these songs from appearing in the dataset, I would need to load another dataset from a group called Last.fm, which contains user-generated tags at the song level. Another, more addressable weakness in my experiment was graphically analyzing the timbre categories. While the average chord changes were easy to interpret on the graphs for each cluster and had easy semantic interpretations, the timbre categories were never formally defined. That is, while I knew the Bayesian Information Criterion was lowest when there were 46 categories, I did not associate each timbre category with a sound. Mauch's study addressed this issue by randomly selecting songs with sounds that fell in each timbre category and asking users to listen to the sounds and classify what they heard. Implementing this system would be an additional way of ensuring that the clusters formed were nontrivial: I could not only eyeball the measurements on each graph for timbre, as I did in this thesis, but also use them to confirm the sounds I observed for each cluster. Finally, while my feature selection involved careful preprocessing, based on other studies, that normalized measurements between all songs, there are additional ways I could have improved the feature set. For example, one study looks at more advanced ways to isolate specific timbre segments in a song, identify repeating patterns, and compare songs to each other in terms of the similarity of their timbres [15]. More advanced methods like these would allow me to analyze more quantitatively how successfully the Dirichlet Process clusters songs into distinct categories.

4.2 Future Work

Future work in this area, quantitatively analyzing EM metadata to determine what constitutes different genres and novel artists, would involve tighter definitions and procedures, evaluations of whether the clustering was effective, and closer musical scrutiny. All of the weaknesses mentioned in the previous section, barring perhaps the songs available in the Million Song Dataset, can be addressed with extensions and modifications to the code base I created. The greater issue of building an effective corpus of music data for the MSD, and constantly updating it, might be addressed by soliciting such data from an organization like Spotify, but such an endeavor is very ambitious and beyond the scope of any individual or small-group research project without extensive funding and influence. Once these problems are resolved, and access to songs from the dataset and methods for comparing songs to each other are in place, the next steps would be to further analyze the results. How do the most unique artists for their time compare to the most popular artists? Is there considerable overlap? How long does it take for a style to grow in popularity, if it even does? And lastly, how can these findings be used to compose new genres of music and envision who and what will become popular in the future? All of these questions may require supplementary information sources, with respect to the popularity of songs and artists for example, and many of these additional pieces of information can be found on the website of the MSD.

4.3 Closing Remarks

While this thesis is an ambitious endeavor and can be improved in many respects, it does show that the methods implemented yield nontrivial results and could serve as a foundation for future quantitative analysis of electronic music. As data analytics grows, and groups such as Spotify amass greater amounts of information and deeper insights into that information, this relatively new field of study will hopefully grow as well. EM is a dynamic, energizing, and incredibly expressive type of music, and understanding it from a quantitative perspective pays respect to what has, up until now, been mostly analyzed from a curious outsider's perspective: qualitatively described, but not examined as thoroughly from a mathematical angle.


Appendix A

Code

A.1 Pulling Data from the Million Song Dataset

from __future__ import division
import os
import sys
import re
import time
import glob
import numpy as np
import hdf5_getters  # not on adroit

# prevents output from showing ellipses when printed
np.set_printoptions(threshold=np.nan)

'''This code computes the frequency of chord changes in each electronic song
and runs the dirichlet process on it'''

basedir = '/scratch/network/mssilver/mssilver/msd_data_full/data/' + str(sys.argv[1])
ext = '.h5'

target_genres = ['house', 'techno', 'drum and bass', 'drum n bass',
                 "drum'n'bass", 'drumnbass', "drum 'n' bass", 'jungle', 'breakbeat',
                 'trance', 'dubstep', 'trap', 'downtempo', 'industrial', 'synthpop',
                 'idm', 'idm - intelligent dance music', '8-bit', 'ambient',
                 'dance and electronica', 'electronic']

# relevant metadata for all EM songs found in the MSD
all_song_data = {}
pitch_segs_data = []
count = 0
start_time = time.time()

for root, dirs, files in os.walk(basedir):
    files = glob.glob(os.path.join(root, '*' + ext))
    for f in files:
        h5 = hdf5_getters.open_h5_file_read(f)
        # if year unknown, throw out sample
        if hdf5_getters.get_year(h5) == 0:
            h5.close()
            continue
        if any(tag in str(hdf5_getters.get_artist_mbtags(h5)) for tag in target_genres):
            print 'found electronic music song at {0} seconds'.format(time.time() - start_time)
            count += 1
            print('song count: {0}'.format(count + 1))
            h5_subdict = dict()
            h5_subdict['title'] = hdf5_getters.get_title(h5).item()
            h5_subdict['artist_name'] = hdf5_getters.get_artist_name(h5).item()
            h5_subdict['year'] = hdf5_getters.get_year(h5).item()
            h5_subdict['duration'] = hdf5_getters.get_duration(h5).item()
            h5_subdict['timbre'] = hdf5_getters.get_segments_timbre(h5).tolist()
            h5_subdict['pitches'] = hdf5_getters.get_segments_pitches(h5).tolist()
            track_id = hdf5_getters.get_track_id(h5).item()
            all_song_data[track_id] = h5_subdict
            print('Song {0} finished processing. Total time elapsed: {1} seconds'.format(count, str(time.time() - start_time)))
        h5.close()

all_song_data_sorted = dict(sorted(all_song_data.items(), key=lambda k: k[1]['year']))
# strip path separators from the shard argument (re.sub pattern reconstructed)
sortedpitchdata = '/scratch/network/mssilver/mssilver/msd_data/raw_' + re.sub('/', '', sys.argv[1]) + '.txt'
with open(sortedpitchdata, 'w') as text_file:
    text_file.write(str(all_song_data_sorted))

A.2 Calculating Most Likely Chords and Timbre Categories

from __future__ import division
import os
import sys
import re
import time
import json
import glob
import hdf5_getters  # not on adroit
import sklearn.mixture
import msd_utils  # not on adroit
import math
import numpy as np
import collections
import ast

# prevents output from showing ellipses when printed
np.set_printoptions(threshold=np.nan)

# mean of one list (applied column-wise below via map and zip)
def mean(a):
    return sum(a) / len(a)

'''This code computes the frequency of chord changes in each electronic song
and runs the dirichlet process on it'''

basedir = '/scratch/network/mssilver/mssilver/'
input_file = basedir + 'msd_data/raw_' + str(sys.argv[1]) + '.txt'
output_file = basedir + 'msd_data/preprocessed_' + str(sys.argv[1]) + '.txt'

json_contents = open(input_file, 'r').read()

all_song_data = []
time_start = time.time()
count = 0
# the pattern below was garbled in the original listing; reconstructed from
# the analogous json_pattern in A.3 (match one song record at a time)
for json_object_str in re.finditer(r"{'title'.*?}", json_contents):
    json_object_str = str(json_object_str.group(0))
    json_object = ast.literal_eval(json_object_str)
    json_object_new = {}

    json_object_new['title'] = json_object['title']
    json_object_new['artist_name'] = json_object['artist_name']
    json_object_new['year'] = json_object['year']
    json_object_new['duration'] = json_object['duration']

    segments_pitches_old = json_object['pitches']
    segments_timbre_old = json_object['timbre']
    segments_pitches_old_smoothed = []
    segments_timbre_old_smoothed = []
    chord_changes = [0 for i in range(0, 192)]
    smoothing_factor = 5
    for i in range(0, int(math.floor(len(segments_pitches_old)) / smoothing_factor)):
        segments = segments_pitches_old[(smoothing_factor * i):(smoothing_factor * i + smoothing_factor)]
        # calculate mean frequency of each note over a block of 5 time segments
        segments_mean = map(mean, zip(*segments))
        segments_pitches_old_smoothed.append(segments_mean)
    most_likely_chords = [msd_utils.find_most_likely_chord(seg) for seg in segments_pitches_old_smoothed]
    print 'found most likely chords at {0} seconds'.format(time.time() - time_start)
    # calculate chord changes
    for i in range(0, len(most_likely_chords) - 1):
        c1 = most_likely_chords[i]
        c2 = most_likely_chords[i + 1]
        if (c1[1] == c2[1]):
            note_shift = 0
        elif (c1[1] < c2[1]):
            note_shift = c2[1] - c1[1]
        else:
            note_shift = 12 - c1[1] + c2[1]
        key_shift = 4 * (c1[0] - 1) + c2[0]
        # convert note_shift (0 through 11) and key_shift (1 to 16)
        # to one of 192 categories for a chord shift
        chord_shift = 12 * (key_shift - 1) + note_shift
        chord_changes[chord_shift] += 1
    json_object_new['chord_changes'] = [c / json_object['duration'] for c in chord_changes]
    print 'calculated chord changes at {0} seconds'.format(time.time() - time_start)

    for i in range(0, int(math.floor(len(segments_timbre_old)) / smoothing_factor)):
        segments = segments_timbre_old[(smoothing_factor * i):(smoothing_factor * i + smoothing_factor)]
        # calculate mean value of each timbre dimension over a block of 5 time segments
        segments_mean = map(mean, zip(*segments))
        segments_timbre_old_smoothed.append(segments_mean)
    print 'found most likely timbre categories at {0} seconds'.format(time.time() - time_start)
    timbre_cats = [msd_utils.find_most_likely_timbre_category(seg) for seg in segments_timbre_old_smoothed]
    timbre_cat_counts = [timbre_cats.count(i) for i in xrange(0, 30)]
    json_object_new['timbre_cat_counts'] = [t / json_object['duration'] for t in timbre_cat_counts]
    all_song_data.append(json_object_new)
    count += 1

print 'preprocessing finished, writing results to file at time {0}'.format(time.time() - time_start)
with open(output_file, 'w') as f:
    f.write(str(all_song_data))

print 'file merging complete at time {0}'.format(time.time() - time_start)

A.3 Code to Compute Timbre Categories

from __future__ import division
import os
import sys
import re
import time
import json
import glob
import hdf5_getters  # not on adroit
import sklearn.mixture
import msd_utils  # not on adroit
import math
import numpy as np
import collections
from string import ascii_uppercase
import ast
import matplotlib.pyplot as plt
import operator
from collections import defaultdict
import random

timbre_all = []
N = 20  # number of samples to get from each year
year_counts = {1956: 2, 1965: 4, 1968: 3, 1969: 5, 1970: 23, 1971: 25,
               1972: 26, 1973: 37, 1974: 35, 1975: 29, 1976: 28, 1977: 64,
               1978: 77, 1979: 111, 1980: 131, 1981: 171, 1982: 199, 1983: 272,
               1984: 190, 1985: 189, 1986: 200, 1987: 224, 1988: 205, 1989: 272,
               1990: 358, 1991: 348, 1992: 538, 1993: 610, 1994: 658, 1995: 764,
               1996: 809, 1997: 930, 1998: 872, 1999: 983, 2000: 1031, 2001: 1230,
               2002: 1323, 2003: 1563, 2004: 1508, 2005: 1995, 2006: 1892,
               2007: 2175, 2008: 1950, 2009: 1782, 2010: 742}

time_start = time.time()
year_count = defaultdict(int)
orig_dir = '/scratch/network/mssilver/mssilver/'
orig_dir = ''
json_pattern = re.compile(r"{'title'.*?}", re.DOTALL)
N = 20  # number of songs to sample from each year
k = 20  # number of frames to select from each song
for l1 in ascii_uppercase:
    for l2 in ascii_uppercase:
        edm_textfile = orig_dir + 'msd_data/raw_' + l1 + l2 + '.txt'
        json_contents = open(edm_textfile, 'r').read()
        for json_object_str in re.findall(json_pattern, json_contents):
            json_object = ast.literal_eval(json_object_str)
            year = int(json_object['year'])
            prob = 1.0 if 1.0 * N / year_counts[year] > 1.0 else 1.0 * N / year_counts[year]
            if random.random() < prob:
                print 'getting timbre frames for song in directory {0}, {1} seconds after start of program'.format(edm_textfile, time.time() - time_start)
                duration = float(json_object['duration'])
                timbre = [[t / duration for t in l] for l in json_object['timbre']]
                try:
                    indices = random.sample(xrange(0, len(timbre)), k)
                except:
                    indices = xrange(0, len(timbre))
                timbre_frames = [timbre[i] for i in indices]
                appended_timbre = [timbre_all.append(l) for l in timbre_frames]
        print 'finished file {0}, {1} seconds after start of program'.format(edm_textfile, time.time() - time_start)

with(open('timbre_frames_all.txt', 'w')) as f:
    f.write(str(timbre_all))

A4 Helper Methods for Calculations

1 import os2 import re3 import json4 import glob5 import hdf5_getters6 import time7 import numpy as np8

9 rsquorsquorsquo some static data used in conjunction with the helper methods rsquorsquorsquo10

11 each 12-element vector corresponds to the 12 pitches starting with Cnatural and going up to B natural

12

61

13 CHORD_TEMPLATE_MAJOR = [[100010010000][010001001000]14 [001000100100][000100010010]15 [000010001001][100001000100]16 [010000100010][001000010001]17 [100100001000][010010000100]18 [001001000010][000100100001]]19 CHORD_TEMPLATE_MINOR =[[100100010000][010010001000]20 [001001000100][000100100010]21 [000010010001][100001001000]22 [010000100100][001000010010]23 [000100001001][100010000100]24 [010001000010][001000100001]]25 CHORD_TEMPLATE_DOM7 = [[100010010010][010001001001]26 [101000100100][010100010010]27 [001010001001][100101000100]28 [010010100010][001001010001]29 [100100101000][010010010100]30 [001001001010][000100100101]]31 CHORD_TEMPLATE_MIN7 = [[100100010010][010010001001]32 [101001000100][010100100010]33 [001010010001][100101001000]34 [010010100100][001001010010]35 [000100101001][100010010100]36 [010001001010][001000100101]]37

38 CHORD_TEMPLATE_MAJOR_means = [npmean(chord) for chord inCHORD_TEMPLATE_MAJOR]

39 CHORD_TEMPLATE_MINOR_means = [npmean(chord) for chord inCHORD_TEMPLATE_MINOR]

40 CHORD_TEMPLATE_DOM7_means = [npmean(chord) for chord in CHORD_TEMPLATE_DOM7]

41 CHORD_TEMPLATE_MIN7_means = [npmean(chord) for chord in CHORD_TEMPLATE_MIN7]

42

43 CHORD_TEMPLATE_MAJOR_stdevs = [npstd(chord) for chord inCHORD_TEMPLATE_MAJOR]

44 CHORD_TEMPLATE_MINOR_stdevs = [npstd(chord) for chord inCHORD_TEMPLATE_MINOR]

45 CHORD_TEMPLATE_DOM7_stdevs = [npstd(chord) for chord in CHORD_TEMPLATE_DOM7]

46 CHORD_TEMPLATE_MIN7_stdevs = [npstd(chord) for chord in CHORD_TEMPLATE_MIN7]

47

48 TIMBRE_CLUSTERS = [[ 138679881e-01 395702571e-02 265410235e-0249 738301998e-03 -175014636e-02 -551147732e-0250 871851698e-03 -117595855e-02 107227900e-0251 875951680e-03 540391877e-03 617638908e-03]52 [ 314344510e+00 117405599e-01 408053561e+0053 -177934450e+00 293367968e+00 -135597928e+0054 -155129489e+00 775743158e-01 642796685e-0155 140794256e-01 337716831e-01 -327103815e-01]56 [ 356548165e-01 273288705e+00 194355982e+0057 106892477e+00 989739475e-01 -897330631e-0258 873234495e-01 -200747009e-03 344488367e-0159 993117800e-02 -243471766e-01 -190521726e-01]60 [ 422442037e-01 414115783e-01 143926557e-0161 -116143322e-01 -595186216e-02 -236927188e-0162 -683151409e-02 986816882e-02 243219098e-0263 693558977e-02 680121418e-03 397485360e-02]64 [ 194727799e-01 -139027782e+00 -239875671e-01

62

65 -284583677e-01 192334219e-01 -283421048e-0166 215787541e-01 114840341e-01 -215631833e-0167 -409496877e-02 -690838017e-03 -724394810e-03]68 [ 196565167e-01 498702717e-02 -343697282e-0169 254170701e-01 112441266e-02 154740401e-0170 -470447408e-02 810868802e-02 303736697e-0371 143974944e-03 -275044913e-02 148634678e-02]72 [ 221364497e-01 -296205105e-01 157754028e-0173 -557641279e-02 -925625566e-02 -615316168e-0274 -138139882e-01 -554936599e-02 166886836e-0175 646238260e-02 124093863e-02 -209274345e-02]76 [ 212823455e-01 -932652720e-02 -439611467e-0177 -202814479e-01 498638770e-02 -126572488e-0178 -111181799e-01 325075635e-02 201416694e-0279 -569216463e-02 261922912e-02 830817468e-02]80 [ 162304042e-01 -734813956e-03 -202552550e-0181 180106705e-01 -572110826e-02 -917148244e-0282 -620429191e-03 -608892354e-02 102883628e-0283 384878478e-02 -872920419e-03 237291230e-02]84 [ 169023095e-01 681311168e-02 -371039856e-0285 -213139780e-02 -418752028e-03 136407740e-0186 258515825e-02 -410328777e-04 293149920e-0287 -197874734e-02 201177066e-02 429260690e-03]88 [ 416829358e-01 -128384095e+00 886081556e-0189 913717416e-02 -319420208e-01 -182003637e-0190 -319865507e-02 -171517045e-02 347472066e-0291 -353047665e-02 558354602e-02 -506222122e-02]92 [ 383948137e-01 106020034e-01 401191058e-0193 149470482e-01 -958422411e-02 -494473336e-0294 227589858e-02 -567352733e-02 384666644e-0295 -215828055e-02 -167817151e-02 115426241e-01]96 [ 907946444e-01 326120397e+00 298472002e+0097 -142615404e-01 129886103e+00 -453380431e-0198 154008478e-01 -355297093e-02 -295809181e-0199 157037690e-01 -729692046e-02 115180285e-01]

    # ... remaining rows of the TIMBRE_CLUSTERS matrix (one 12-element
    # mean timbre vector per cluster) ...
]

TIMBRE_MEANS = [np.mean(t) for t in TIMBRE_CLUSTERS]
TIMBRE_STDEVS = [np.std(t) for t in TIMBRE_CLUSTERS]

'''helper methods to process raw msd data'''

# numpy (as np) and hdf5_getters are assumed imported earlier in this appendix

def normalize_pitches(h5):
    # transpose every segment's chroma vector into a common key
    key = int(hdf5_getters.get_key(h5))
    segments_pitches = hdf5_getters.get_segments_pitches(h5)
    segments_pitches_new = [transpose_by_key(pitch_seg, key)
                            for pitch_seg in segments_pitches]
    return segments_pitches_new

def transpose_by_key(pitch_seg, key):
    # rotate the 12 pitch classes by the estimated key of the song
    pitch_seg_new = []
    for i in range(0, 12):
        idx = (i + key) % 12
        pitch_seg_new.append(pitch_seg[idx])
    return pitch_seg_new

'''given a time segment with distributions of the 12 pitches, find the
most likely chord played'''

def find_most_likely_chord(pitch_vector):
    rho_max = 0.0
    # chords are indexed by (template type, position within the template);
    # types: 1 = major, 2 = minor, 3 = dominant 7th, 4 = minor 7th
    most_likely_chord = (1, 1)
    template_sets = [
        (1, CHORD_TEMPLATE_MAJOR, CHORD_TEMPLATE_MAJOR_means,
         CHORD_TEMPLATE_MAJOR_stdevs),
        (2, CHORD_TEMPLATE_MINOR, CHORD_TEMPLATE_MINOR_means,
         CHORD_TEMPLATE_MINOR_stdevs),
        (3, CHORD_TEMPLATE_DOM7, CHORD_TEMPLATE_DOM7_means,
         CHORD_TEMPLATE_DOM7_stdevs),
        (4, CHORD_TEMPLATE_MIN7, CHORD_TEMPLATE_MIN7_means,
         CHORD_TEMPLATE_MIN7_stdevs),
    ]
    for chord_type, chords, means, stdevs in template_sets:
        for idx, (chord, mean, stdev) in enumerate(zip(chords, means, stdevs)):
            # Spearman-like correlation between the template chord and the
            # observed chroma frame (the 0.01 terms guard against zero stdevs)
            rho = 0.0
            for i in range(0, 12):
                rho += ((chord[i] - mean) *
                        (pitch_vector[i] - np.mean(pitch_vector)) /
                        ((stdev + 0.01) * (np.std(pitch_vector) + 0.01)))
            if abs(rho) > abs(rho_max):
                rho_max = rho
                most_likely_chord = (chord_type, idx)
    return most_likely_chord

def find_most_likely_timbre_category(timbre_vector):
    most_likely_timbre_cat = 0
    rho_max = 0.0
    for idx, (seg, mean, stdev) in enumerate(zip(TIMBRE_CLUSTERS,
                                                 TIMBRE_MEANS, TIMBRE_STDEVS)):
        rho = 0.0
        for i in range(0, 12):
            # correlate the observed timbre frame with each cluster mean,
            # centering each vector by its own mean
            rho += ((seg[i] - mean) *
                    (timbre_vector[i] - np.mean(timbre_vector)) /
                    ((stdev + 0.01) * (np.std(timbre_vector) + 0.01)))
        if abs(rho) > abs(rho_max):
            rho_max = rho
            most_likely_timbre_cat = idx
    return most_likely_timbre_cat


Bibliography

[1] Marck Bailey. Mark J. Butler publishes scholarly work on dance music. http://www.music.northwestern.edu/about/news/2012/mark-j-butler-publishes-scholarly-work-on-dance-music.html, March 2012.

[2] Kenneth Taylor. Ishkur's guide to EDM. http://techno.org/electronic-music-guide/.

[3] Deal further strengthens Spotify's music discovery expertise. http://the.echonest.com/pressreleases/spotify-acquires-echo-nest/, March 2014.

[4] Josh Constine. Inside the Spotify - Echo Nest skunkworks. http://techcrunch.com/2014/10/19/the-sonic-mad-scientists/, October 2014.

[5] The future of music genres is here. http://blog.echonest.com/post/73516217273/the-future-of-music-genres-is-here, January 2014.

[6] About the Music Genome Project. http://www.pandora.com/about/mgp.

[7] Joan Serrà, Álvaro Corral, Marián Boguñá, Martín Haro, and Josep Ll. Arcos. Measuring the evolution of contemporary western popular music. Sci. Rep., 2, July 2012.

[8] Matthias Mauch, Robert M. MacCallum, Mark Levy, and Armand M. Leroi. The evolution of popular music: USA 1960-2010. Royal Society Open Science, 2(5), 2015.

[9] Thierry Bertin-Mahieux, Daniel P. W. Ellis, Brian Whitman, and Paul Lamere. The Million Song Dataset. In Proceedings of the 12th International Conference on Music Information Retrieval (ISMIR 2011), 2011.

[10] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay. Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12:2825-2830, 2011.

[11] Lars Buitinck, Gilles Louppe, Mathieu Blondel, Fabian Pedregosa, Andreas Mueller, Olivier Grisel, Vlad Niculae, Peter Prettenhofer, Alexandre Gramfort, Jaques Grobler, Robert Layton, Jake VanderPlas, Arnaud Joly, Brian Holt, and Gaël Varoquaux. API design for machine learning software: experiences from the scikit-learn project. In ECML PKDD Workshop: Languages for Data Mining and Machine Learning, pages 108-122, 2013.

[12] Edwin Chen. Infinite mixture models with nonparametric Bayes and the Dirichlet process. http://blog.echen.me/2012/03/20/infinite-mixture-models-with-nonparametric-bayes-and-the-dirichlet-process/, March 2012.

[13] Graham Massey. Roland TR-808: The drum machine that changed music. http://www.bbc.com/news/entertainment-arts-26682781, March 2014.

[14] Jean-Michel Jarre and Agencja Artystyczna MTJ. Jean Michel Jarre. Disques Dreyfus, 1999.

[15] Francois Pachet, Jean-Julien Aucouturier, and Mark Sandler. The way it sounds: Timbre models for analysis and retrieval of music signals. IEEE Transactions on Multimedia, 7(6):1028-1035, December 2005.


  • Abstract
  • Acknowledgements
  • Contents
  • List of Tables
  • List of Figures
  • 1 Introduction
    • 1.1 Background Information
    • 1.2 Literature Review
    • 1.3 The Dataset
  • 2 Mathematical Modeling
    • 2.1 Determining Novelty of Songs
    • 2.2 Feature Selection
    • 2.3 Collecting Data and Preprocessing Selected Features
      • 2.3.1 Collecting the Data
      • 2.3.2 Pitch Preprocessing
      • 2.3.3 Timbre Preprocessing
  • 3 Results
    • 3.1 Methodology
    • 3.2 Findings
      • 3.2.1 α = 0.05
      • 3.2.2 α = 0.1
      • 3.2.3 α = 0.2
    • 3.3 Analysis
  • 4 Conclusion
    • 4.1 Design Flaws in Experiment
    • 4.2 Future Work
    • 4.3 Closing Remarks
  • A Code
    • A.1 Pulling Data from the Million Song Dataset
    • A.2 Calculating Most Likely Chords and Timbre Categories
    • A.3 Code to Compute Timbre Categories
    • A.4 Helper Methods for Calculations
  • Bibliography

most common timbre rhythms, and creating more general tags from these combined terms, such as "stepwise changes indicating modal harmony" for a pitch topic and "oh, rounded, mellow" for a timbral topic. There were two problems with using this final layer of abstraction for my study. First, attaching semantic interpretations to the pitch and timbral lexicons is a difficult task. For timbre, I would need to listen to sound samples containing all of the different timbral categories I identified and attach user interpretations to them. For the chords, not only would I have to perform the same analysis as for timbre, but I would also have to pay careful attention to identifying which chords correspond to common sound progressions in popular music, a task that I am not qualified for and did not have the resources to pursue for this thesis. Second, this final layer of abstraction was not necessary for the end goal of my paper. In fact, consolidating my pitch and timbre lexicons into simpler phrases would run the risk of pigeonholing my analysis and preventing me from discovering more nuanced patterns in my final results. Therefore, I decided to treat pitch and timbral lexicon construction as the furthest levels of abstraction when processing songs for my thesis. Mathematical details on how I constructed the pitch and timbral lexicons can be found in the Mathematical Modeling section of this paper.

1.3 The Dataset

In order to successfully execute my thesis, I needed access to an extensive database of music. Until recently, acquiring a substantial corpus of music data was a difficult and costly task. It is illegal to download music audio files from video- and music-sharing sites such as YouTube, Spotify, and Pandora. Some platforms, such as iTunes, offer 90-second previews of songs, but using only segments of songs, and usually segments that showcase the chorus, is not a reliable way to capture the entire essence of a song. Even if I were to legally download entire audio files for free, I would run into additional issues. Obtaining a high-quality corpus of song data would be challenging: scripts that crawl music-sharing platforms may not capture all of the music I am looking for. And once I had the audio files, I would have to perform audio processing techniques to extract the relevant information from the songs.

Fortunately, there is an easy solution to the music data acquisition problem. The Million Song Dataset (MSD) is a collection of metadata for one million music tracks dating up to 2011. Various organizations, such as The Echo Nest, Musicbrainz, 7digital, and Last.fm, have contributed different pieces of metadata. Each song is represented as a Hierarchical Data Format file (HDF5), which can be loaded as a JSON object. The fields encompass topical features such as the song title, artist, and release date, as well as lower-level features such as the loudness, starting beat time, pitches, and timbre of several segments of the song [9]. While the MSD is the largest free and open-source music metadata dataset I could find, there is no guarantee that it adequately covers the entire spectrum of EM artists and songs. This quality limitation is important to consider throughout the study. A quick look through the songs, including the subset of data I worked with for this report, showed that there were several well-known artists and songs from the EM scene. Therefore, while the MSD may not contain all desired songs for this project, it contains an adequate number of relevant songs to produce some meaningful results. Additionally, the groundwork for modeling the similarities between songs and identifying groundbreaking ones is the same regardless of the songs included, and the following methodologies can be implemented on any similarly formatted dataset, including one with songs that might currently be missing from the MSD.
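
For reference, here is a minimal sketch of reading a handful of fields from a single MSD track with the hdf5_getters module distributed with the dataset (the file name below is hypothetical):

import hdf5_getters

h5 = hdf5_getters.open_h5_file_read('TRAXLZU12903D05F94.h5')  # hypothetical path
try:
    title = hdf5_getters.get_title(h5)
    artist = hdf5_getters.get_artist_name(h5)
    year = hdf5_getters.get_year(h5)
    pitches = hdf5_getters.get_segments_pitches(h5)  # shape (num_segments, 12)
    timbre = hdf5_getters.get_segments_timbre(h5)    # shape (num_segments, 12)
    print(title, artist, year, pitches.shape, timbre.shape)
finally:
    h5.close()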


Chapter 2

Mathematical Modeling

2.1 Determining Novelty of Songs

Finding a logical and implementable mathematical model was, and continues to be, an important aspect of my research. My problem, how to mathematically determine which songs were unique for their time, requires an algorithm in which each song is introduced in chronological order, either joining an existing category or starting a new category based on its musical similarity to songs already introduced. Clustering algorithms like k-means or Gaussian Mixture Models (GMMs), which optimize the partitioning of a dataset into a predetermined number of clusters, assume that number is fixed. While this process would work if we knew exactly how many genres of EM existed, if we guess wrong, our end results may contain clusters that are wrongly grouped together or wrongly separated. It is much better to apply a clustering algorithm that does not make any assumptions about this number.

One particularly promising process that addresses the issue of the number of clusters is a family of algorithms known as Dirichlet Processes (DPs). DPs are useful for this particular application because (1) they assign clusters to a dataset with only an upper bound on the number of clusters, and (2) by sorting the songs in chronological order before running the algorithm and keeping track of which songs are categorized under each cluster, we can observe the earliest songs in each cluster and consequently infer which songs were responsible for creating new clusters. The DP is controlled by a concentration parameter α. The expected number of clusters formed is directly proportional to the value of α, so the higher the value of α, the more likely new clusters will be formed [10]. Regardless of the value of α, as the number of data points introduced increases, the probability of a new group being formed decreases. That is, a "rich get richer" policy is in place, and existing clusters tend to grow in size.
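
For intuition, this policy can be stated precisely through the Chinese Restaurant Process view of the DP, a standard construction stated here only for illustration. When the n-th song arrives, it joins an existing cluster k, or forms a brand-new cluster, with probabilities

P(\text{join existing cluster } k) = \frac{n_k}{n - 1 + \alpha}, \qquad
P(\text{form a new cluster}) = \frac{\alpha}{n - 1 + \alpha},

where n_k is the number of songs already assigned to cluster k. Large clusters attract new members, and the chance of opening a new cluster shrinks as n grows but rises with α.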

Tweaking the value of the tunable parameter α is an important part of the study, since it determines the flexibility given to forming a new cluster. If the value of α is too small, then the criteria for forming clusters will be too strict, and data that should be in different clusters will be assigned to the same cluster. On the other hand, if α is too large, the algorithm will be too sensitive and assign similar songs to different clusters.

The implementation of the DP was achieved using scikit-learn's library and API for the Dirichlet Process Gaussian Mixture Model (DPGMM), the formal name of the Dirichlet Process model used to cluster the data. More specifically, scikit-learn's implementation of the DPGMM uses the stick-breaking method, one of several equally valid methods to assign songs to clusters [11]. While the mathematical details for this algorithm can be found at the following citation [12], the most important aspects of the DPGMM are the arguments that the user can specify and tune. The first of these tunable parameters is the value α, which is the same parameter as the α discussed in the previous paragraph. As seen on the right side of Figure 2.1, properly tuning α is key to obtaining meaningful clusters. The center image has α set to 0.01, which is too small and results in all of the data being placed under one cluster. On the other hand, the bottom-right image has the same data set and α set to 100, which does a better job of clustering. On a related note, the figure also demonstrates the effectiveness of the DPGMM over the GMM. On the left side, the dataset clearly contains 2 clusters, but the GMM in the top-left image assumes 5 clusters as a prior and consequently clusters the data incorrectly, while the DPGMM manages to limit the data to 2 clusters.

The second argument that the user inputs for the DPGMM is the data that will be clustered. The scikit-learn implementation takes the data in the format of a nested list (N lists, each of length m), where N is the number of data points and m the number of features. While the format of the data structure is relatively straightforward, choosing which numbers should be in the data was a challenge I faced. Selecting the relevant features of each song to be used in the algorithm will be expounded upon in the next section, "Feature Selection."

The last argument that a user inputs for the scikit-learn DPGMM implementation indicates the upper bound for the number of clusters. The Dirichlet Process then determines the best number of clusters for the data between 1 and the upper bound. Since the DPGMM is flexible enough to find the best value, I set an arbitrary upper bound of 50 clusters and focused on tuning α to modify the number of clusters formed.
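
As a concrete illustration, the following is a minimal sketch of this setup, assuming the per-song feature matrix X described in the coming sections has already been assembled (DPGMM is the scikit-learn class available at the time of writing; later releases replace it with BayesianGaussianMixture):

import numpy as np
from sklearn import mixture

X = np.load('song_features.npy')  # hypothetical file; rows sorted chronologically

dpgmm = mixture.DPGMM(n_components=50,       # upper bound on number of clusters
                      alpha=0.1,             # concentration parameter
                      covariance_type='diag')
dpgmm.fit(X)
labels = dpgmm.predict(X)

# Since the rows of X are in chronological order, the first songs assigned
# to each label are the candidates for having started that cluster.
for k in np.unique(labels):
    earliest = np.where(labels == k)[0][:5]
    print('cluster', k, 'earliest song indices:', earliest)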

2.2 Feature Selection

One of the most difficult aspects of the Dirichlet method is choosing the features to be used for clustering. In other words, when we organize the songs into clusters, we need to ensure that each cluster is distinct in a way that is statistically and intuitively logical.

Figure 2.1: scikit-learn example of GMM vs. DPGMM and tuning of α

In the Million Song Dataset [9], each song is represented as a JSON object containing several fields. These fields are candidate features to be used in the Dirichlet algorithm. Below is an example song, "Never Gonna Give You Up" by Rick Astley, and the corresponding fields:

artist_mbid: db92a151-1ac2-438b-bc43-b82e149ddd50 (the musicbrainz.org ID for this artist)
artist_mbtags: shape = (4) (this artist received 4 tags on musicbrainz.org)
artist_mbtags_count: shape = (4) (raw tag count of the 4 tags this artist received on musicbrainz.org)
artist_name: Rick Astley (artist name)
artist_playmeid: 1338 (the ID of that artist on the service playme.com)
artist_terms: shape = (12) (this artist has 12 terms (tags) from The Echo Nest)
artist_terms_freq: shape = (12) (frequency of the 12 terms from The Echo Nest (number between 0 and 1))
artist_terms_weight: shape = (12) (weight of the 12 terms from The Echo Nest (number between 0 and 1))
audio_md5: bf53f8113508a466cd2d3fda18b06368 (hash code of the audio used for the analysis by The Echo Nest)
bars_confidence: shape = (99) (confidence value (between 0 and 1) associated with each bar by The Echo Nest)
bars_start: shape = (99) (start time of each bar according to The Echo Nest; this song has 99 bars)
beats_confidence: shape = (397) (confidence value (between 0 and 1) associated with each beat by The Echo Nest)
beats_start: shape = (397) (start time of each beat according to The Echo Nest; this song has 397 beats)
danceability: 0.0 (danceability measure of this song according to The Echo Nest (between 0 and 1; 0 => not analyzed))
duration: 211.69587 (duration of the track in seconds)
end_of_fade_in: 0.139 (time of the end of the fade-in at the beginning of the song, according to The Echo Nest)
energy: 0.0 (energy measure (not in the signal processing sense) according to The Echo Nest (between 0 and 1; 0 => not analyzed))
key: 1 (estimation of the key the song is in by The Echo Nest)
key_confidence: 0.324 (confidence of the key estimation)
loudness: -7.75 (general loudness of the track)
mode: 1 (estimation of the mode the song is in by The Echo Nest)
mode_confidence: 0.434 (confidence of the mode estimation)
release: Big Tunes - Back 2 The 80s (album name from which the track was taken; some tracks can come from many albums, we give only one)
release_7digitalid: 786795 (the ID of the release (album) on the service 7digital.com)
sections_confidence: shape = (10) (confidence value (between 0 and 1) associated with each section by The Echo Nest)
sections_start: shape = (10) (start time of each section according to The Echo Nest; this song has 10 sections)
segments_confidence: shape = (935) (confidence value (between 0 and 1) associated with each segment by The Echo Nest)
segments_loudness_max: shape = (935) (max loudness during each segment)
segments_loudness_max_time: shape = (935) (time of the max loudness during each segment)
segments_loudness_start: shape = (935) (loudness at the beginning of each segment)
segments_pitches: shape = (935, 12) (chroma features for each segment (normalized so max is 1))
segments_start: shape = (935) (start time of each segment (musical event, or onset) according to The Echo Nest; this song has 935 segments)
segments_timbre: shape = (935, 12) (MFCC-like features for each segment)
similar_artists: shape = (100) (a list of 100 artists (their Echo Nest IDs) similar to Rick Astley, according to The Echo Nest)
song_hotttnesss: 0.864248830588 (according to The Echo Nest, when downloaded (in December 2010), this song had a 'hotttnesss' of 0.8 (on a scale of 0 to 1))
song_id: SOCWJDB12A58A776AF (The Echo Nest song ID; note that a song can be associated with many tracks (with very slight audio differences))
start_of_fade_out: 198.536 (start time of the fade-out, in seconds, at the end of the song, according to The Echo Nest)
tatums_confidence: shape = (794) (confidence value (between 0 and 1) associated with each tatum by The Echo Nest)
tatums_start: shape = (794) (start time of each tatum according to The Echo Nest; this song has 794 tatums)
tempo: 113.359 (tempo in BPM according to The Echo Nest)
time_signature: 4 (time signature of the song according to The Echo Nest, i.e., the usual number of beats per bar)
time_signature_confidence: 0.634 (confidence of the time signature estimation)
title: Never Gonna Give You Up (song title)
track_7digitalid: 8707738 (the ID of this song on the service 7digital.com)
track_id: TRAXLZU12903D05F94 (The Echo Nest ID of this particular track, on which the analysis was done)
year: 1987 (year when this song was released, according to musicbrainz.org)

When choosing features, my main goal was to use features that would most likely yield meaningful results yet also be simple and make sense to the average person. The definition of "meaningful" results is arbitrary, as every music listener will have his or her own opinions as to what constitutes different types of music, but some common features most people tend to differentiate songs by are pitch, rhythm, and the types of instruments used. The following specific fields provided in each song object fall under these three terms.

Pitch

• segments_pitches: a matrix of values indicating the strength of each pitch (or note) at each discernible time interval

Rhythm

• beats_start: a vector of values indicating the start time of each beat
• time_signature: the time signature of the song
• tempo: the speed of the song in Beats Per Minute (BPM)

Instruments

• segments_timbre: a matrix of values indicating the distribution of MFCC-like features (different types of tones) for each segment

The segments_pitches feature is a clear candidate for a differentiating factor for songs, since it reveals the patterns of notes that occur. Additionally, other research papers that quantitatively examine songs, like Mauch's, look at pitch and employ a procedure that allows all songs to be compared with the same metric. Likewise, timbre is intuitively a reliable differentiating feature, since it reveals the prevalence of different tones, that is, sounds that sound different despite having the same pitch. Therefore, segments_timbre is another feature that is considered for each song.

Finally, we look at the candidate features for rhythm. At first glance, all of these features appear to be useful, as they indicate the rhythm of a song in one way or another. However, none of these features are as useful as the pitch and timbre features. While tempo is one factor in differentiating genres of EDM, and music in general, tempo alone is not a driving force of musical innovation. Certain genres of EDM, like drum 'n' bass and happycore, stand out for having very fast tempos, but the tempo is supplemented with a sound unique to the genre. Conceiving new arrangements of pitches, combining instruments in new ways, and inventing new types of sounds are novel; speeding up or slowing down existing sounds is not. Including tempo as a feature could actually add noise to the model, since many genres overlap in their tempos. And finally, tempo is measured indirectly when the pitch and timbre features are normalized for each song: everything is measured in units of "per second," so faster songs will have higher quantities of pitch and timbre features each second. Time signature can be dismissed from the candidate features for the same reason as tempo: many genres share the same time signature, and including it in the feature set would only add more noise. beats_start looks like a more promising feature since, like segments_pitches and segments_timbre, it consists of a vector of values. However, difficulties arise when we begin to think about how exactly we can utilize this information. Since each song varies in length, we need a way to compare songs of different durations on the same level. One approach could be to perform basic statistics on the distance between each beat, for example calculating the mean and standard deviation of this distance. However, the normalized pitch and timbre information already captures this data. Another possibility is detecting certain patterns of beats, which could differentiate the syncopated beats of dubstep or glitch music from the steady pulse of electro-house. But once again, every beat is accompanied by a sound with a specific timbre and pitch, so this feature would not add any significantly new information.

2.3 Collecting Data and Preprocessing Selected Features

2.3.1 Collecting the Data

Upon deciding on the features I wanted to use in my research, I first needed to collect all of the electronic songs in the Million Song Dataset. The easiest reliable way to achieve this was to iterate through each song in the database and save the information for the songs where any of the artist genre tags in artist_mbtags matched an electronic music genre. This measure was not fully accurate, because it looks at the genre of the artist rather than the song; however, song-specific genre information was not as easily accessible, so the artist-level indicator was a reasonable substitute. To generate a list of the genres that electronic songs would fall under, I manually searched through a subset of the MSD to find all genres that seemed to be related to electronic music. In the case of genres that were sometimes, but not always, electronic in nature, such as disco or pop, I erred on the side of caution and did not include them in the list of electronic genres. In these cases, false positives, such as primarily rock songs that happen to have the disco label attached to the artist, could inadvertently be included in the dataset. The final list of genres is as follows (a sketch of the matching step follows the list):

target_genres = ['house', 'techno', 'drum and bass', 'drum n bass',
                 "drum'n'bass", 'drumnbass', "drum 'n' bass", 'jungle',
                 'breakbeat', 'trance', 'dubstep', 'trap', 'downtempo',
                 'industrial', 'synthpop', 'idm',
                 'idm - intelligent dance music', '8-bit', 'ambient',
                 'dance and electronica', 'electronic']
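
Here is a minimal sketch of the matching step, assuming an open HDF5 song file and the target_genres list above (get_artist_mbtags is the dataset-provided getter for the artist-level MusicBrainz tags used here as a proxy for song genre):

import hdf5_getters

def is_electronic(h5):
    tags = hdf5_getters.get_artist_mbtags(h5)
    # tags may be stored as bytes, so decode defensively before comparing
    tags = [t.decode('utf-8') if isinstance(t, bytes) else t for t in tags]
    return any(tag.lower() in target_genres for tag in tags)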

2.3.2 Pitch Preprocessing

A study conducted by music researcher Matthias Mauch [8] analyzes pitch in a musically informed manner. The study first takes the raw sound data and converts it into a distribution over each pitch, where 0 is no detection of the pitch and 1 the strongest amount. Then it computes the most likely chord by comparing the 4 most common types of chords in popular music (major, minor, dominant 7th, and minor 7th) to the observed chord. The most common chords are represented as "template chords" and contain 0's and 1's, where the 1's represent the notes played in the chord. For example, using the note C as the first index, the C major chord is represented as

CT_{CM} = (1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0)

For a given chroma frame c observed in the song, Spearman's rho coefficient is computed over every template chord:

\rho_{CT,c} = \sum_{i=1}^{12} \frac{(CT_i - \overline{CT})(c_i - \bar{c})}{\sigma_{CT}\,\sigma_c}

where \overline{CT} is the mean of the values in the template chord, σ_CT is the standard deviation of the values in the chord, and the operations on c are analogous. Note that the summation is over each individual pitch in the 12 pitch classes. The chord template with the highest value of ρ is selected as the chord for the time frame.

After this is performed for each time frame, the values are smoothed, and then the change between adjacent chords is observed. The reasoning behind this step is that, by measuring the relative distance between chords rather than the chords themselves, all songs can be compared in the same manner even though they may have different key signatures. Finally, the study takes the types of chord changes and classifies them under 8 possible categories called "H-topics." These topics are more abstracted versions of the chord changes that make more sense to a human, such as "changes involving dominant 7th chords."

In my preliminary implementation of this method on an electronic dance music corpus, I made a few modifications to Mauch's study. First, I smoothed out time frames before computing the most probable chords, rather than smoothing the most probable chords. I did this to save time and to reduce volatility in the chord measurements. Using Rick Astley's "Never Gonna Give You Up" as a reference, which contains 935 time frames and lasts 212 seconds, 5 time frames is slightly under 1 second and, for preliminary testing, appeared to be a good interval for each time block. Second, as mentioned in the literature section, I did not abstract the chord changes into H-topics. This decision also stemmed from time constraints, since deriving semantic chord meaning from EDM songs would require careful research into the types of harmonies and sounds common in that genre of music. Below I include a high-level visualization of the pitch metadata found in a sample song, "Firestarter" by The Prodigy, and how I converted the metadata into a chord change vector that I could then feed into the Dirichlet Process algorithm.

[Figure: pitch-processing pipeline for "Firestarter" by The Prodigy. The original figure shows, step by step:
1. The raw pitch data, an N×12 matrix, where N is the number of time frames in the song and 12 the number of pitch classes (the first 5 time frames are shown).
2. The average distribution of pitches over every block of 5 time frames.
3. The most likely chord for each block, calculated using Spearman's rho; the example block resolves to F major.
4. The change between the chords of adjacent blocks, which increments a count in a table of chord change frequencies (192 possible chord changes); the example F major → G major change is a major-to-major shift of step size 2, with chord shift code 6, so chord_changes[6] += 1.
5. The final 192-element vector, where chord_changes[i] is the number of times the chord change with code i occurred in the song.]

A final step I took to normalize the chord change data was to divide the counts by the length of the song, so that each song's number of chord changes was measured per second.
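
To make the bookkeeping concrete, here is a small sketch of the tallying and normalization, assuming chords are the (type, root) pairs returned by find_most_likely_chord in the appendix code. The index formula is one possible encoding of the 4 × 4 × 12 = 192 change types; the exact code numbering in my implementation may differ.

def change_code(prev_chord, cur_chord):
    # hypothetical encoding: pair of chord types (major/minor/dom7/min7,
    # numbered 1-4) plus the root movement in semitones
    (t1, r1), (t2, r2) = prev_chord, cur_chord
    interval = (r2 - r1) % 12
    return ((t1 - 1) * 4 + (t2 - 1)) * 12 + interval

def chord_change_vector(chords, duration):
    counts = [0] * 192
    for prev, cur in zip(chords, chords[1:]):
        counts[change_code(prev, cur)] += 1
    # normalize to chord changes per second
    return [c / duration for c in counts]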

2.3.3 Timbre Preprocessing

For timbre, I also used Mauch's model to find a meaningful way to compare timbre uniformly across all songs [8]. After collecting all song metadata, I took a random sample of 20 songs from each year, starting at 1970. The reason I forced the sampling to 20 randomly sampled songs from each year, and did not take a random sample of songs from all years at once, was to prevent bias towards any type of sound. As seen in Figure 2.2, there are significantly more songs from 2000-2011 than from before 2000. The mean year is x̄ = 2001.052, the median year is 2003, and the standard deviation of the years is σ = 7.060. A "random sample" over all songs would almost certainly include a disproportionate number of more recent songs. In order not to miss out on sounds that may be more prevalent in older songs, I required a set number of songs from each year. Next, from each randomly selected song, I selected 20 random timbre frames, in order to prevent any biases in data collection within each song. In total, there were 42 · 20 · 20 = 16,800 timbre frames collected. Next, I clustered the timbre frames using a Gaussian Mixture Model (GMM), varying the number of clusters from 10 to 100 and selecting the number of clusters with the lowest Bayes Information Criterion (BIC), a statistical measure commonly used to select the best-fitting model. The BIC was minimized at 46 timbre clusters. I then re-ran the GMM with 46 clusters and saved the mean values of each of the 12 timbre segments for each cluster formed.
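
A minimal sketch of this model-selection loop, assuming frames is the 16,800 × 12 array of sampled timbre frames (GMM is the scikit-learn class of that era; newer releases use GaussianMixture, whose bic method behaves the same way; the step size of 10 below is illustrative):

import numpy as np
from sklearn import mixture

frames = np.load('timbre_frames.npy')  # hypothetical file of sampled frames

best_bic, best_gmm = np.inf, None
for n in range(10, 101, 10):           # illustrative step size
    gmm = mixture.GMM(n_components=n, covariance_type='diag')
    gmm.fit(frames)
    bic = gmm.bic(frames)
    if bic < best_bic:
        best_bic, best_gmm = bic, gmm

TIMBRE_CLUSTERS = best_gmm.means_      # one 12-element mean per cluster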

In the same way that every song had the same 192 chord changes whose frequencies could be compared between songs, each song now had the same 46 timbre clusters but different frequencies for each. When reading in the metadata from each song, I calculated the most likely timbre cluster each timbre frame belonged to and kept a frequency count of all of the possible timbre clusters observed in the song. Finally, as with the pitch data, I divided all observed counts by the duration of the song in order to normalize each song's timbre counts.

Figure 2.2: Number of Electronic Music Songs in the Million Song Dataset from Each Year
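
Putting the pieces together, here is a sketch of computing one song's normalized timbre-category counts, assuming the find_most_likely_timbre_category helper reproduced in the appendix:

import hdf5_getters

def timbre_histogram(h5, n_clusters=46):
    frames = hdf5_getters.get_segments_timbre(h5)
    duration = hdf5_getters.get_duration(h5)
    counts = [0.0] * n_clusters
    for frame in frames:
        counts[find_most_likely_timbre_category(frame)] += 1
    # normalize to timbre-category observations per second
    return [c / duration for c in counts]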


Chapter 3

Results

3.1 Methodology

After the pitch and timbre data was processed, I ran the Dirichlet Process on the data. For each song, I concatenated the 192-element chord change frequency list and the 46-element timbre category frequency list, giving each song a total of 238 features. However, there is a problem with this setup: the pitch data will inherently dominate the clustering process, since it contains almost 3 times as many features as timbre. While there is no built-in function in scikit-learn's DPGMM to give different weights to each feature, I considered another possibility to remedy this discrepancy: duplicating the timbre vector a certain number of times and concatenating that to the feature set of each song. While this strategy runs the risk of corrupting the feature set and turning it into something that does not accurately represent each song, it is important to keep in mind that, even without duplicating the timbre vector, the feature set consists of two separate feature sets concatenated to each other. Therefore, timbre duplication appears to be a reasonable strategy to weigh pitch and timbre more evenly.

After this modification, I tweaked a few more parameters before obtaining my final results. Dividing the pitch and timbre frequencies by the duration of the song normalized every song to frequency per second, but it also had the undesired effect of making the data too small. Timbre and pitch frequencies per second were almost always less than 10, and many times hovered as low as 0.002 for nonzero values. Because all of the values were very close to each other, using common values of α in the range of 0.1 to 1000-2000 was insufficient to push the songs into different clusters; as a result, every song fell into the same cluster. Increasing the value of α by several orders of magnitude, to well over 10 million, fixed the problem, but this solution presented two new problems. First, tuning α to experiment with different ways to cluster the music would be problematic, since I would have to work with an enormous range of possible values for α. Second, pushing α to such high values is not appropriate for the Dirichlet Process: extremely high values of α indicate a Dirichlet Process that will try to disperse the data into different clusters, and a value of α that high is, in principle, always assigning each new song to a new cluster. On the other hand, varying α between 0.1 and 1000, for example, presents a much wider range of flexibility when assigning clusters. While clustering may be possible by varying α an extreme amount with the data as it currently is, we would be using the Dirichlet Process in a way it should mathematically not be used. Therefore, multiplying all of the data by a constant value, so that we can work in the appropriate range of α, is the ideal approach. After some experimentation, I found that k = 10 was an appropriate scaling factor.

After initial runs of the Dirichlet Process, I found a slight issue with some of the earlier songs. Since I had only artist genre tags, not song-specific tags, I chose songs based on whether any of the tags associated with the artist fell under any electronic music genre, including the generic term 'electronic.' There were some bands, mostly older ones from the 1960s and 1970s, like Electric Light Orchestra, which had some electronic music but mostly featured rock, funk, disco, or another genre. Given that these artists featured mostly non-electronic songs, I decided to exclude them from my study and generated a blacklist of these artists. While it was infeasible to look through every single song and determine whether it was electronic or not, I was able to look over the earliest songs in each cluster. These songs were the most important to verify as electronic, because early non-electronic songs could end up forming new clusters and inadvertently create clusters with non-electronic sounds that I was not looking for.

The goal of this thesis is to identify the different groups in which EM songs are clustered and to identify the most unique artists and genres. While the second task is very simple, because it requires looking at the earliest songs in each cluster, the effectiveness of the first is difficult to gauge. While I can look at the average chord change and timbre category frequencies in each cluster, as well as other metadata, attaching more semantic interpretations to what the music actually sounds like, and determining whether the music is clustered properly, is a very subjective process. For this reason, I ran the Dirichlet Process on the feature set with values of α = (0.05, 0.1, 0.2) and compared the clustering in each case, examining similarities and differences in the clusters formed in each scenario in the Analysis section. For each value of α, I set the upper limit of components, or clusters, allowed to 50. The values of α I used resulted in 9, 14, and 19 clusters formed, respectively.
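
For concreteness, here is a sketch of how the final feature matrix can be assembled under the choices described above. The timbre duplication factor of 4 is illustrative, since it brings 4 × 46 = 184 timbre features close to the 192 pitch features; the exact factor I used is not fixed by this description.

import numpy as np

K = 10             # global scaling factor
TIMBRE_COPIES = 4  # illustrative duplication factor

def song_features(chord_changes, timbre_counts):
    # chord_changes: 192 per-second chord-change frequencies
    # timbre_counts: 46 per-second timbre-category frequencies
    features = list(chord_changes) + list(timbre_counts) * TIMBRE_COPIES
    return K * np.asarray(features)

# songs: chronologically sorted list of (chord_changes, timbre_counts) pairs
X = np.vstack([song_features(cc, tc) for cc, tc in songs])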

3.2 Findings

3.2.1 α = 0.05

When I set α to 0.05, the Dirichlet Process split the songs into 9 clusters. Below are the distributions of years of the songs in each cluster (note that the Dirichlet Process does not number the clusters exactly sequentially, so cluster numbers 5, 7, and 10 are skipped).

Figure 3.1: Song year distributions for α = 0.05

For each value of α, I also calculated the average frequency of each chord change category and timbre category for each cluster and plotted the results. The green lines correspond to timbre and the blue lines to pitch.

Figure 3.2: Timbre and pitch distributions for α = 0.05

A table of each cluster formed, the number of songs in that cluster, and descriptions of the pitch, timbre, and rhythmic qualities characteristic of songs in that cluster is shown below.

Cluster   Song Count   Characteristic Sounds
0         6481         Minimalist; industrial, space sounds, dissonant chords
1         5482         Soft, New Age, ethereal
2         2405         Defined sounds; electronic and non-electronic instruments played in standard rock rhythms
3         360          Very dense and complex synths, slightly darker tone
4         4550         Heavily distorted rock and synthesizer
6         2854         Faster-paced 80s synth rock, acid house
8         798          Aggressive beats, dense house music
9         1464         Ambient house, trancelike, strong beats, mysterious tone
11        1597         Melancholy tones; new wave rock in the 80s, then starting in the 90s downtempo, trip-hop, nu-metal

Table 3.1: Song cluster descriptions for α = 0.05

3.2.2 α = 0.1

A total of 14 clusters were formed (16 were formed, but 2 clusters contained only one song each; I listened to both of these songs, and they did not sound unique, so I discarded them from the clusters). Again, the song distributions, timbre and pitch distributions, and cluster descriptions are shown below.

Figure 3.3: Song year distributions for α = 0.1

Figure 3.4: Timbre and pitch distributions for α = 0.1

Cluster   Song Count   Characteristic Sounds
0         1339         Instrumental and disco with 80s synth
1         2109         Simultaneous quarter-note and sixteenth-note rhythms
2         4048         Upbeat, chill; simultaneous quarter-note and eighth-note rhythms
3         1353         Strong repetitive beats, ambient
4         2446         Strong simultaneous beat and synths; synths defined but with echo
5         2672         Calm, New Age
6         542          Hi-hat cymbals, dissonant chord progressions
7         2725         Aggressive punk and alternative rock
9         1647         Latin; rhythmic emphasis on first and third beats
11        835          Standard medium-fast rock instruments/chords
16        1152         Orchestral, especially violins
18        40           "Martian/alien" sounds, no vocals
20        1590         Alternating strong kick and strong high-pitched clap
28        528          Roland TR-like beats; kick and clap stand out but fuzzy

Table 3.2: Song cluster descriptions for α = 0.1

3.2.3 α = 0.2

With α set to 0.2, there were a total of 22 clusters formed. 3 of the clusters consisted of 1 song each, none of which were particularly unique-sounding, so I discarded them, for a total of 19 significant clusters. Again, the song distributions, timbre and pitch distributions, and cluster descriptions are shown below.

Figure 3.5: Song year distributions for α = 0.2

Figure 3.6: Timbre and pitch distributions for α = 0.2

Cluster   Song Count   Characteristic Sounds
0         4075         Nostalgic and sad-sounding synths and string instruments
1         2068         Intense, sad, cavernous (mix of industrial metal and ambient)
2         1546         Jazz/funk tones
3         1691         Orchestral with heavy 80s synths, atmospheric
4         343          Arpeggios
5         304          Electro, ambient
6         2405         Alien synths, eerie
7         1264         Punchy kicks and claps, 80s/90s tilt
8         1561         Medium tempo, 4/4 time signature, synths with intense guitar
9         1796         Disco rhythms and instruments
10        2158         Standard rock with few (if any) synths added on
12        791          Cavernous, minimalist, ambient (non-electronic instruments)
14        765          Downtempo, classic guitar riffs, fewer synths
16        865          Classic acid house sounds and beats
17        682          Heavy Roland TR sounds
22        14           Fast, ambient, classic orchestral
23        578          Acid house with funk tones
30        31           Very repetitive rhythms, one or two tones
34        88           Very dense sound (strong vocals and synths)

Table 3.3: Song cluster descriptions for α = 0.2

3.3 Analysis

Each of the different values of α revealed different insights about the Dirichlet Process applied to the Million Song Dataset, mainly which artists and songs were unique and how traditional groupings of EM genres could be thought of differently. Not surprisingly, the distributions of the years of songs in most of the clusters were skewed to the left, because the distribution of all of the EM songs in the Million Song Dataset is left-skewed (see Figure 2.2). However, some of the distributions vary significantly for individual clusters, and these differences provide important insights into which types of music styles were popular at certain points in time and how unique the earliest artists and songs in the clusters were. For example, for α = 0.1, cluster 28's musical style (with sounds characteristic of the Roland TR-808 and TR-909, two programmable drum machines and synthesizers that became extremely popular in 1980s and 90s dance tracks [13]) coincides with when the instruments were first manufactured in 1980. Not surprisingly, this cluster contained mostly songs from the 80s and 90s and declined slightly in the 2000s. However, there were a few songs in that cluster that came out before 1980. While these songs did not clearly use the Roland TR machines, they may have contained similar sounds that predated the machines and were truly novel.

First, looking at α = 0.05, we see that all of the clusters contain a significant number of songs, although clusters 3 and 8 are notably smaller. Cluster 3 contains a heavier left tail, indicating a larger number of songs from the 70s, 80s, and 90s. Inside the cluster, the genres of music varied significantly from a traditional music lens. That is, the cluster contained some songs with nearly all traditional rock instruments, others with purely synths, and others somewhere in between, all of which would normally be classified as different EM genres. However, under the Dirichlet Process, these songs were lumped together with the common theme of dense, melodic sounds (as opposed to minimalistic, repetitive, or dissonant sounds). The most prominent artists among the earlier songs are Ashra and Jon Hassell, who composed several melodic songs combining traditional instruments with synthesizers for a modern feel. The other small cluster, number 8, contains a more normal year distribution relative to the entire MSD distribution and also consists of denser beats. Another artist, Cabaret Voltaire, leads this cluster. Cluster 9 sticks out significantly because it contains virtually no songs before 1990 but then increases rapidly in popularity. This cluster contains songs with hypnotically repetitive rhythm, strong and ethereal synths, and an equally strong drum-like beat. Given the emergence of trance in the 1990s, and the fact that house music in the 1980s contained more minimalistic synths than house music in the 1990s, this distribution of years makes sense. Looking at the earliest artists in this cluster, one that accurately predates the later music in the cluster is Jean-Michel Jarre. A French composer and pioneer of ambient and electronic music [14], Jarre wrote one song, Les Chants Magnétiques IV, that contains very sharp and modulated synths along with a repetitive hi-hat cymbal rhythm and more drawn-out, ethereal synths. While the song sounds ambient at its normal speed, playing the song at 1.5 times the normal speed resulted in a thumping, fast-paced 16th-note rhythm that, combined with the ethereal synths and their chord progressions, sounded very similar to trance music. In fact, I found that, stylistically, trance music was comparable to house and ambient music increased in speed. Trance music was a term not used extensively until the early 1990s, but ambient and house music were already mainstream by the 1980s, so it would make sense that trance evolved in this manner. However, this insight could serve as an argument that trance is not an innovative genre in and of itself but is rather a clever combination of two older genres. Lastly, we look at the timbre category and chord change distributions for each cluster. In theory, these clusters should have significantly different peaks of chord changes and timbre categories, reflecting different pitch arrangements and instruments in each cluster.

The type 0 chord change corresponds to major → major with no note change; type 60 to minor → minor with no note change; type 120 to dominant 7th major → dominant 7th major with no note change; and type 180 to dominant 7th minor → dominant 7th minor with no note change. It makes sense that type 0, 60, 120, and 180 chord changes are frequently observed, because it implies that adjacent chords in a song remain in the same key for the majority of the song. The timbre categories, on the other hand, are more difficult to intuitively interpret. Mauch's study addresses this issue by sampling the songs and sounds that are closest to each timbre category, playing the sounds, and attaching user-based interpretations based on several listeners [8]. While this strategy worked in Mauch's study, given the time and resources at my disposal, it was not practical in mine. I ended up comparing my subjective summaries of each cluster against the charts to see whether certain peaks in the timbre categories corresponded to specific tones. Strangely, for α = 0.05, the timbre and chord change data is very similar for each cluster. This problem does not occur for α = 0.1 or 0.2, where the graphs vary significantly and correspond to some of the observed differences in the music. In summary, below are the most influential artists I found in the clusters formed and the types of music they created that were novel for their time:

• Jean-Michel Jarre: ambient and house music, complicated synthesizer arrangements
• Cabaret Voltaire: orchestral electronic music
• Paul Horn: new age
• Brian Eno: ambient music
• Manuel Göttsching (Ashra): synth-heavy ambient music
• Killing Joke: industrial metal
• John Foxx: minimalist and dark electronic music
• Fad Gadget: house and industrial music

While these conclusions were formed mainly from the data used in the MSD and the resulting clusters, I checked outside sources and biographies of these artists to see whether they were groundbreaking contributors to electronic music. Some research revealed that these artists were indeed groundbreaking for their time, so my findings are consistent with existing literature. The difference, however, between existing accounts and mine is that, from a quantitatively computed perspective, I found some new connections (like observing that one of Jarre's works, when sped up, sounded very similar to trance music).

For larger values of α, it is worth not only looking at interesting phenomena in the clusters formed for that specific value but also comparing the clusters formed to those for other values of α. Since we are increasing the value of α, more clusters will be formed, and the distinctions between each cluster will be more nuanced. With α = 0.1, the Dirichlet Process formed 16 clusters; 2 of these clusters consisted of only one song each, and upon listening, neither of these songs sounded particularly unique, so I threw those two clusters out and analyzed the remaining 14. Comparing these clusters to the ones formed with α = 0.05, I found that some of the clusters mapped over nicely, while others were more difficult to interpret. For example, cluster 3 from the α = 0.1 run contained a similar number of songs, and a similar distribution of release years, to cluster 9 from the α = 0.05 run. Both contain virtually no songs before the 1990s and then steadily rise in popularity through the 2000s. Both clusters also contain similar types of music: house beats and ethereal synths reminiscent of ambient or trance music.

However, when I looked at the earliest artists in cluster 3 (α = 0.1), they were different from the earliest artists in cluster 9 (α = 0.05). One particular artist, Bill Nelson, stood out for having a particularly novel song, "Birds of Tin," for the year it was released (1980). This song features a sharp and twangy synth beat that, when sped up, sounded like minimalist acid house music. While the α = 0.05 run differentiated mostly on general moods and classes of instruments (like rock vs. non-electronic vs. electronic), the α = 0.1 run picked up more nuanced instrumentation and mood differences. For example, cluster 16 (α = 0.1) contained songs that featured orchestral string instruments, especially violin. The songs themselves varied significantly according to traditional genres, from Brian Eno arrangements with classical orchestra to a remix of a song by Linkin Park, a nu-metal band, which contained violin interludes. This clustering raises an interesting point: music that sounds very different based on traditional genres could be grouped together on certain instruments or sounds. Another cluster, 28 (α = 0.1), features 90s sounds characteristic of the Roland TR-909 drum machine (which would explain why the cluster's songs increase drastically starting in the 1990s and steadily decline through the 2000s). Yet another cluster, 6 (α = 0.1), contains a particularly heavy left tail, indicating a style more popular in the 1980s, and its characteristic sound, hi-hat cymbals, is also a specialized instrument. This specialization does not match up particularly strongly with the clusters from α = 0.05. That is, a single cluster with α = 0.05 does not easily map to one or more clusters in the α = 0.1 run, although many of the clusters appear to share characteristics based on the qualitative descriptions in the tables. The timbre/chord change charts for each cluster appeared to at least somewhat corroborate the general characteristics I attached to each cluster. For example, the last timbre category is significantly pronounced for clusters 5 and 18, and especially so for 18. Cluster 18 was vocal-free, ethereal space-synth sounds, so it would make sense that cluster 5, which was mainly calm New Age, also contained vocal-free, ethereal, and spacey sounds.

It was also interesting to note that certain clusters, like 28, contained one timbre category that completely dominated all the others. Many songs in this cluster were marked by strong and repetitive beats reminiscent of the Roland TR drum machines, which matches the graph. Likewise, clusters 3, 7, 9, and 20, which appear to contain the same peak timbre category, were noted for containing strong and repetitive beats. For this run, I added the following artists and their contributions to the general list of novel artists:

• Bill Nelson: minimalist house music
• Vangelis: orchestral compositions with electronic notes
• Rick Wakeman: rock compositions with spacey-sounding synths
• Kraftwerk: synth-based pop music

Finally, we look at α = 0.2. With this parameter value, the Dirichlet Process resulted in 22 clusters; 3 of these clusters contained only one song each, and upon listening to each of these songs, I determined they were not particularly unique and discarded them, for a total of 19 remaining clusters. Unlike the previous two values of α, where the clusters were relatively easy to subjectively differentiate, this one was quite difficult. Slightly more than half of the clusters, 10 out of 19, contained under 1000 songs, and 3 contained under 100. Some of the clusters were easily mapped to clusters for the other two α values, like cluster 17 (α = 0.2), which contains Roland TR drum machine sounds and is comparable to cluster 28 (α = 0.1). However, many of the other classifications seemed more dubious. Not only did the songs within each cluster often vary significantly, but the differences between many clusters appeared nearly indistinguishable. The chord change and timbre charts also reflect the difficulty in distinguishing different clusters. The y-axis scales for all of these charts are quite small, implying that many of the timbre values averaged out because the songs in each cluster were quite different. Essentially, the observations are quite noisy and do not have features that stand out as saliently as those of cluster 28 (α = 0.1), for example. The only exceptions were clusters 30 and 34, but there were so few songs in each of these clusters that they represent only a small amount of the dataset. Therefore, I concluded that the Dirichlet Process with α = 0.2 did an insufficient job of clustering the songs. Overall, the clusters formed when α = 0.1 were the most meaningful in terms of picking up nuanced moods and instruments without splitting hairs and producing clusters a minimally trained ear could not differentiate. From this analysis, the most appropriate genre classifications of the electronic music from the MSD are the clusters described in the table for α = 0.1, and the most novel artists, along with their contributions, are summarized in the findings for the α = 0.05 and α = 0.1 runs.


Chapter 4

Conclusion

In this chapter, I first address weaknesses in my experiment and strategies for addressing those weaknesses; then I offer potential paths for researchers to build upon my experiment and close with final remarks on this thesis.

4.1 Design Flaws in Experiment

While I made every effort possible to ensure the integrity of this experiment, there were various limiting factors, some beyond my control and others within my control but unrealistic to address given the time and resources I had. The largest issue was the dataset I was working with. While the MSD contained roughly 23,000 electronic music songs according to my classifications, these songs did not come close to all of the electronic music that was available. From looking through the tracks, I did see many important artists, meaning that there was some credibility to the dataset. However, there were several other artists I was surprised to see missing, and the artists included contained only a limited number of popular songs. Some traditionally defined genres, like dubstep, were missing entirely from the dataset, and the most recent songs came from the year 2010, which meant that the past 5 years of rapid expansion in EM were not accounted for. Building a sufficient corpus of EM data is very difficult, arguably more so than for other genres, because songs may be remixed by multiple artists, further blurring the line between original content and modifications. For this reason, I considered my thesis to be a proof of concept: although the data I used may not be ideal, I was able to show that the Dirichlet Process could be used with some amount of success to cluster songs based on their metadata.

With respect to how I implemented the Dirichlet Process and constructed the features, my methodology could have been more extensive with additional time and resources. Interpreting the sounds in each song and establishing common threads is a difficult task, and unlike Pandora, which used trained music theory experts to analyze each song, I relied on my own ears and anecdotal knowledge of EM. Given the lack of formal literature quantitatively analyzing EM and the resources I had, this was my best realistic option, but it was also not ideal. The second notable weakness, which was more controllable, was determining what exactly constitutes an EM song. My criteria involved iterating through every song and selecting those whose artist contained a tag that fell inside a list of predetermined EM genres. However, this strategy is not always effective, since some artists have only a small selection of EM songs and have produced much more music involving rock or other non-EM genres. To prevent these songs from appearing in the dataset, I would need to load another dataset from a group called Last.fm, which contains user-generated tags at the song level. Another, more addressable weakness in my experiment was graphically analyzing the timbre categories. While the average chord changes were easy to interpret on the graphs for each cluster and had easy semantic interpretations, the timbre categories were never formally defined. That is, while I knew the Bayesian Information Criterion was lowest when there were 46 categories, I did not associate each timbre category with a sound. Mauch's study addressed this issue by randomly selecting songs with sounds that fell in each timbre category and asking users to listen to the sounds and classify what they heard. Implementing this system would be an additional way of ensuring that the clusters formed for each song were nontrivial: I could not only eyeball the measurements on each graph for timbre, like I did in this thesis, but also use them to confirm the sounds I observed for each cluster. Finally, while my feature selection contained careful preprocessing, based on other studies, that normalized measurements between all songs, there are additional ways I could have improved the feature set. For example, one study looks at more advanced ways to isolate specific timbre segments in a song, identify repeating patterns, and compare songs to each other in terms of the similarity of their timbres [15]. More advanced methods like these would allow me to more quantitatively analyze how successful the Dirichlet Process is at effectively clustering songs into distinct categories.

4.2 Future Work

Future work in this area, quantitatively analyzing EM metadata to determine what constitutes different genres and novel artists, would involve tighter definitions and procedures, evaluations of whether the clustering was effective, and closer musical scrutiny. All of the weaknesses mentioned in the previous section, barring perhaps the songs available in the Million Song Dataset, can be addressed with extensions and modifications to the code base I created. The greater issue of building an effective corpus of music data for the MSD and constantly updating it might be addressed by soliciting such data from an organization like Spotify, but such an endeavor is very ambitious and beyond the scope of any individual or small-group research project without extensive funding and influence. Once these problems are resolved, and the songs accessed from the dataset and the methods for comparing songs to each other are improved, the next steps would be to further analyze the results. How do the most unique artists for their time compare to the most popular artists? Is there considerable overlap? How long does it take for a style to grow in popularity, if it ever does? And lastly, how can these findings be used to compose new genres of music and envision who and what will become popular in the future? All of these questions may require supplementary information sources, with respect to the popularity of songs and artists, for example, and many of these additional pieces of information can be found on the website of the MSD.

4.3 Closing Remarks

While this thesis is an ambitious endeavor and can be improved in many respects, it does show that the methods implemented yield nontrivial results and could serve as a foundation for future quantitative analysis of electronic music. As data analytics grows even more, and groups such as Spotify amass greater amounts of information and deeper insights on that information, this relatively new field of study will hopefully grow. EM is a dynamic, energizing, and incredibly expressive type of music, and understanding it from a quantitative perspective pays respect to what has, up until now, been mostly analyzed from a curious outsider's perspective: qualitatively described, but not examined as thoroughly from a mathematical angle.


Appendix A

Code

A.1 Pulling Data from the Million Song Dataset

from __future__ import division
import os
import sys
import re
import time
import glob
import numpy as np
import hdf5_getters  # not on adroit

# prevents output from showing ellipses when printed
np.set_printoptions(threshold=np.nan)

'''This code collects the relevant metadata for every electronic music song
found in the MSD and writes it, sorted by year, to a text file.'''

basedir = '/scratch/network/mssilver/mssilver/msd_data_full/data/' + str(sys.argv[1])
ext = '.h5'

target_genres = ['house','techno','drum and bass','drum n bass',"drum'n'bass",
    'drumnbass',"drum 'n' bass",'jungle','breakbeat','trance','dubstep','trap',
    'downtempo','industrial','synthpop','idm','idm - intelligent dance music',
    '8-bit','ambient','dance and electronica','electronic']

# relevant metadata for all EM songs found in the MSD
all_song_data = {}
pitch_segs_data = []
count = 0
start_time = time.time()

for root, dirs, files in os.walk(basedir):
    files = glob.glob(os.path.join(root, '*' + ext))
    for f in files:
        h5 = hdf5_getters.open_h5_file_read(f)
        # if year unknown, throw out sample
        if hdf5_getters.get_year(h5) == 0:
            h5.close()
            continue
        if any(tag in str(hdf5_getters.get_artist_mbtags(h5)) for tag in target_genres):
            print 'found electronic music song at {0} seconds'.format(time.time()-start_time)
            count += 1
            print 'song count: {0}'.format(count)
            h5_subdict = dict()
            h5_subdict['title'] = hdf5_getters.get_title(h5).item()
            h5_subdict['artist_name'] = hdf5_getters.get_artist_name(h5).item()
            h5_subdict['year'] = hdf5_getters.get_year(h5).item()
            h5_subdict['duration'] = hdf5_getters.get_duration(h5).item()
            h5_subdict['timbre'] = hdf5_getters.get_segments_timbre(h5).tolist()
            h5_subdict['pitches'] = hdf5_getters.get_segments_pitches(h5).tolist()
            track_id = hdf5_getters.get_track_id(h5).item()
            all_song_data[track_id] = h5_subdict
            print 'Song {0} finished processing. Total time elapsed: {1} seconds'.format(count, str(time.time() - start_time))
        h5.close()

# sort the songs chronologically by year before writing them out
all_song_data_sorted = dict(sorted(all_song_data.items(), key=lambda k: k[1]['year']))
sortedpitchdata = '/scratch/network/mssilver/mssilver/msd_data/raw_' + re.sub('/', '', sys.argv[1]) + '.txt'
with open(sortedpitchdata, 'w') as text_file:
    text_file.write(str(all_song_data_sorted))

A.2 Calculating Most Likely Chords and Timbre Categories

from __future__ import division
import os
import sys
import re
import time
import json
import glob
import hdf5_getters  # not on adroit
import sklearn.mixture
import msd_utils  # not on adroit
import math
import numpy as np
import collections
import ast

# prevents output from showing ellipses when printed
np.set_printoptions(threshold=np.nan)

# column-wise mean of a list of values
def mean(a):
    return sum(a) / len(a)

'''This code computes the frequency of chord changes and the timbre category
counts for each electronic song.'''

basedir = '/scratch/network/mssilver/mssilver/'
input_file = basedir + 'msd_data/raw_' + str(sys.argv[1]) + '.txt'
output_file = basedir + 'msd_data/preprocessed_' + str(sys.argv[1]) + '.txt'

json_contents = open(input_file, 'r').read()

all_song_data = []
time_start = time.time()
count = 0
# each song was saved as a brace-delimited dict containing a 'title' key
for json_object_str in re.finditer(r"\{[^{}]*'title'[^{}]*\}", json_contents):
    json_object_str = str(json_object_str.group(0))
    json_object = ast.literal_eval(json_object_str)
    json_object_new = {}

    json_object_new['title'] = json_object['title']
    json_object_new['artist_name'] = json_object['artist_name']
    json_object_new['year'] = json_object['year']
    json_object_new['duration'] = json_object['duration']

    segments_pitches_old = json_object['pitches']
    segments_timbre_old = json_object['timbre']
    segments_pitches_old_smoothed = []
    segments_timbre_old_smoothed = []
    chord_changes = [0 for i in range(0, 192)]
    smoothing_factor = 5
    for i in range(0, int(math.floor(len(segments_pitches_old) / smoothing_factor))):
        segments = segments_pitches_old[(smoothing_factor*i):(smoothing_factor*i+smoothing_factor)]
        # calculate mean frequency of each note over a block of 5 time segments
        segments_mean = map(mean, zip(*segments))
        segments_pitches_old_smoothed.append(segments_mean)
    most_likely_chords = [msd_utils.find_most_likely_chord(seg) for seg in segments_pitches_old_smoothed]
    print 'found most likely chords at {0} seconds'.format(time.time()-time_start)
    # calculate chord changes
    for i in range(0, len(most_likely_chords)-1):
        c1 = most_likely_chords[i]
        c2 = most_likely_chords[i+1]
        if (c1[1] == c2[1]):
            note_shift = 0
        elif (c1[1] < c2[1]):
            note_shift = c2[1] - c1[1]
        else:
            note_shift = 12 - c1[1] + c2[1]
        key_shift = 4*(c1[0]-1) + c2[0]
        # convert note_shift (0 through 11) and key_shift (1 to 16)
        # to one of 192 categories for a chord shift
        chord_shift = 12*(key_shift - 1) + note_shift
        chord_changes[chord_shift] += 1
    json_object_new['chord_changes'] = [c/json_object['duration'] for c in chord_changes]
    print 'calculated chord changes at {0} seconds'.format(time.time()-time_start)

    for i in range(0, int(math.floor(len(segments_timbre_old) / smoothing_factor))):
        segments = segments_timbre_old[(smoothing_factor*i):(smoothing_factor*i+smoothing_factor)]
        # calculate mean of each timbre dimension over a block of 5 time segments
        segments_mean = map(mean, zip(*segments))
        segments_timbre_old_smoothed.append(segments_mean)
    print 'found most likely timbre categories at {0} seconds'.format(time.time()-time_start)
    timbre_cats = [msd_utils.find_most_likely_timbre_category(seg) for seg in segments_timbre_old_smoothed]
    timbre_cat_counts = [timbre_cats.count(i) for i in xrange(0, 46)]  # 46 timbre clusters (Section 2.3.3)
    json_object_new['timbre_cat_counts'] = [t/json_object['duration'] for t in timbre_cat_counts]
    all_song_data.append(json_object_new)
    count += 1

print 'preprocessing finished, writing results to file at time {0}'.format(time.time()-time_start)
with open(output_file, 'w') as f:
    f.write(str(all_song_data))

print 'file merging complete at time {0}'.format(time.time()-time_start)

A.3 Code to Compute Timbre Categories

from __future__ import division
import os
import sys
import re
import time
import json
import glob
import hdf5_getters  # not on adroit
import sklearn.mixture
import msd_utils  # not on adroit
import math
import numpy as np
import collections
from string import ascii_uppercase
import ast
import matplotlib.pyplot as plt
import operator
from collections import defaultdict
import random

timbre_all = []
# number of EM songs in the dataset from each year, used to compute
# per-year sampling probabilities
year_counts = {1956: 2, 1965: 4, 1968: 3, 1969: 5, 1970: 23, 1971: 25,
    1972: 26, 1973: 37, 1974: 35, 1975: 29, 1976: 28, 1977: 64, 1978: 77,
    1979: 111, 1980: 131, 1981: 171, 1982: 199, 1983: 272, 1984: 190,
    1985: 189, 1986: 200, 1987: 224, 1988: 205, 1989: 272, 1990: 358,
    1991: 348, 1992: 538, 1993: 610, 1994: 658, 1995: 764, 1996: 809,
    1997: 930, 1998: 872, 1999: 983, 2000: 1031, 2001: 1230, 2002: 1323,
    2003: 1563, 2004: 1508, 2005: 1995, 2006: 1892, 2007: 2175, 2008: 1950,
    2009: 1782, 2010: 742}

time_start = time.time()
year_count = defaultdict(int)
orig_dir = '/scratch/network/mssilver/mssilver/'
orig_dir = ''  # overridden: files are read from the working directory
json_pattern = re.compile(r"\{[^{}]*'title'[^{}]*\}", re.DOTALL)
N = 20  # number of songs to sample from each year
k = 20  # number of timbre frames to select from each song
for l1 in ascii_uppercase:
    for l2 in ascii_uppercase:
        edm_textfile = orig_dir + 'msd_data/raw_' + l1 + l2 + '.txt'
        json_contents = open(edm_textfile, 'r').read()
        for json_object_str in re.findall(json_pattern, json_contents):
            json_object = ast.literal_eval(json_object_str)
            year = int(json_object['year'])
            # sample each song with probability N / (songs in its year)
            prob = 1.0 if 1.0*N/year_counts[year] > 1.0 else 1.0*N/year_counts[year]
            if random.random() < prob:
                print 'getting timbre frames for song in directory {0}, {1} seconds after start of program'.format(edm_textfile, time.time()-time_start)
                duration = float(json_object['duration'])
                timbre = [[t/duration for t in l] for l in json_object['timbre']]
                try:
                    indices = random.sample(xrange(0, len(timbre)), k)
                except ValueError:
                    # fewer than k frames in the song: take them all
                    indices = xrange(0, len(timbre))
                timbre_frames = [timbre[i] for i in indices]
                appended_timbre = [timbre_all.append(l) for l in timbre_frames]
        print 'finished file {0}, {1} seconds after start of program'.format(edm_textfile, time.time()-time_start)

with open('timbre_frames_all.txt', 'w') as f:
    f.write(str(timbre_all))

A.4 Helper Methods for Calculations

import os
import re
import json
import glob
import time
import numpy as np
import hdf5_getters

''' some static data used in conjunction with the helper methods '''

# each 12-element vector corresponds to the 12 pitches, starting with C
# natural and going up to B natural

def _bits(s):
    # expand a 12-character 0/1 template string into a list of ints
    return [int(ch) for ch in s]

CHORD_TEMPLATE_MAJOR = [_bits(s) for s in [
    '100010010000', '010001001000', '001000100100', '000100010010',
    '000010001001', '100001000100', '010000100010', '001000010001',
    '100100001000', '010010000100', '001001000010', '000100100001']]
CHORD_TEMPLATE_MINOR = [_bits(s) for s in [
    '100100010000', '010010001000', '001001000100', '000100100010',
    '000010010001', '100001001000', '010000100100', '001000010010',
    '000100001001', '100010000100', '010001000010', '001000100001']]
CHORD_TEMPLATE_DOM7 = [_bits(s) for s in [
    '100010010010', '010001001001', '101000100100', '010100010010',
    '001010001001', '100101000100', '010010100010', '001001010001',
    '100100101000', '010010010100', '001001001010', '000100100101']]
CHORD_TEMPLATE_MIN7 = [_bits(s) for s in [
    '100100010010', '010010001001', '101001000100', '010100100010',
    '001010010001', '100101001000', '010010100100', '001001010010',
    '000100101001', '100010010100', '010001001010', '001000100101']]

CHORD_TEMPLATE_MAJOR_means = [np.mean(chord) for chord in CHORD_TEMPLATE_MAJOR]
CHORD_TEMPLATE_MINOR_means = [np.mean(chord) for chord in CHORD_TEMPLATE_MINOR]
CHORD_TEMPLATE_DOM7_means = [np.mean(chord) for chord in CHORD_TEMPLATE_DOM7]
CHORD_TEMPLATE_MIN7_means = [np.mean(chord) for chord in CHORD_TEMPLATE_MIN7]

CHORD_TEMPLATE_MAJOR_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_MAJOR]
CHORD_TEMPLATE_MINOR_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_MINOR]
CHORD_TEMPLATE_DOM7_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_DOM7]
CHORD_TEMPLATE_MIN7_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_MIN7]

# mean timbre vectors of the 46 clusters found by the GMM (Section 2.3.3)
TIMBRE_CLUSTERS = [
    [1.38679881e-01, 3.95702571e-02, 2.65410235e-02, 7.38301998e-03, -1.75014636e-02, -5.51147732e-02,
     8.71851698e-03, -1.17595855e-02, 1.07227900e-02, 8.75951680e-03, 5.40391877e-03, 6.17638908e-03],
    [3.14344510e+00, 1.17405599e-01, 4.08053561e+00, -1.77934450e+00, 2.93367968e+00, -1.35597928e+00,
     -1.55129489e+00, 7.75743158e-01, 6.42796685e-01, 1.40794256e-01, 3.37716831e-01, -3.27103815e-01],
    [3.56548165e-01, 2.73288705e+00, 1.94355982e+00, 1.06892477e+00, 9.89739475e-01, -8.97330631e-02,
     8.73234495e-01, -2.00747009e-03, 3.44488367e-01, 9.93117800e-02, -2.43471766e-01, -1.90521726e-01],
    [4.22442037e-01, 4.14115783e-01, 1.43926557e-01, -1.16143322e-01, -5.95186216e-02, -2.36927188e-01,
     -6.83151409e-02, 9.86816882e-02, 2.43219098e-02, 6.93558977e-02, 6.80121418e-03, 3.97485360e-02],
    [1.94727799e-01, -1.39027782e+00, -2.39875671e-01, -2.84583677e-01, 1.92334219e-01, -2.83421048e-01,
     2.15787541e-01, 1.14840341e-01, -2.15631833e-01, -4.09496877e-02, -6.90838017e-03, -7.24394810e-03],
    [1.96565167e-01, 4.98702717e-02, -3.43697282e-01, 2.54170701e-01, 1.12441266e-02, 1.54740401e-01,
     -4.70447408e-02, 8.10868802e-02, 3.03736697e-03, 1.43974944e-03, -2.75044913e-02, 1.48634678e-02],
    [2.21364497e-01, -2.96205105e-01, 1.57754028e-01, -5.57641279e-02, -9.25625566e-02, -6.15316168e-02,
     -1.38139882e-01, -5.54936599e-02, 1.66886836e-01, 6.46238260e-02, 1.24093863e-02, -2.09274345e-02],
    [2.12823455e-01, -9.32652720e-02, -4.39611467e-01, -2.02814479e-01, 4.98638770e-02, -1.26572488e-01,
     -1.11181799e-01, 3.25075635e-02, 2.01416694e-02, -5.69216463e-02, 2.61922912e-02, 8.30817468e-02],
    [1.62304042e-01, -7.34813956e-03, -2.02552550e-01, 1.80106705e-01, -5.72110826e-02, -9.17148244e-02,
     -6.20429191e-03, -6.08892354e-02, 1.02883628e-02, 3.84878478e-02, -8.72920419e-03, 2.37291230e-02],
    [1.69023095e-01, 6.81311168e-02, -3.71039856e-02, -2.13139780e-02, -4.18752028e-03, 1.36407740e-01,
     2.58515825e-02, -4.10328777e-04, 2.93149920e-02, -1.97874734e-02, 2.01177066e-02, 4.29260690e-03],
    [4.16829358e-01, -1.28384095e+00, 8.86081556e-01, 9.13717416e-02, -3.19420208e-01, -1.82003637e-01,
     -3.19865507e-02, -1.71517045e-02, 3.47472066e-02, -3.53047665e-02, 5.58354602e-02, -5.06222122e-02],
    [3.83948137e-01, 1.06020034e-01, 4.01191058e-01, 1.49470482e-01, -9.58422411e-02, -4.94473336e-02,
     2.27589858e-02, -5.67352733e-02, 3.84666644e-02, -2.15828055e-02, -1.67817151e-02, 1.15426241e-01],
    [9.07946444e-01, 3.26120397e+00, 2.98472002e+00, -1.42615404e-01, 1.29886103e+00, -4.53380431e-01,
     1.54008478e-01, -3.55297093e-02, -2.95809181e-01, 1.57037690e-01, -7.29692046e-02, 1.15180285e-01],
    [1.60870896e+00, -2.32038235e+00, -7.96211044e-01, 1.55058968e+00, -2.19377663e+00, 5.01030526e-01,
     -1.71767279e+00, -1.36642470e+00, -2.42837527e-01, -4.14275615e-01, -7.33148530e-01, -4.56676578e-01],
    [6.42870687e-01, 1.34486839e+00, 2.16026845e-01, -2.13180345e-01, 3.10866747e-01, -3.97754955e-01,
     -3.54439151e-01, -5.95938041e-04, 4.95054274e-03, 4.67013422e-02, -1.80823854e-02, 1.25808320e-01],
    [1.16780496e+00, 2.28141229e+00, -3.29418720e+00, -1.54239912e+00, 2.12372153e-01, 2.51116768e+00,
     1.84273560e+00, -4.06183916e-01, 1.19175125e+00, -9.24407446e-01, 6.85444429e-01, -6.38729005e-01],
    [2.39097414e-01, -1.13382447e-02, 3.06327342e-01, 4.68182987e-03, -1.03107607e-01, -3.17661969e-02,
     3.46533705e-02, 1.46440386e-02, 6.88291154e-02, 1.72580481e-02, -6.23970238e-03, -6.52822380e-03],
    [1.74850329e-01, -1.86077411e-01, 2.69285838e-01, 5.22452803e-02, -3.71708289e-02, -6.42874319e-02,
     -5.01920042e-03, -1.14565540e-02, -2.61300268e-03, -6.94872458e-03, 1.20157063e-02, 2.01341977e-02],
    [1.93220674e-01, 1.62738332e-01, 1.72794061e-02, 7.89933755e-02, 1.58494767e-01, 9.04541006e-04,
     -3.33177052e-02, -1.42411500e-01, -1.90471155e-02, -2.41622739e-02, -2.57382438e-02, 2.84895062e-02],
    [3.31179197e+00, -1.56765268e-01, 4.42446188e+00, 2.05496297e+00, 5.07031622e+00, -3.52663849e-02,
     -5.68337901e+00, -1.17825301e+00, 5.41756637e-01, -3.15541339e-02, -1.58404846e+00, 7.37887234e-01],
    [2.36033237e-01, -5.01380019e-01, -7.01568834e-02, -2.14474169e-01, 5.58739133e-01, -3.45340886e-01,
     2.36469930e-01, -2.51770230e-02, -4.41670143e-01, -1.73364633e-01, 9.92353986e-03, 1.01775476e-01],
    [3.13672832e+00, 1.55128891e+00, 4.60139512e+00, 9.82477544e-01, -3.87108002e-01, -1.34239667e+00,
     -3.00065797e+00, -4.41556909e-01, -7.77546208e-01, -6.59017029e-01, -1.42596356e-01, -9.78935498e-01],
    [8.50714148e-01, 2.28658856e-01, -3.65260753e+00, 2.70626948e+00, -1.90441544e-01, 5.66625676e+00,
     1.77531510e+00, 2.39978921e+00, 1.10965660e+00, 1.58484130e+00, -1.51579214e-02, 8.64324026e-01],
    [1.14302559e+00, 1.18602811e+00, -3.88130412e+00, 8.69833825e-01, -8.23003310e-01, -4.23867795e-01,
     8.56022598e-01, -1.08015106e+00, 1.74840192e-01, -1.35493558e-02, -1.17012561e+00, 1.68572940e-01],
    [3.54117814e+00, 6.12714769e-01, 7.67585243e+00, 2.50391333e+00, 1.81374399e+00, -1.46363231e+00,
     -1.74027236e+00, -5.72924078e-01, -1.20787368e+00, -4.13954661e-01, -4.62561948e-01, 6.78297871e-01],
    [8.31843044e-01, 4.41635485e-01, 7.00724425e-02, -4.72159900e-02, 3.08326493e-01, -4.47009822e-01,
     3.27806057e-01, 6.52370380e-01, 3.28490360e-01, 1.28628172e-01, -7.78065861e-02, 6.91343399e-02],
    [4.90082031e-01, -9.53180204e-01, 1.76970476e-01, 1.57256960e-01, -5.26196238e-02, -3.19264458e-01,
     3.91808304e-01, 2.19368239e-01, -2.06483291e-01, -6.25044005e-02, -1.05547224e-01, 3.18934196e-01],
    [1.49899454e+00, -4.30708817e-01, 2.43770498e+00, 7.03149621e-01, -2.28827845e+00, 2.70195855e+00,
     -4.71484280e+00, -1.18700075e+00, -1.77431396e+00, -2.23190236e+00, 8.20855264e-01, -2.35859902e-01],
    [1.20322544e-01, -3.66300816e-01, -1.25699953e-01, -1.21914056e-01, 6.93277338e-02, -1.31034684e-01,
     -1.54955924e-03, 2.48094288e-02, -3.09576314e-02, -1.66369415e-03, 1.48904987e-04, -1.42151992e-02],
    [6.52394765e-01, -6.81024464e-01, 6.36868117e-01, 3.04950208e-01, 2.62178992e-01, -3.20457080e-01,
     -1.98576098e-01, -3.02173163e-01, 2.04399765e-01, 4.44513847e-02, -9.50111498e-03, -1.14198739e-02],
    [2.06762180e-01, -2.08101829e-01, 2.61977630e-01, -1.71672300e-01, 5.61794250e-02, 2.13660185e-01,
     3.90259585e-02, 4.78176392e-02, 1.72812607e-02, 3.44052067e-02, 6.26899067e-03, 2.48544728e-02],
    [7.39717363e-01, 4.37786285e+00, 2.54995502e+00, 1.13151212e+00, -3.58509503e-01, 2.20806129e-01,
     -2.20500355e-01, -7.22409824e-02, -2.70534083e-01, 1.07942098e-03, 2.70174668e-01, 1.87279353e-01],
    [1.25593809e+00, 6.71054880e-02, 8.70352571e-01, -4.32607959e+00, 2.30652217e+00, 5.47476105e+00,
     -6.11052479e-01, 1.07955720e+00, -2.16225471e+00, -7.95770149e-01, -7.31804973e-01, 9.68935954e-01],
    [1.17233757e-01, -1.23897829e-01, -4.88625265e-01, 1.42036530e-01, -7.23286756e-03, -6.99808763e-02,
     -1.17525019e-02, 5.70221674e-02, -7.67796123e-03, 4.17505873e-02, -2.33375716e-02, 1.94121001e-02],
    [1.67511025e+00, -2.75436700e+00, 1.45345593e+00, 1.32408871e+00, -1.66172505e+00, 1.00560074e+00,
     -8.82308160e-01, -5.95708043e-01, -7.27283590e-01, -1.03975499e+00, -1.86653334e-02, 1.39449745e+00],
    [3.20587677e+00, -2.84451104e+00, 8.54849957e+00, -4.44001235e-01, 1.04202144e+00, 7.35333682e-01,
     -2.48763292e+00, 7.38931361e-01, -1.74185596e+00, -1.07581842e+00, 2.05759299e-01, -8.20483513e-01],
    [3.31279737e+00, -5.08655734e-01, 6.61530870e+00, 1.16518280e+00, 4.74499155e+00, -2.31536191e+00,
     -1.34016130e+00, -7.15381712e-01, 2.78890594e+00, 2.04189275e+00, -3.80003033e-01, 1.16034914e+00],
    [1.79522019e+00, -8.13534697e-02, 4.37167420e-01, 2.26517020e+00, 8.85377295e-01, 1.07481514e+00,
     -7.25322296e-01, -2.19309506e+00, -7.59468916e-01, -1.37191387e+00, 2.60097913e-01, 9.34596450e-01],
    [3.50400906e-01, 8.17891485e-01, -8.63487084e-01, -7.31760701e-01, 9.70320805e-02, -3.60023996e-01,
     -2.91753495e-01, -8.03073817e-02, 6.65930095e-02, 1.60093340e-01, -1.29158086e-01, -5.18806100e-02],
    [2.25922929e-01, 2.78461593e-01, 5.39661393e-02, -2.37662670e-02, -2.70343295e-02, -1.23485570e-01,
     2.31027499e-03, 5.87465112e-05, 1.86127188e-02, 2.83074747e-02, -1.87198676e-04, 1.24761782e-02],
    [4.53615634e-01, 3.18976020e+00, -8.35029351e-01, 7.84124578e+00, -4.43906795e-01, -1.78945492e+00,
     -1.14521031e+00, 1.00044304e+00, -4.04084981e-01, -4.86030348e-01, 1.05412721e-01, 5.63666445e-02],
    [3.93714086e-01, -3.07226477e-01, -4.87366619e-01, -4.57481697e-01, -2.91133171e-04, -2.39881719e-01,
     -2.15591352e-01, -1.21332941e-01, 1.42245002e-01, 5.02984582e-02, -8.05878851e-03, 1.95534173e-01],
    [1.86913010e-01, -1.61000977e-01, 5.95612425e-01, 1.87804293e-01, 2.22064227e-01, -1.09008289e-01,
     7.83845058e-02, 5.15228647e-02, -8.18113578e-02, -2.37860551e-02, 3.41013800e-03, 3.64680417e-02],
    [3.32919314e+00, -2.14341251e+00, 7.20913997e+00, 1.76143734e+00, 1.64091808e+00, -2.66887649e+00,
     -9.26748006e-01, -2.78599285e-01, -7.39434005e-01, -3.87363085e-01, 8.00557250e-01, 1.15628886e+00],
    [4.76496444e-01, -1.19334793e-01, 3.09037235e-01, -3.45545294e-01, 1.30114716e-01, 5.06895559e-01,
     2.12176840e-01, -4.14296750e-03, 4.52439064e-02, -1.62163990e-02, 6.93683152e-02, -5.77607592e-03],
    [3.00019324e-01, 5.43432074e-02, -7.72732930e-01, 1.47263806e+00, -2.79012581e-02, -2.47864869e-01,
     -2.10011388e-01, 2.78202425e-01, 6.16957205e-02, -1.66924986e-01, -1.80102286e-01, -3.78872162e-03]]

TIMBRE_MEANS = [np.mean(t) for t in TIMBRE_CLUSTERS]
TIMBRE_STDEVS = [np.std(t) for t in TIMBRE_CLUSTERS]

'''helper methods to process raw msd data'''

def normalize_pitches(h5):
    key = int(hdf5_getters.get_key(h5))
    segments_pitches = hdf5_getters.get_segments_pitches(h5)
    segments_pitches_new = [transpose_by_key(pitch_seg, key) for pitch_seg in segments_pitches]
    return segments_pitches_new

def transpose_by_key(pitch_seg, key):
    pitch_seg_new = []
    for i in range(0, 12):
        idx = (i + key) % 12
        pitch_seg_new.append(pitch_seg[idx])
    return pitch_seg_new

def _best_match(vector, templates, means, stdevs):
    # correlate the observed 12-element vector against each template and
    # return the index of the template with the largest |rho|; the 0.01
    # terms guard against zero standard deviations
    best_idx, rho_max = 0, 0.0
    v_mean, v_std = np.mean(vector), np.std(vector)
    for idx, (tmpl, mean, stdev) in enumerate(zip(templates, means, stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (tmpl[i] - mean)*(vector[i] - v_mean)/((stdev+0.01)*(v_std+0.01))
        if abs(rho) > abs(rho_max):
            rho_max = rho
            best_idx = idx
    return best_idx, rho_max

''' given a time segment with distributions of the 12 pitches, find the most
likely chord played '''
def find_most_likely_chord(pitch_vector):
    # chords are indexed as (chord_type, root): type 1 = major, 2 = minor,
    # 3 = dominant 7th, 4 = minor 7th; root 0 through 11
    rho_max = 0.0
    most_likely_chord = (1, 1)
    chord_families = [
        (1, CHORD_TEMPLATE_MAJOR, CHORD_TEMPLATE_MAJOR_means, CHORD_TEMPLATE_MAJOR_stdevs),
        (2, CHORD_TEMPLATE_MINOR, CHORD_TEMPLATE_MINOR_means, CHORD_TEMPLATE_MINOR_stdevs),
        (3, CHORD_TEMPLATE_DOM7, CHORD_TEMPLATE_DOM7_means, CHORD_TEMPLATE_DOM7_stdevs),
        (4, CHORD_TEMPLATE_MIN7, CHORD_TEMPLATE_MIN7_means, CHORD_TEMPLATE_MIN7_stdevs)]
    for chord_type, templates, means, stdevs in chord_families:
        idx, rho = _best_match(pitch_vector, templates, means, stdevs)
        if abs(rho) > abs(rho_max):
            rho_max = rho
            most_likely_chord = (chord_type, idx)
    return most_likely_chord

def find_most_likely_timbre_category(timbre_vector):
    # same correlation-based matching, against the 46 timbre cluster means
    idx, _ = _best_match(timbre_vector, TIMBRE_CLUSTERS, TIMBRE_MEANS, TIMBRE_STDEVS)
    return idx


Bibliography

[1] Marck Bailey. Mark J. Butler publishes scholarly work on dance music. http://www.music.northwestern.edu/about/news/2012/mark-j-butler-publishes-scholarly-work-on-dance-music.html, mar 2012.

[2] Kenneth Taylor. Ishkur's guide to EDM. http://techno.org/electronic-music-guide.

[3] Deal further strengthens Spotify's music discovery expertise. http://the.echonest.com/pressreleases/spotify-acquires-echo-nest, mar 2014.

[4] Josh Constine. Inside the Spotify - Echo Nest skunkworks. http://techcrunch.com/2014/10/19/the-sonic-mad-scientists, oct 2014.

[5] The future of music genres is here. http://blog.echonest.com/post/73516217273/the-future-of-music-genres-is-here, jan 2014.

[6] About the Music Genome Project. http://www.pandora.com/about/mgp.

[7] Joan Serrà, Álvaro Corral, Marián Boguñá, Martín Haro, and Josep Ll. Arcos. Measuring the evolution of contemporary western popular music. Sci. Rep., 2, jul 2012.

[8] Matthias Mauch, Robert M. MacCallum, Mark Levy, and Armand M. Leroi. The evolution of popular music: USA 1960–2010. Royal Society Open Science, 2(5), 2015.

[9] Thierry Bertin-Mahieux, Daniel P.W. Ellis, Brian Whitman, and Paul Lamere. The Million Song Dataset. In Proceedings of the 12th International Conference on Music Information Retrieval (ISMIR 2011), 2011.

[10] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay. Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12:2825–2830, 2011.

[11] Lars Buitinck, Gilles Louppe, Mathieu Blondel, Fabian Pedregosa, Andreas Mueller, Olivier Grisel, Vlad Niculae, Peter Prettenhofer, Alexandre Gramfort, Jaques Grobler, Robert Layton, Jake VanderPlas, Arnaud Joly, Brian Holt, and Gaël Varoquaux. API design for machine learning software: experiences from the scikit-learn project. In ECML PKDD Workshop: Languages for Data Mining and Machine Learning, pages 108–122, 2013.

[12] Edwin Chen. Infinite mixture models with nonparametric Bayes and the Dirichlet process. http://blog.echen.me/2012/03/20/infinite-mixture-models-with-nonparametric-bayes-and-the-dirichlet-process, mar 2012.

[13] Graham Massey. Roland TR-808: The drum machine that changed music. http://www.bbc.com/news/entertainment-arts-26682781, mar 2014.

[14] Jean-Michel Jarre and Agencja Artystyczna MTJ. Jean Michel Jarre. Disques Dreyfus, 1999.

[15] Francois Pachet, Jean-Julien Aucouturier, and Mark Sandler. The way it sounds: Timbre models for analysis and retrieval of music signals. IEEE Transactions on Multimedia, 7(6):1028–35, dec 2005.



run into additional issues. Obtaining a high-quality corpus of song data would be challenging: writing scripts that crawl music sharing platforms may not capture all of the music I am looking for. And once I have the audio files, I would have to perform audio processing techniques to extract the relevant information from the songs.

Fortunately, there is an easy solution to the music data acquisition problem. The Million Song Dataset (MSD) is a collection of metadata for one million music tracks dating up to 2011. Various organizations, such as The Echo Nest, Musicbrainz, 7digital, and Last.fm, have contributed different pieces of metadata. Each song is represented as a Hierarchical Data Format file (HDF5), which can be loaded as a JSON object. The fields encompass topical features, such as the song title, artist, and release date, as well as lower-level features, such as the loudness, starting beat time, pitches, and timbre of several segments of the song [9]. While the MSD is the largest free and open source music metadata dataset I could find, there is no guarantee that it adequately covers the entire spectrum of EM artists and songs. This quality limitation is important to consider throughout the study. A quick look through the songs, including the subset of data I worked with for this report, showed that there were several well-known artists and songs in the EM scene. Therefore, while the MSD may not contain all desired songs for this project, it contains an adequate number of relevant songs to produce some meaningful results. Additionally, laying the groundwork for modeling the similarities between songs and identifying groundbreaking ones is the same regardless of the songs included, and the following methodologies can be implemented on any similarly-formatted dataset, including one with songs that might currently be missing in the MSD.

Chapter 2

Mathematical Modeling

2.1 Determining Novelty of Songs

Finding a logical and implementable mathematical model was, and continues to be, an important aspect of my research. My problem, how to mathematically determine which songs were unique for their time, requires an algorithm in which each song is introduced in chronological order, either joining an existing category or starting a new category based on its musical similarity to songs already introduced. Clustering algorithms like k-means or Gaussian Mixture Models (GMMs), which optimize the partitioning of a dataset into a predetermined number of clusters, assume a fixed number of clusters. While this process would work if we knew exactly how many genres of EM existed, if we guess wrong, our end results may contain clusters that are wrongly grouped together or separated. It is much better to apply a clustering algorithm that does not make any assumptions about this number.

One particularly promising process that addresses the issue of the number of clusters is a family of algorithms known as Dirichlet Processes (DPs). DPs are useful for this particular application because (1) they assign clusters to a dataset with only an upper bound on the number of clusters, and (2) by sorting the songs in chronological order before running the algorithm and keeping track of which songs are categorized under each cluster, we can observe the earliest songs in each cluster and consequently infer which songs were responsible for creating new clusters. The DP is controlled by a concentration parameter α. The expected number of clusters formed grows with the value of α, so the higher the value of α, the more likely new clusters will be formed [10]. Regardless of the value of α, as the number of data points introduced increases, the probability of a new group being formed decreases. That is, a "rich get richer" policy is in place, and existing clusters tend to grow in size. Tweaking the value of the tunable parameter α is an important part of the study, since it determines the flexibility given to forming a new cluster. If the value of α is too small, then the criteria for forming clusters will be too strict, and data that should be in different clusters will be assigned to the same cluster. On the other hand, if α is too large, the algorithm will be too sensitive and assign similar songs to different clusters.

The implementation of the DP was achieved using scikit-learn's library and API for the Dirichlet Process Gaussian Mixture Model (DPGMM). The DPGMM is the formal name of the Dirichlet Process model used to cluster the data. More specifically, scikit-learn's implementation of the DPGMM uses the stick-breaking method, one of several equally valid methods to assign songs to clusters [11]. While the mathematical details for this algorithm can be found at the following citation [12], the most important aspects of the DPGMM are the arguments that the user can specify and tune. The first of these tunable parameters is the value α, which is the same parameter as the α discussed in the previous paragraph. As seen on the right side of Figure 2.1, properly tuning α is key to obtaining meaningful clusters. The center image has α set to 0.01, which is too small and results in all of the data being formed under one cluster. On the other hand, the bottom-right image has the same data set and α set to 100, which does a better job of clustering. On a related note, the figure also demonstrates the effectiveness of the DPGMM over the GMM. On the left side, the dataset clearly contains 2 clusters, but the GMM in the top-left image assumes 5 clusters as a prior and consequently clusters the data incorrectly, while the DPGMM manages to limit the data to 2 clusters.

The second argument that the user inputs for the DPGMM is the data that will be clustered. The scikit-learn implementation takes the data in the format of a nested list (N lists, each of length m), where N is the number of data points and m the number of features. While the format of the data structure is relatively straightforward, choosing which numbers should be in the data was a challenge I faced. Selecting the relevant features of each song to be used in the algorithm will be expounded upon in the next section, "Feature Selection."

The last argument that a user inputs for the scikit-learn DPGMM implementation is an argument indicating the upper bound for the number of clusters. The Dirichlet Process then determines the best number of clusters for the data, between 1 and the upper bound. Since the DPGMM is flexible enough to find the best value, I set an arbitrary upper bound of 50 clusters and focused more on the tuning of α to modify the number of clusters formed.
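To make this setup concrete, the following is a minimal sketch of fitting such a model. It uses BayesianGaussianMixture, the current scikit-learn equivalent of the DPGMM class used in this thesis (the DPGMM class has since been removed from the library), and song_features is an illustrative stand-in for the real feature matrix, not data from this study:

    import numpy as np
    from sklearn.mixture import BayesianGaussianMixture

    # illustrative stand-in: N = 500 songs, m = 238 features per song
    song_features = np.random.rand(500, 238)

    # weight_concentration_prior plays the role of alpha; n_components is
    # the upper bound on the number of clusters (50 in this thesis)
    dpgmm = BayesianGaussianMixture(
        n_components=50,
        weight_concentration_prior_type='dirichlet_process',
        weight_concentration_prior=0.1,
        max_iter=500)
    labels = dpgmm.fit_predict(song_features)
    print(sorted(set(labels)))  # the clusters that actually received songs

Because of the stick-breaking construction, most of the 50 components typically end up with negligible weight, so the number of clusters that actually receive songs is what the α value effectively controls.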

2.2 Feature Selection

One of the most difficult aspects of the Dirichlet method is choosing the features to be used for clustering. In other words, when we organize the songs into clusters,

Figure 2.1: scikit-learn example of GMM vs. DPGMM and tuning of α

we need to ensure that each cluster is distinct in a way that is statistically and intuitively logical. In the Million Song Dataset [9], each song is represented as a JSON object containing several fields. These fields are candidate features to be used in the Dirichlet algorithm. Below is an example song, "Never Gonna Give You Up" by Rick Astley, and the corresponding features:

artist_mbid: db92a151-1ac2-438b-bc43-b82e149ddd50 (the musicbrainz.org ID for this artist is db9...)
artist_mbtags: shape = (4,) (this artist received 4 tags on musicbrainz.org)
artist_mbtags_count: shape = (4,) (raw tag count of the 4 tags this artist received on musicbrainz.org)
artist_name: Rick Astley (artist name)
artist_playmeid: 1338 (the ID of that artist on the service playme.com)
artist_terms: shape = (12,) (this artist has 12 terms (tags) from The Echo Nest)
artist_terms_freq: shape = (12,) (frequency of the 12 terms from The Echo Nest (number between 0 and 1))
artist_terms_weight: shape = (12,) (weight of the 12 terms from The Echo Nest (number between 0 and 1))
audio_md5: bf53f8113508a466cd2d3fda18b06368 (hash code of the audio used for the analysis by The Echo Nest)
bars_confidence: shape = (99,) (confidence value (between 0 and 1) associated with each bar by The Echo Nest)
bars_start: shape = (99,) (start time of each bar according to The Echo Nest; this song has 99 bars)
beats_confidence: shape = (397,) (confidence value (between 0 and 1) associated with each beat by The Echo Nest)
beats_start: shape = (397,) (start time of each beat according to The Echo Nest; this song has 397 beats)
danceability: 0.0 (danceability measure of this song according to The Echo Nest (between 0 and 1, 0 => not analyzed))
duration: 211.69587 (duration of the track in seconds)
end_of_fade_in: 0.139 (time of the end of the fade in, at the beginning of the song, according to The Echo Nest)
energy: 0.0 (energy measure (not in the signal processing sense) according to The Echo Nest (between 0 and 1, 0 => not analyzed))
key: 1 (estimation of the key the song is in by The Echo Nest)
key_confidence: 0.324 (confidence of the key estimation)
loudness: -7.75 (general loudness of the track)
mode: 1 (estimation of the mode the song is in by The Echo Nest)
mode_confidence: 0.434 (confidence of the mode estimation)
release: Big Tunes - Back 2 The 80s (album name from which the track was taken; some songs/tracks can come from many albums; we give only one)
release_7digitalid: 786795 (the ID of the release (album) on the service 7digital.com)
sections_confidence: shape = (10,) (confidence value (between 0 and 1) associated with each section by The Echo Nest)
sections_start: shape = (10,) (start time of each section according to The Echo Nest; this song has 10 sections)
segments_confidence: shape = (935,) (confidence value (between 0 and 1) associated with each segment by The Echo Nest)
segments_loudness_max: shape = (935,) (max loudness during each segment)
segments_loudness_max_time: shape = (935,) (time of the max loudness during each segment)
segments_loudness_start: shape = (935,) (loudness at the beginning of each segment)
segments_pitches: shape = (935, 12) (chroma features for each segment (normalized so max is 1))
segments_start: shape = (935,) (start time of each segment (musical event, or onset) according to The Echo Nest; this song has 935 segments)
segments_timbre: shape = (935, 12) (MFCC-like features for each segment)
similar_artists: shape = (100,) (a list of 100 artists (their Echo Nest IDs) similar to Rick Astley according to The Echo Nest)
song_hotttnesss: 0.864248830588 (according to The Echo Nest, when downloaded (in December 2010), this song had a 'hotttnesss' of 0.8 (on a scale of 0 and 1))
song_id: SOCWJDB12A58A776AF (The Echo Nest song ID; note that a song can be associated with many tracks (with very slight audio differences))
start_of_fade_out: 198.536 (start time of the fade out, in seconds, at the end of the song, according to The Echo Nest)
tatums_confidence: shape = (794,) (confidence value (between 0 and 1) associated with each tatum by The Echo Nest)
tatums_start: shape = (794,) (start time of each tatum according to The Echo Nest; this song has 794 tatums)
tempo: 113.359 (tempo in BPM according to The Echo Nest)
time_signature: 4 (time signature of the song according to The Echo Nest, i.e. usual number of beats per bar)
time_signature_confidence: 0.634 (confidence of the time signature estimation)
title: Never Gonna Give You Up (song title)
track_7digitalid: 8707738 (the ID of this song on the service 7digital.com)
track_id: TRAXLZU12903D05F94 (The Echo Nest ID of this particular track, on which the analysis was done)
year: 1987 (year when this song was released, according to musicbrainz.org)

When choosing features, my main goal was to use features that would most likely yield meaningful results, yet also be simple and make sense to the average person. The definition of "meaningful" results is arbitrary, as every music listener will have his or her own opinions as to what constitutes different types of music, but some common features most people tend to differentiate songs by are pitch, rhythm, and the types of instruments used. The following specific fields provided in each song object fall under these three terms:

Pitch

• segments_pitches: a matrix of values indicating the strength of each pitch (or note) at each discernible time interval

Rhythm

• beats_start: a vector of values indicating the start time of each beat

• time_signature: the time signature of the song

• tempo: the speed of the song in Beats Per Minute (BPM)

Instruments

• segments_timbre: a matrix of values indicating the distribution of MFCC-like features (different types of tones) for each segment

The segments_pitches feature is a clear candidate for a differentiating factor for songs, since it reveals patterns of notes that occur. Additionally, other research papers that quantitatively examine songs, like Mauch's, look at pitch and employ a procedure that allows all songs to be compared with the same metric. Likewise, timbre is intuitively a reliable differentiating feature, since it captures the presence of different tones, sounds that sound different despite having the same pitch. Therefore, segments_timbre is another feature that is considered in each song.

Finally, we look at the candidate features for rhythm. At first glance, all of these features appear to be useful, as they indicate the rhythm of a song in one way or another. However, none of these features are as useful as the pitch and timbre features. While tempo is one factor in differentiating genres of EDM, and music in general, tempo alone is not a driving force of musical innovation. Certain genres of EDM, like drum 'n' bass and happycore, stand out for having very fast tempos, but the tempo is supplemented with a sound unique to the genre. Conceiving new arrangements of pitches, combining instruments in new ways, and inventing new types of sounds are novel, but speeding up or slowing down existing sounds is not. Including tempo as a feature could actually add noise to the model, since many genres overlap in their tempos. And finally, tempo is measured indirectly when the pitch and timbre features are normalized for each song: everything is measured in units of "per second," so faster songs will have higher quantities of pitch and timbre features each second. Time signature can be dismissed from the candidate features for the same reason as tempo: many genres share the same time signature, and including it in the feature set would only add more noise. beats_start looks like a more promising feature since, like segments_pitches and segments_timbre, it consists of a vector of values. However, difficulties arise when we begin to think about how exactly we can utilize this information. Since each song varies in length, we need a way to compare songs of different durations on the same level. One approach could be to perform basic statistics on the distance between each beat, for example calculating the mean and standard deviation of this distance. However, the normalized pitch and timbre information already capture this data. Another possibility is detecting certain patterns of beats, which could differentiate the syncopated dubstep or glitch music beats from the steady pulse of electro-house. But once again, every beat is accompanied by a sound with a specific timbre and pitch, so this feature would not add any significantly new information.

2.3 Collecting Data and Preprocessing Selected Features

2.3.1 Collecting the Data

Upon deciding the features I wanted to use in my research, I first needed to collect all of the electronic songs in the Million Song Dataset. The easiest reliable way to achieve this was to iterate through each song in the database and save the information for the songs where any of the artist genre tags in artist_mbtags matched an electronic music genre. While this measure was not fully accurate, because it looks at the genre of the artist, not the song, specific genre information for each song was not as easily accessible, so this indicator was nearly as good a substitute. To generate a list of the genres that electronic songs would fall under, I manually searched through a subset of the MSD to find all genres that seemed to be related to electronic music. In the case of genres that were sometimes, but not always, electronic in nature, such as disco or pop, I erred on the side of caution and did not include them in the list of electronic genres. In these cases, false positives, such as primarily rock songs that happen to have the disco label attached to the artist, could inadvertently be included in the dataset. The final list of genres is as follows; a short sketch of the tag filter itself appears after the list.

target_genres = ['house','techno','drum and bass','drum n bass',
    "drum'n'bass",'drumnbass',"drum 'n' bass",'jungle','breakbeat',
    'trance','dubstep','trap','downtempo','industrial','synthpop',
    'idm','idm - intelligent dance music','8-bit','ambient',
    'dance and electronica','electronic']
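The full collection script appears in Appendix A.1. As a minimal sketch of the filter itself (the function name and arguments here are illustrative; the real script works directly on HDF5 files), a song is kept whenever any target genre string appears among its artist-level tags:

    def is_electronic(artist_tags, target_genres):
        # mirrors the check in Appendix A.1: substring matching against
        # the stringified tag list, so 'techno' also matches an artist
        # tagged 'detroit techno'
        joined = str([tag.lower() for tag in artist_tags])
        return any(genre in joined for genre in target_genres)

    # e.g. is_electronic(['Detroit Techno', 'American'], target_genres) -> True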

2.3.2 Pitch Preprocessing

A study conducted by music researcher Matthias Mauch [8] analyzes pitch in a musically informed manner. The study first takes the raw sound data and converts it into a distribution of each pitch, where 0 is no detection of the pitch and 1 the strongest amount. Then it computes the most likely chord by comparing the 4 most common types of chords in popular music (major, minor, dominant 7, and minor 7) to the observed chord. The most common chords are represented as "template chords" and contain 0's and 1's, where the 1's represent the notes played in the chord. For example, using the note C as the first index, the C major chord is represented as

CT_CM = (1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0)

For a given chroma frame c observed in the song, the Spearman's rho coefficient is computed over every template chord:

ρ_{CT,c} = Σ_{i=1}^{12} (CT_i − C̄T)(c_i − c̄) / (σ_CT σ_c)

where C̄T is the mean of the values in the template chord, σ_CT is the standard deviation of the values in the chord, and the operations on c are analogous. Note that the summation is over each individual pitch in the 12 pitch classes. The chord template with the highest value of ρ is selected as the chord for the time frame.

After this is performed for each time frame, the values are smoothed, and then the change between adjacent chords is observed. The reasoning behind this step is that, by measuring the relative distance between chords rather than the chords themselves, all songs can be compared in the same manner, even though they may have different key signatures. Finally, the study takes the types of chord changes and classifies them under 8 possible categories called "H-topics." These topics are more abstracted versions of the chord changes that make more sense to a human, such as "changes involving dominant 7th chords."
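To illustrate the template-matching step (the full implementation is find_most_likely_chord in Appendix A.4), the sketch below correlates a chroma frame against two templates and picks the better fit; the chroma values here are invented purely for the example:

    import numpy as np

    def rho(template, chroma):
        # correlation between a 0/1 chord template and an observed 12-bin
        # chroma frame; the 0.01 terms guard against zero standard
        # deviations, as in Appendix A.4
        t = np.asarray(template, dtype=float)
        c = np.asarray(chroma, dtype=float)
        return np.sum((t - t.mean()) * (c - c.mean())) / ((t.std() + 0.01) * (c.std() + 0.01))

    # invented chroma frame with energy concentrated on F, A, and C
    chroma = [0.9, 0.1, 0.1, 0.1, 0.2, 0.8, 0.1, 0.2, 0.1, 0.7, 0.1, 0.1]
    f_major = [1, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0]  # F, A, C
    c_major = [1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0]  # C, E, G
    print(rho(f_major, chroma) > rho(c_major, chroma))  # True: F major fits better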

In my preliminary implementation of this method on an electronic dance music corpus, I made a few modifications to Mauch's study. First, I smoothed out time frames before computing the most probable chords, rather than smoothing the most probable chords. I did this to save time and to reduce volatility in the chord measurements. Using Rick Astley's "Never Gonna Give You Up" as a reference, which contains 935 time frames and lasts 212 seconds, 5 time frames is slightly under 1 second and, for preliminary testing, appeared to be a good interval for each time block. Second, as mentioned in the literature section, I did not abstract the chord changes into H-topics. This decision also stemmed from time constraints, since deriving semantic chord meaning from EDM songs would require careful research into the types of harmonies and sounds common in that genre of music. Below, I include a high-level visualization of the pitch metadata found in a sample song, "Firestarter" by The Prodigy, and how I converted the metadata into a chord change vector that I could then feed into the Dirichlet Process algorithm.

[Figure: chord change extraction pipeline, illustrated on "Firestarter" by The Prodigy. (1) Start with the raw pitch data, an N×12 matrix, where N is the number of time frames in the song and 12 the number of pitch classes. (2) Average the distribution of pitches over every block of 5 time frames. (3) For each block, calculate the most likely chord using Spearman's rho (e.g., F major). (4) For each pair of adjacent chords, calculate the chord shift between them and increment its count in a table of the 192 possible chord changes, yielding a final 192-element vector where chord_changes[i] is the number of times the chord change with code i occurred in the song.]
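The encoding of a pair of adjacent chords into one of the 192 chord-change codes, the arithmetic used in Appendix A.2, can be sketched as follows, where a chord is a (type, root) pair with type 1-4 for major, minor, dominant 7th, and minor 7th, and root 0-11:

    def chord_change_code(c1, c2):
        # c1, c2: (chord_type, root_note) pairs
        # note_shift: semitone distance (mod 12) between the two roots;
        # equivalent to the branching logic in Appendix A.2
        note_shift = (c2[1] - c1[1]) % 12
        # key_shift: one of 16 type-to-type transitions (e.g. major -> minor)
        key_shift = 4 * (c1[0] - 1) + c2[0]
        # combined: one of 12 * 16 = 192 possible chord changes
        return 12 * (key_shift - 1) + note_shift

    # F major (type 1, root 5) to G major (type 1, root 7) gives code 2
    print(chord_change_code((1, 5), (1, 7)))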

A final step I took to normalize the chord change data was to divide the numbers by the length of the song, so that each song's number of chord changes was measured per second.
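As a sketch, this normalization amounts to the following (the helper name and the numbers are illustrative; the same division is applied to the timbre counts later):

def per_second(counts, duration):
    # normalize raw counts to per-second rates
    return [c / duration for c in counts]

print(per_second([14.0, 0.0, 3.0], 212.0))   # [0.066..., 0.0, 0.014...]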

2.3.3 Timbre Preprocessing

For timbre, I also used Mauch's model to find a meaningful way to compare timbre uniformly across all songs [8]. After collecting all song metadata, I took a random sample of 20 songs from each year, starting at 1970. The reason I forced the sampling to 20 randomly sampled songs from each year, and did not take a random sample of songs from all years at once, was to prevent bias towards any type of sounds. As seen in Figure 2.2, there are significantly more songs from 2000-2011 than before 2000. The mean year is x̄ = 2001.052, the median year is 2003, and the standard deviation of the years is σ = 7.060. A "random sample" over all songs would almost certainly include a disproportionate amount of more recent songs. In order to not miss out on sounds that may be more prevalent in older songs, I required a set number of songs from each year. Next, from each randomly selected song, I selected 20 random timbre frames in order to prevent any biases in data collection within each song. In total, there were 42 × 20 × 20 = 16,800 timbre frames collected. Next, I clustered the timbre frames using a Gaussian Mixture Model (GMM), varying the number of clusters from 10 to 100 and selecting the number of clusters with the lowest Bayes Information Criterion (BIC), a statistical measure commonly used to select the best-fitting model. The BIC was minimized at 46 timbre clusters. I then re-ran the GMM with 46 clusters and saved the mean values of each of the 12 timbre segments for each cluster formed. In the same way that every song had the same 192 chord changes whose frequencies could be compared between songs, each song now had the same 46 timbre clusters but different frequencies in each song. When reading in the metadata from each song, I calculated the most likely timbre cluster each timbre frame belonged to and kept


Figure 2.2: Number of Electronic Music Songs in Million Song Dataset from Each Year

a frequency count of all of the possible timbre clusters observed in a song. Finally, as with the pitch data, I divided all observed counts by the duration of the song in order to normalize each song's timbre counts.
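A minimal sketch of the model-selection step described above, written against the modern scikit-learn GaussianMixture API with synthetic stand-in data (the thesis ran an equivalent procedure, with an older scikit-learn version, on the 16,800 sampled frames):

import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
frames = rng.normal(size=(16800, 12))    # stand-in for the sampled timbre frames

best_k, best_bic, best_gmm = None, np.inf, None
for k in range(10, 101):                 # vary the number of clusters from 10 to 100
    gmm = GaussianMixture(n_components=k, random_state=0).fit(frames)
    bic = gmm.bic(frames)
    if bic < best_bic:                   # keep the model with the lowest BIC
        best_k, best_bic, best_gmm = k, bic, gmm

print(best_k)                  # 46 on the thesis data
centroids = best_gmm.means_    # mean 12-dimensional timbre vector per cluster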


Chapter 3

Results

3.1 Methodology

After the pitch and timbre data was processed, I ran the Dirichlet Process on the data. For each song, I concatenated the 192-element chord change frequency list and the 46-element timbre category frequency list, giving each song a total of 238 features. However, there is a problem with this setup: the pitch data will inherently dominate the clustering process, since it contains almost 3 times as many features as timbre. While there is no built-in function in scikit-learn's DPGMM process to give different weights to each feature, I considered another possibility to remedy this discrepancy: duplicating the timbre vector a certain number of times and concatenating that to the feature set of each song. While this strategy runs the risk of corrupting the feature set and turning it into something that does not accurately represent each song, it is important to keep in mind that even without duplicating the timbre vector, the feature set consists of two separate feature sets concatenated to each other. Therefore, timbre duplication appears to be a reasonable strategy to weigh pitch and timbre more evenly.
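A sketch of this feature assembly follows; the number of timbre copies is my assumption for illustration, since the text does not fix the duplication count, and the function name is illustrative.

import numpy as np

def build_features(chord_changes, timbre_counts, timbre_copies=4):
    # timbre_copies = 4 is an assumption: 4 copies of the 46 timbre features
    # (184 values) roughly balances the 192 chord change features
    return np.concatenate([chord_changes] + [timbre_counts] * timbre_copies)

song = build_features(np.zeros(192), np.zeros(46))
print(song.shape)   # (376,) with 4 copies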


After this modification, I tweaked a few more parameters before obtaining my final results. Dividing the pitch and timbre frequencies by the duration of the song normalized every song to frequency per second, but it also had the undesired effect of making the data too small. Timbre and pitch frequencies per second were almost always less than 1.0, and many times hovered as low as 0.002 for nonzero values. Because all of the values were very close to each other, using common values of α in the range of 0.1 to 1000-2000 was insufficient to push the songs into different clusters. As a result, every song fell into the same cluster. Increasing the value of α by several orders of magnitude, to well over 10 million, fixed the problem, but this solution presented two problems. First, tuning α to experiment with different ways to cluster the music would be problematic, since I would have to work with an enormous range of possible values for α. Second, pushing α to such high values is not appropriate for the Dirichlet Process. Extremely high values of α indicate a Dirichlet Process that will try to disperse the data into different clusters, and a value of α that high is in principle always assigning each new song to a new cluster. On the other hand, varying α between 0.1 and 1000, for example, presents a much wider range of flexibility when assigning clusters. While clustering may be possible by varying the values of α an extreme amount with the data as it currently is, we would be using the Dirichlet Process in a way it should mathematically not be used. Therefore, multiplying all of the data by a constant value, so that we can work in the appropriate range of α, is the ideal approach. After some experimentation, I found that k = 10 was an appropriate scaling factor. After initial runs of the Dirichlet Process, I found out that there was a slight issue with some of the earlier songs. Since I had only artist genre tags, not specific song tags for each song, I chose songs based on whether any of the tags associated with the artist fell under any electronic music genre, including the generic term 'electronic'. There were some bands, mostly older ones from the 1960s and 1970s like Electric Light Orchestra, which had some electronic music but


mostly featured rock, funk, disco, or another genre. Given that these artists featured mostly non-electronic songs, I decided to exclude them from my study and generate a blacklist of these music artists. While it was infeasible to look through every single song and determine whether it was electronic or not, I was able to look over the earliest songs in each cluster. These songs were the most important to verify as electronic, because early non-electronic songs could end up forming new clusters and inadvertently create clusters with non-electronic sounds that I was not looking for.
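A sketch of the blacklist filter; the entry and helper name shown are illustrative (the full list was hand-curated from the earliest songs in each cluster):

BLACKLIST = {'Electric Light Orchestra'}   # example entry only

def keep_song(song):
    """Drop songs whose artist is on the hand-curated blacklist."""
    return song['artist_name'] not in BLACKLIST

songs = [{'artist_name': 'Kraftwerk'}, {'artist_name': 'Electric Light Orchestra'}]
print([s['artist_name'] for s in songs if keep_song(s)])   # ['Kraftwerk']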

The goal of this thesis is to identify different groups in which EM songs are clustered and to identify the most unique artists and genres. While the second task is very simple, because it requires looking at the earliest songs in each cluster, the effectiveness of the first is difficult to gauge. While I can look at the average chord change and timbre category frequencies in each cluster, as well as other metadata, attaching more semantic interpretations to what the music actually sounds like, and determining whether the music is clustered properly, is a very subjective process. For this reason, I ran the Dirichlet Process on the feature set with values of α = (0.05, 0.1, 0.2) and compared the clustering for each value, examining similarities and differences in the clusters formed in each scenario in the Discussion section. For each value of α, I set the upper limit on the number of components, or clusters, allowed to 50. The values of α I used resulted in 9, 14, and 19 clusters formed, respectively.
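As a sketch of these clustering runs: the thesis used scikit-learn's since-removed DPGMM class; in current scikit-learn the equivalent is BayesianGaussianMixture with a Dirichlet process prior, where weight_concentration_prior plays the role of α. The data below is a synthetic stand-in for the scaled 238-feature songs.

import numpy as np
from sklearn.mixture import BayesianGaussianMixture

rng = np.random.default_rng(0)
X = 10 * rng.random((5000, 238))   # k = 10 scaling, as in Section 3.1

dp = BayesianGaussianMixture(
    n_components=50,                                     # upper limit on clusters
    weight_concentration_prior_type='dirichlet_process',
    weight_concentration_prior=0.1,                      # alpha
    max_iter=500,
    random_state=0,
).fit(X)

labels = dp.predict(X)
print(len(np.unique(labels)))   # number of clusters actually used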

3.2 Findings

3.2.1 α = 0.05

When I set α to 0.05, the Dirichlet Process split the songs into 9 clusters. Below are the distributions of years of the songs in each cluster (note that the Dirichlet Process does not number the clusters exactly sequentially, so cluster numbers 5, 7, and 10 are skipped).

Figure 3.1: Song year distributions for α = 0.05

For each value of α, I also calculated the average frequency of each chord change category and timbre category for each cluster and plotted the results. The green lines correspond to timbre and the blue lines to pitch.

Figure 3.2: Timbre and pitch distributions for α = 0.05

A table of each cluster formed, the number of songs in that cluster, and descriptions of the pitch, timbre, and rhythmic qualities characteristic of songs in that cluster is shown below.

Cluster   Song Count   Characteristic Sounds
0         6481         Minimalist, industrial, space sounds, dissonant chords
1         5482         Soft, New Age, ethereal
2         2405         Defined sounds; electronic and non-electronic instruments played in standard rock rhythms
3         360          Very dense and complex synths, slightly darker tone
4         4550         Heavily distorted rock and synthesizer
6         2854         Faster-paced 80s synth rock, acid house
8         798          Aggressive beats, dense house music
9         1464         Ambient house, trancelike, strong beats, mysterious tone
11        1597         Melancholy tones; New Wave rock in the 80s, then starting in the 90s downtempo, trip-hop, nu-metal

Table 3.1: Song cluster descriptions for α = 0.05

3.2.2 α = 0.1

A total of 14 clusters were formed (16 were formed, but 2 clusters contained only one song each; I listened to both of these songs, and since they did not sound unique, I discarded them from the clusters). Again, the song distributions, timbre and pitch distributions, and cluster descriptions are shown below.

Figure 3.3: Song year distributions for α = 0.1

Figure 3.4: Timbre and pitch distributions for α = 0.1


Cluster   Song Count   Characteristic Sounds
0         1339         Instrumental and disco with 80s synth
1         2109         Simultaneous quarter-note and sixteenth-note rhythms
2         4048         Upbeat, chill; simultaneous quarter-note and eighth-note rhythms
3         1353         Strong repetitive beats, ambient
4         2446         Strong simultaneous beat and synths; synths defined but echo
5         2672         Calm, New Age
6         542          Hi-hat cymbals, dissonant chord progressions
7         2725         Aggressive punk and alternative rock
9         1647         Latin; rhythmic emphasis on first and third beats
11        835          Standard medium-fast rock instruments/chords
16        1152         Orchestral, especially violins
18        40           "Martian alien" sounds, no vocals
20        1590         Alternating strong kick and strong high-pitched clap
28        528          Roland TR-like beats; kick and clap stand out but fuzzy

Table 3.2: Song cluster descriptions for α = 0.1

3.2.3 α = 0.2

With α set to 0.2, there were a total of 22 clusters formed. 3 of the clusters consisted of 1 song each, none of which were particularly unique-sounding, so I discarded them, for a total of 19 significant clusters. Again, the song distributions, timbre and pitch distributions, and cluster descriptions are shown below.

Figure 3.5: Song year distributions for α = 0.2

Figure 3.6: Timbre and pitch distributions for α = 0.2


Cluster   Song Count   Characteristic Sounds
0         4075         Nostalgic and sad-sounding synths and string instruments
1         2068         Intense, sad, cavernous (mix of industrial metal and ambient)
2         1546         Jazz/funk tones
3         1691         Orchestral with heavy 80s synths, atmospheric
4         343          Arpeggios
5         304          Electro, ambient
6         2405         Alien synths, eerie
7         1264         Punchy kicks and claps, 80s/90s tilt
8         1561         Medium tempo, 4/4 time signature, synths with intense guitar
9         1796         Disco rhythms and instruments
10        2158         Standard rock with few (if any) synths added on
12        791          Cavernous, minimalist, ambient (non-electronic instruments)
14        765          Downtempo, classic guitar riffs, fewer synths
16        865          Classic acid house sounds and beats
17        682          Heavy Roland TR sounds
22        14           Fast, ambient, classic orchestral
23        578          Acid house with funk tones
30        31           Very repetitive rhythms, one or two tones
34        88           Very dense sound (strong vocals and synths)

Table 3.3: Song cluster descriptions for α = 0.2


3.3 Analysis

Each of the different values of α revealed different insights about the Dirichlet Process applied to the Million Song Dataset, mainly which artists and songs were unique and how traditional groupings of EM genres could be thought of differently. Not surprisingly, the distributions of the years of songs in most of the clusters were skewed to the left, because the distribution of all of the EM songs in the Million Song Dataset is left-skewed (see Figure 2.2). However, some of the distributions vary significantly for individual clusters, and these differences provide important insights into which types of music styles were popular at certain points in time and how unique the earliest artists and songs in the clusters were. For example, for α = 0.1, cluster 28's musical style (with sounds characteristic of the Roland TR-808 and TR-909, two programmable drum machines and synthesizers that became extremely popular in 1980s and 90s dance tracks [13]) coincides with when the instruments were first manufactured, in 1980. Not surprisingly, this cluster contained mostly songs from the 80s and 90s and declined slightly in the 2000s. However, there were a few songs in that cluster that came out before 1980. While these songs did not clearly use the Roland TR machines, they may have contained similar sounds that predated the machines and were truly novel.

First, looking at α = 0.05, we see that all of the clusters contain a significant number of songs, although clusters 3 and 8 are notably smaller. Cluster 3 has a heavier left tail, indicating a larger number of songs from the 70s, 80s, and 90s. Inside the cluster, the genres of music varied significantly from a traditional music lens. That is, the cluster contained some songs with nearly all traditional rock instruments, others with purely synths, and others somewhere in between, all of which would normally be classified as different EM genres. However, under the Dirichlet Process, these songs were lumped together with the common theme of dense, melodic


melodies (as opposed to minimalistic, repetitive, or dissonant sounds). The most prominent artists among the earlier songs are Ashra and Jon Hassell, who composed several melodic songs combining traditional instruments with synthesizers for a modern feel. The other small cluster, number 8, has a more normal year distribution relative to the entire MSD distribution and also consists of denser beats. Another artist, Cabaret Voltaire, leads this cluster. Cluster 9 sticks out significantly because it contains virtually no songs before 1990 but then increases rapidly in popularity. This cluster contains songs with hypnotically repetitive rhythm, strong and ethereal synths, and an equally strong drum-like beat. Given the emergence of trance in the 1990s, and the fact that house music in the 1980s contained more minimalistic synths than house music in the 1990s, this distribution of years makes sense. Looking at the earliest artists in this cluster, one that accurately predates the later music in the cluster is Jean-Michel Jarre. A French composer and pioneer of ambient and electronic music [14], one of his songs, Les Chants Magnétiques IV, contains very sharp and modulated synths along with a repetitive hi-hat cymbal rhythm and more drawn-out and ethereal synths. While the song sounds ambient at its normal speed, playing the song at 1.5 times the normal speed resulted in a thumping, fast-paced 16th-note rhythm that, combined with the ethereal synths and their chord progressions, sounded very similar to trance music. In fact, I found that stylistically, trance music was comparable to house and ambient music increased in speed. Trance music was a term not used extensively until the early 1990s, but ambient and house music were already mainstream by the 1980s, so it would make sense that trance evolved in this manner. However, this insight could serve as an argument that trance is not an innovative genre in and of itself, but is rather a clever combination of two older genres. Lastly, we look at the timbre category and chord change distributions for each cluster. In theory, these clusters should have significantly different peaks of chord changes and timbre categories, reflecting different pitch arrangements and


instruments in each cluster. The type 0 chord change corresponds to major → major with no note change; type 60, minor → minor with no note change; type 120, dominant 7th major → dominant 7th major with no note change; and type 180, dominant 7th minor → dominant 7th minor with no note change. (Under the encoding sketched in Section 2.3.2, a quality-preserving change with no root movement gives code 12·(key_shift − 1), and key_shift = 1, 6, 11, 16 for the four qualities, hence codes 0, 60, 120, and 180.) It makes sense that type 0, 60, 120, and 180 chord changes are frequently observed, because it implies that adjacent chords in the song are remaining in the same key for the majority of the song. The timbre categories, on the other hand, are more difficult to intuitively interpret. Mauch's study addresses this issue by sampling songs and sounds that are the closest to each timbre category, then playing the sounds and attaching user-based interpretations based on several listeners [8]. While this strategy worked in Mauch's study, given the time and resources at my disposal, it was not practical in my study. I ended up comparing my subjective summaries of each cluster and comparing the charts to see whether certain peaks in the timbre categories corresponded to specific tones. Strangely, for α = 0.05, the timbre and chord change data is very similar for each cluster. This problem does not occur when α = 0.1 or 0.2, where the graphs vary significantly and correspond to some of the observed differences in the music. In summary, below are the most influential artists I found in the clusters formed and the types of music they created that were novel for their time:

• Jean-Michel Jarre: ambient and house music, complicated synthesizer arrangements

• Cabaret Voltaire: orchestral electronic music

• Paul Horn: New Age

• Brian Eno: ambient music

• Manuel Göttsching (Ashra): synth-heavy ambient music


• Killing Joke: industrial metal

• John Foxx: minimalist and dark electronic music

• Fad Gadget: house and industrial music

While these conclusions were formed mainly from the data used in the MSD and the resulting clusters, I checked outside sources and biographies of these artists to see whether they were groundbreaking contributors to electronic music. Some research revealed that these artists were indeed groundbreaking for their time, so my findings are consistent with existing literature. The difference, however, between existing accounts and mine is that, from a quantitatively computed perspective, I found some new connections (like observing that one of Jarre's works, when sped up, sounded very similar to trance music).

For larger values of α, it is worth not only looking at interesting phenomena in the clusters formed for that specific value, but also comparing the clusters formed to those for other values of α. Since we are increasing the value of α, more clusters will be formed, and the distinctions between each cluster will be more nuanced. With α = 0.1, the Dirichlet Process formed 16 clusters; 2 of these clusters consisted of only one song each, and upon listening, neither of these songs sounded particularly unique, so I threw those two clusters out and analyzed the remaining 14. Comparing these clusters to the ones formed with α = 0.05, I found that some of the clusters mapped over nicely while others were more difficult to interpret. For example, cluster 3_{0.1} (cluster 3 when α = 0.1) contained a similar number of songs and a similar distribution of release years to cluster 9_{0.05}. Both contain virtually no songs before the 1990s and then steadily rise in popularity through the 2000s. Both clusters also contain similar types of music: house beats and ethereal synths reminiscent of ambient or trance music. However, when I looked at the earliest


artists in cluster 3_{0.1}, they were different from the earliest artists in cluster 9_{0.05}. One particular artist, Bill Nelson, stood out for having a particularly novel song, "Birds of Tin", for the year it was released (1980). This song features a sharp and twangy synth beat that, when sped up, sounded like minimalist acid house music. While the α = 0.05 grouping differentiated mostly on general moods and classes of instruments (like rock vs. non-electronic vs. electronic), the α = 0.1 grouping picked up more nuanced instrumentation and mood differences. For example, cluster 16_{0.1} contained songs that featured orchestral string instruments, especially violin. The songs themselves varied significantly according to traditional genres, from Brian Eno arrangements with classical orchestra to a remix of a song by Linkin Park, a nu-metal band, which contained violin interludes. This clustering raises an interesting point: music that sounds very different based on traditional genres can be grouped together on certain instruments or sounds. Another cluster, 28_{0.1}, features 90s sounds characteristic of the Roland TR-909 drum machine (which would explain why the cluster's songs increase drastically starting in the 1990s and steadily decline through the 2000s). Yet another cluster, 6_{0.1}, has a particularly heavy left tail, indicating a style more popular in the 1980s, and its characteristic sound, hi-hat cymbals, is also a specialized instrument. This specialization does not match up particularly strongly with the clusters formed when α = 0.05. That is, a single cluster with α = 0.05 does not easily map to one or more clusters in the α = 0.1 run, although many of the clusters appear to share characteristics based on the qualitative descriptions in the tables. The timbre/chord change charts for each cluster appeared to at least somewhat corroborate the general characteristics I attached to each cluster. For example, the last timbre category is significantly pronounced for clusters 5 and 18, and especially so for 18. Cluster 18 was vocal-free, ethereal space-synth sounds, so it would make sense that cluster 5, which was mainly calm New Age, also contained vocal-free, ethereal, space-y sounds. It was also interesting to note


that certain clusters, like 28, contained one timbre category that completely dominated all the others. Many songs in this cluster were marked by strong and repetitive beats reminiscent of the Roland TR drum machines, which matches the graph. Likewise, clusters 3, 7, 9, and 20, which appear to contain the same peak timbre category, were noted for containing strong and repetitive beats. For this value of α, I added the following artists and their contributions to the general list of novel artists:

• Bill Nelson: minimalist house music

• Vangelis: orchestral compositions with electronic notes

• Rick Wakeman: rock compositions with spacy-sounding synths

• Kraftwerk: synth-based pop music

Finally, we look at α = 0.2. With this parameter value, the Dirichlet Process resulted in 22 clusters formed. 3 of these clusters contained only one song each; upon listening to each of these songs, I determined they were not particularly unique and discarded them, for a total of 19 remaining clusters. Unlike the previous two values of α, where the clusters were relatively easy to subjectively differentiate, this one was quite difficult. Slightly more than half of the clusters, 10 out of 19, contained under 1000 songs, and 3 contained under 100. Some of the clusters were easily mapped to clusters in the other two α values, like cluster 17_{0.2}, which contains Roland TR drum machine sounds and is comparable to cluster 28_{0.1}. However, many of the other classifications seemed more dubious. Not only did the songs within each cluster often vary significantly, but the differences between many clusters appeared nearly indistinguishable. The chord change and timbre charts also support the difficulty in distinguishing different clusters. The y-axes for all of the clusters' charts are quite small, implying that many of the timbre values averaged out because the songs were quite different in each cluster. Essentially, the observations are quite noisy and do not have features that stand out as saliently as those of cluster 28_{0.1}, for example. The only exceptions to these numbers were clusters 30 and 34, but there were so few songs in each of these clusters that they represent only a small amount of the dataset. Therefore, I concluded that the Dirichlet Process with α = 0.2 did an insufficient job of adequately clustering the songs. Overall, the clusters formed when α = 0.1 were the most meaningful in terms of picking up nuanced moods and instruments without splitting hairs and resulting in clusters a minimally trained ear could not differentiate. From this analysis, the most appropriate genre classifications of the electronic music from the MSD are the clusters described in the table for α = 0.1, and the most novel artists, along with their contributions, are summarized in the findings for α = 0.05 and α = 0.1.


Chapter 4

Conclusion

In this chapter, I first address weaknesses in my experiment and strategies to address those weaknesses; then I offer potential paths for researchers to build upon my experiment, and offer closing words regarding this thesis.

4.1 Design Flaws in Experiment

While I made every effort possible to ensure the integrity of this experiment, there were various factors that limited it, some beyond my control and others within my control but unrealistic to address given the time and resources I had. The largest issue was the dataset I was working with. While the MSD contained roughly 23,000 electronic music songs according to my classifications, these songs did not come close to all of the electronic music that was available. From looking through the tracks, I did see many important artists, meaning that there was some credibility to the dataset. However, there were several other artists I was surprised to see missing, and the artists included contained only a limited number of popular songs. Some traditionally defined genres, like dubstep, were missing entirely from the dataset, and the most recent songs came from the year 2010, which meant that the past 5 years of rapid expansion in EM were not accounted for. Building a sufficient corpus of EM data is very difficult,


arguably more so than for other genres, because songs may be remixed by multiple artists, further blurring the line between original content and modifications. For this reason, I considered my thesis to be a proof of concept. Although the data I used may not be ideal, I was able to show that the Dirichlet Process could be used with some amount of success to cluster songs based on their metadata.

With respect to how I implemented the Dirichlet Process and constructed the features, my methodology could have been more extensive with additional time and resources. Interpreting the sounds in each song and establishing common threads is a difficult task, and unlike Pandora, which used trained music theory experts to analyze each song, I relied on my own ears and anecdotal knowledge of EM. Given the lack of formal literature quantitatively analyzing EM, and the resources I had, this was my best realistic option, but it was also not ideal. The second notable weakness, which was more controllable, was determining what exactly constitutes an EM song. My criteria involved iterating through every song and selecting those whose artist contained a tag that fell inside a list of predetermined EM genres. However, this strategy is not always effective, since some artists have only a small selection of EM songs and have produced much more music involving rock or other non-EM genres. To prevent these songs from appearing in the dataset, I would need to load another dataset from a group called Last.fm, which contains user-generated tags at the song level. Another, more addressable weakness in my experiment was graphically analyzing the timbre categories. While the average chord changes were easy to interpret on the graphs for each cluster and had easy semantic interpretations, the timbre categories were never formally defined. That is, while I knew the Bayes Information Criterion was lowest when there were 46 categories, I did not associate each timbre category with a sound. Mauch's study addressed this issue by randomly selecting songs with sounds that fell in each timbre category and asking users to listen to the sounds and


classify what they heard. Implementing this system would be an additional way of ensuring that the clusters formed for each song were nontrivial: I could not only eyeball the measurements on each graph for timbre, like I did in this thesis, but also use them to confirm the sounds I observed for each cluster. Finally, while my feature selection involved careful preprocessing, based on other studies, that normalized measurements between all songs, there are additional ways I could have improved the feature set. For example, one study looks at more advanced ways to isolate specific timbre segments in a song, identify repeating patterns, and compare songs to each other in terms of the similarity of their timbres [15]. More advanced methods like these would allow me to more quantitatively analyze how successful the Dirichlet Process is at effectively clustering songs into distinct categories.

4.2 Future Work

Future work in this area, quantitatively analyzing EM metadata to determine what constitutes different genres and novel artists, would involve tighter definitions, procedures, evaluations of whether the clustering was effective, and musical scrutiny. All of the weaknesses mentioned in the previous section, barring perhaps the songs available in the Million Song Dataset, can be addressed with extensions and modifications to the code base I created. Addressing the greater issue of building an effective corpus of music data for the MSD, and constantly updating it, might be accomplished by soliciting such data from an organization like Spotify, but such an endeavor is very ambitious and beyond the scope of any individual or small-group research project without extensive funding and influence. Once these problems are resolved, and the dataset, the songs accessed from it, and the methods for comparing songs to each other are settled, the next steps would be to further analyze the results. How do the most unique artists for their time compare to the most popular artists? Is there considerable overlap? How long does it take for a style to grow in popularity, if it even does? And lastly, how can these findings be used to compose new genres of music and envision who and what will become popular in the future? All of these questions may require supplementary information sources, with respect to the popularity of songs and artists for example, and many of these additional pieces of information can be found on the website of the MSD.

4.3 Closing Remarks

While this thesis is an ambitious endeavor and can be improved in many respects, it does show that the methods implemented yield nontrivial results and could serve as a foundation for future quantitative analysis of electronic music. As data analytics grows even more, and groups such as Spotify amass greater amounts of information and deeper insights on that information, this relatively new field of study will hopefully grow. EM is a dynamic, energizing, and incredibly expressive type of music, and understanding it from a quantitative perspective pays respect to what has, up until now, been mostly analyzed from a curious outsider's perspective: qualitatively described but not examined as thoroughly from a mathematical angle.


Appendix A

Code

A.1 Pulling Data from the Million Song Dataset

from __future__ import division
import os
import re
import sys
import time
import glob
import collections
import numpy as np
import hdf5_getters  # not on adroit

# prevents output from showing ellipses when printed
np.set_printoptions(threshold=np.nan)

'''This code computes the frequency of chord changes in each electronic song
and runs the dirichlet process on it'''

# path separators were lost in the printed listing; reconstructed here
basedir = '/scratch/network/mssilver/mssilver/msd_data_full/data/' + str(sys.argv[1])
ext = '.h5'

target_genres = ['house', 'techno', 'drum and bass', 'drum n bass',
                 "drum'n'bass", 'drumnbass', "drum 'n' bass", 'jungle',
                 'breakbeat', 'trance', 'dubstep', 'trap', 'downtempo',
                 'industrial', 'synthpop', 'idm',
                 'idm - intelligent dance music', '8-bit', 'ambient',
                 'dance and electronica', 'electronic']

# relevant metadata for all EM songs found in the MSD
all_song_data = {}
pitch_segs_data = []
count = 0
start_time = time.time()

for root, dirs, files in os.walk(basedir):
    files = glob.glob(os.path.join(root, '*' + ext))
    for f in files:
        h5 = hdf5_getters.open_h5_file_read(f)
        # if year unknown, throw out sample
        if hdf5_getters.get_year(h5) == 0:
            h5.close()
            continue
        if any(tag in str(hdf5_getters.get_artist_mbtags(h5)) for tag in target_genres):
            print 'found electronic music song at {0} seconds'.format(time.time() - start_time)
            count += 1
            print('song count: {0}'.format(count + 1))
            h5_subdict = dict()
            h5_subdict['title'] = hdf5_getters.get_title(h5).item()
            h5_subdict['artist_name'] = hdf5_getters.get_artist_name(h5).item()
            h5_subdict['year'] = hdf5_getters.get_year(h5).item()
            h5_subdict['duration'] = hdf5_getters.get_duration(h5).item()
            h5_subdict['timbre'] = hdf5_getters.get_segments_timbre(h5).tolist()
            h5_subdict['pitches'] = hdf5_getters.get_segments_pitches(h5).tolist()
            track_id = hdf5_getters.get_track_id(h5).item()
            all_song_data[track_id] = h5_subdict
            print('Song {0} finished processing. Total time elapsed: {1} seconds'.format(count, str(time.time() - start_time)))
        h5.close()

# OrderedDict preserves the chronological sort (a plain dict would discard it)
all_song_data_sorted = collections.OrderedDict(
    sorted(all_song_data.items(), key=lambda k: k[1]['year']))
sortedpitchdata = ('/scratch/network/mssilver/mssilver/msd_data/raw_'
                   + re.sub(r'/', '', sys.argv[1]) + '.txt')
with open(sortedpitchdata, 'w') as text_file:
    text_file.write(str(all_song_data_sorted))

A.2 Calculating Most Likely Chords and Timbre Categories

from __future__ import division
import os
import sys
import re
import time
import json
import glob
import hdf5_getters  # not on adroit
import sklearn.mixture
import msd_utils  # not on adroit
import math
import numpy as np
import collections
import ast

# prevents output from showing ellipses when printed
np.set_printoptions(threshold=np.nan)

# column-wise mean of a list of lists
def mean(a):
    return sum(a) / len(a)

'''This code computes the frequency of chord changes in each electronic song
and runs the dirichlet process on it'''

basedir = '/scratch/network/mssilver/mssilver/'
input_file = basedir + 'msd_data/raw_' + str(sys.argv[1]) + '.txt'
output_file = basedir + 'msd_data/preprocessed_' + str(sys.argv[1]) + '.txt'

json_contents = open(input_file, 'r').read()

all_song_data = []
time_start = time.time()
count = 0
# the regex literal was garbled in the printed listing; this pattern is a
# plausible reconstruction matching each song's metadata sub-dict (the
# sub-dicts contain no nested braces)
for json_object_str in re.finditer(r"\{[^{}]*'title'[^{}]*\}", json_contents):
    json_object_str = str(json_object_str.group(0))
    json_object = ast.literal_eval(json_object_str)
    json_object_new = {}

    json_object_new['title'] = json_object['title']
    json_object_new['artist_name'] = json_object['artist_name']
    json_object_new['year'] = json_object['year']
    json_object_new['duration'] = json_object['duration']

    segments_pitches_old = json_object['pitches']
    segments_timbre_old = json_object['timbre']
    segments_pitches_old_smoothed = []
    segments_timbre_old_smoothed = []
    chord_changes = [0 for i in range(0, 192)]
    smoothing_factor = 5
    for i in range(0, int(math.floor(len(segments_pitches_old) / smoothing_factor))):
        segments = segments_pitches_old[(smoothing_factor * i):(smoothing_factor * i + smoothing_factor)]
        # calculate mean frequency of each note over a block of 5 time segments
        segments_mean = map(mean, zip(*segments))
        segments_pitches_old_smoothed.append(segments_mean)
    most_likely_chords = [msd_utils.find_most_likely_chord(seg) for seg in segments_pitches_old_smoothed]
    print 'found most likely chords at {0} seconds'.format(time.time() - time_start)
    # calculate chord changes
    for i in range(0, len(most_likely_chords) - 1):
        c1 = most_likely_chords[i]
        c2 = most_likely_chords[i + 1]
        if c1[1] == c2[1]:
            note_shift = 0
        elif c1[1] < c2[1]:
            note_shift = c2[1] - c1[1]
        else:
            note_shift = 12 - c1[1] + c2[1]
        key_shift = 4 * (c1[0] - 1) + c2[0]
        # convert note_shift (0 through 11) and key_shift (1 to 16)
        # to one of 192 categories for a chord shift
        chord_shift = 12 * (key_shift - 1) + note_shift
        chord_changes[chord_shift] += 1
    json_object_new['chord_changes'] = [c / json_object['duration'] for c in chord_changes]
    print 'calculated chord changes at {0} seconds'.format(time.time() - time_start)

    for i in range(0, int(math.floor(len(segments_timbre_old) / smoothing_factor))):
        segments = segments_timbre_old[(smoothing_factor * i):(smoothing_factor * i + smoothing_factor)]
        # calculate mean value of each timbre dimension over a block of 5 time segments
        segments_mean = map(mean, zip(*segments))
        segments_timbre_old_smoothed.append(segments_mean)
    print 'found most likely timbre categories at {0} seconds'.format(time.time() - time_start)
    timbre_cats = [msd_utils.find_most_likely_timbre_category(seg) for seg in segments_timbre_old_smoothed]
    # one count per timbre cluster (46 per Section 2.3.3; the printed listing read 30)
    timbre_cat_counts = [timbre_cats.count(i) for i in xrange(0, 46)]
    json_object_new['timbre_cat_counts'] = [t / json_object['duration'] for t in timbre_cat_counts]
    all_song_data.append(json_object_new)
    count += 1

print 'preprocessing finished, writing results to file at time {0}'.format(time.time() - time_start)
with open(output_file, 'w') as f:
    f.write(str(all_song_data))

print 'file merging complete at time {0}'.format(time.time() - time_start)

A.3 Code to Compute Timbre Categories

from __future__ import division
import os
import sys
import re
import time
import json
import glob
import hdf5_getters  # not on adroit
import sklearn.mixture
import msd_utils  # not on adroit
import math
import numpy as np
import collections
from string import ascii_uppercase
import ast
import matplotlib.pyplot as plt
import operator
from collections import defaultdict
import random

timbre_all = []
N = 20  # number of samples to get from each year
year_counts = {1956: 2, 1965: 4, 1968: 3, 1969: 5, 1970: 23, 1971: 25,
               1972: 26, 1973: 37, 1974: 35, 1975: 29, 1976: 28, 1977: 64,
               1978: 77, 1979: 111, 1980: 131, 1981: 171, 1982: 199,
               1983: 272, 1984: 190, 1985: 189, 1986: 200, 1987: 224,
               1988: 205, 1989: 272, 1990: 358, 1991: 348, 1992: 538,
               1993: 610, 1994: 658, 1995: 764, 1996: 809, 1997: 930,
               1998: 872, 1999: 983, 2000: 1031, 2001: 1230, 2002: 1323,
               2003: 1563, 2004: 1508, 2005: 1995, 2006: 1892, 2007: 2175,
               2008: 1950, 2009: 1782, 2010: 742}

time_start = time.time()
year_count = defaultdict(int)
orig_dir = '/scratch/network/mssilver/mssilver/'
orig_dir = ''
# the regex literal was garbled in the printed listing; this pattern is a
# plausible reconstruction matching each song's metadata sub-dict
json_pattern = re.compile(r"\{[^{}]*'title'[^{}]*\}", re.DOTALL)
N = 20  # number of songs to sample from each year
k = 20  # number of frames to select from each song
for l1 in ascii_uppercase:
    for l2 in ascii_uppercase:
        edm_textfile = orig_dir + 'msd_data/raw_' + l1 + l2 + '.txt'
        json_contents = open(edm_textfile, 'r').read()
        for json_object_str in re.findall(json_pattern, json_contents):
            json_object = ast.literal_eval(json_object_str)
            year = int(json_object['year'])
            prob = 1.0 if 1.0 * N / year_counts[year] > 1.0 else 1.0 * N / year_counts[year]
            if random.random() < prob:
                print 'getting timbre frames for song in directory {0}, {1} seconds after start of program'.format(edm_textfile, time.time() - time_start)
                duration = float(json_object['duration'])
                timbre = [[t / duration for t in l] for l in json_object['timbre']]
                try:
                    indices = random.sample(xrange(0, len(timbre)), k)
                except:
                    indices = xrange(0, len(timbre))
                timbre_frames = [timbre[i] for i in indices]
                appended_timbre = [timbre_all.append(l) for l in timbre_frames]
        print 'finished file {0}, {1} seconds after start of program'.format(edm_textfile, time.time() - time_start)

with(open('timbre_frames_all.txt', 'w')) as f:
    f.write(str(timbre_all))

A.4 Helper Methods for Calculations

import os
import re
import json
import glob
import hdf5_getters
import time
import numpy as np

''' some static data used in conjunction with the helper methods '''

# each 12-element vector corresponds to the 12 pitches, starting with C
# natural and going up to B natural

CHORD_TEMPLATE_MAJOR = [[1,0,0,0,1,0,0,1,0,0,0,0], [0,1,0,0,0,1,0,0,1,0,0,0],
                        [0,0,1,0,0,0,1,0,0,1,0,0], [0,0,0,1,0,0,0,1,0,0,1,0],
                        [0,0,0,0,1,0,0,0,1,0,0,1], [1,0,0,0,0,1,0,0,0,1,0,0],
                        [0,1,0,0,0,0,1,0,0,0,1,0], [0,0,1,0,0,0,0,1,0,0,0,1],
                        [1,0,0,1,0,0,0,0,1,0,0,0], [0,1,0,0,1,0,0,0,0,1,0,0],
                        [0,0,1,0,0,1,0,0,0,0,1,0], [0,0,0,1,0,0,1,0,0,0,0,1]]
CHORD_TEMPLATE_MINOR = [[1,0,0,1,0,0,0,1,0,0,0,0], [0,1,0,0,1,0,0,0,1,0,0,0],
                        [0,0,1,0,0,1,0,0,0,1,0,0], [0,0,0,1,0,0,1,0,0,0,1,0],
                        [0,0,0,0,1,0,0,1,0,0,0,1], [1,0,0,0,0,1,0,0,1,0,0,0],
                        [0,1,0,0,0,0,1,0,0,1,0,0], [0,0,1,0,0,0,0,1,0,0,1,0],
                        [0,0,0,1,0,0,0,0,1,0,0,1], [1,0,0,0,1,0,0,0,0,1,0,0],
                        [0,1,0,0,0,1,0,0,0,0,1,0], [0,0,1,0,0,0,1,0,0,0,0,1]]
CHORD_TEMPLATE_DOM7 = [[1,0,0,0,1,0,0,1,0,0,1,0], [0,1,0,0,0,1,0,0,1,0,0,1],
                       [1,0,1,0,0,0,1,0,0,1,0,0], [0,1,0,1,0,0,0,1,0,0,1,0],
                       [0,0,1,0,1,0,0,0,1,0,0,1], [1,0,0,1,0,1,0,0,0,1,0,0],
                       [0,1,0,0,1,0,1,0,0,0,1,0], [0,0,1,0,0,1,0,1,0,0,0,1],
                       [1,0,0,1,0,0,1,0,1,0,0,0], [0,1,0,0,1,0,0,1,0,1,0,0],
                       [0,0,1,0,0,1,0,0,1,0,1,0], [0,0,0,1,0,0,1,0,0,1,0,1]]
CHORD_TEMPLATE_MIN7 = [[1,0,0,1,0,0,0,1,0,0,1,0], [0,1,0,0,1,0,0,0,1,0,0,1],
                       [1,0,1,0,0,1,0,0,0,1,0,0], [0,1,0,1,0,0,1,0,0,0,1,0],
                       [0,0,1,0,1,0,0,1,0,0,0,1], [1,0,0,1,0,1,0,0,1,0,0,0],
                       [0,1,0,0,1,0,1,0,0,1,0,0], [0,0,1,0,0,1,0,1,0,0,1,0],
                       [0,0,0,1,0,0,1,0,1,0,0,1], [1,0,0,0,1,0,0,1,0,1,0,0],
                       [0,1,0,0,0,1,0,0,1,0,1,0], [0,0,1,0,0,0,1,0,0,1,0,1]]

CHORD_TEMPLATE_MAJOR_means = [np.mean(chord) for chord in CHORD_TEMPLATE_MAJOR]
CHORD_TEMPLATE_MINOR_means = [np.mean(chord) for chord in CHORD_TEMPLATE_MINOR]
CHORD_TEMPLATE_DOM7_means = [np.mean(chord) for chord in CHORD_TEMPLATE_DOM7]
CHORD_TEMPLATE_MIN7_means = [np.mean(chord) for chord in CHORD_TEMPLATE_MIN7]

CHORD_TEMPLATE_MAJOR_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_MAJOR]
CHORD_TEMPLATE_MINOR_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_MINOR]
CHORD_TEMPLATE_DOM7_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_DOM7]
CHORD_TEMPLATE_MIN7_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_MIN7]

# the 46 timbre cluster centroids found in Section 2.3.3 (decimal points
# restored from the garbled printed listing)
TIMBRE_CLUSTERS = [
    [1.38679881e-01, 3.95702571e-02, 2.65410235e-02, 7.38301998e-03, -1.75014636e-02, -5.51147732e-02,
     8.71851698e-03, -1.17595855e-02, 1.07227900e-02, 8.75951680e-03, 5.40391877e-03, 6.17638908e-03],
    [3.14344510e+00, 1.17405599e-01, 4.08053561e+00, -1.77934450e+00, 2.93367968e+00, -1.35597928e+00,
     -1.55129489e+00, 7.75743158e-01, 6.42796685e-01, 1.40794256e-01, 3.37716831e-01, -3.27103815e-01],
    [3.56548165e-01, 2.73288705e+00, 1.94355982e+00, 1.06892477e+00, 9.89739475e-01, -8.97330631e-02,
     8.73234495e-01, -2.00747009e-03, 3.44488367e-01, 9.93117800e-02, -2.43471766e-01, -1.90521726e-01],
    [4.22442037e-01, 4.14115783e-01, 1.43926557e-01, -1.16143322e-01, -5.95186216e-02, -2.36927188e-01,
     -6.83151409e-02, 9.86816882e-02, 2.43219098e-02, 6.93558977e-02, 6.80121418e-03, 3.97485360e-02],
    [1.94727799e-01, -1.39027782e+00, -2.39875671e-01, -2.84583677e-01, 1.92334219e-01, -2.83421048e-01,
     2.15787541e-01, 1.14840341e-01, -2.15631833e-01, -4.09496877e-02, -6.90838017e-03, -7.24394810e-03],
    [1.96565167e-01, 4.98702717e-02, -3.43697282e-01, 2.54170701e-01, 1.12441266e-02, 1.54740401e-01,
     -4.70447408e-02, 8.10868802e-02, 3.03736697e-03, 1.43974944e-03, -2.75044913e-02, 1.48634678e-02],
    [2.21364497e-01, -2.96205105e-01, 1.57754028e-01, -5.57641279e-02, -9.25625566e-02, -6.15316168e-02,
     -1.38139882e-01, -5.54936599e-02, 1.66886836e-01, 6.46238260e-02, 1.24093863e-02, -2.09274345e-02],
    [2.12823455e-01, -9.32652720e-02, -4.39611467e-01, -2.02814479e-01, 4.98638770e-02, -1.26572488e-01,
     -1.11181799e-01, 3.25075635e-02, 2.01416694e-02, -5.69216463e-02, 2.61922912e-02, 8.30817468e-02],
    [1.62304042e-01, -7.34813956e-03, -2.02552550e-01, 1.80106705e-01, -5.72110826e-02, -9.17148244e-02,
     -6.20429191e-03, -6.08892354e-02, 1.02883628e-02, 3.84878478e-02, -8.72920419e-03, 2.37291230e-02],
    [1.69023095e-01, 6.81311168e-02, -3.71039856e-02, -2.13139780e-02, -4.18752028e-03, 1.36407740e-01,
     2.58515825e-02, -4.10328777e-04, 2.93149920e-02, -1.97874734e-02, 2.01177066e-02, 4.29260690e-03],
    [4.16829358e-01, -1.28384095e+00, 8.86081556e-01, 9.13717416e-02, -3.19420208e-01, -1.82003637e-01,
     -3.19865507e-02, -1.71517045e-02, 3.47472066e-02, -3.53047665e-02, 5.58354602e-02, -5.06222122e-02],
    [3.83948137e-01, 1.06020034e-01, 4.01191058e-01, 1.49470482e-01, -9.58422411e-02, -4.94473336e-02,
     2.27589858e-02, -5.67352733e-02, 3.84666644e-02, -2.15828055e-02, -1.67817151e-02, 1.15426241e-01],
    [9.07946444e-01, 3.26120397e+00, 2.98472002e+00, -1.42615404e-01, 1.29886103e+00, -4.53380431e-01,
     1.54008478e-01, -3.55297093e-02, -2.95809181e-01, 1.57037690e-01, -7.29692046e-02, 1.15180285e-01],
    [1.60870896e+00, -2.32038235e+00, -7.96211044e-01, 1.55058968e+00, -2.19377663e+00, 5.01030526e-01,
     -1.71767279e+00, -1.36642470e+00, -2.42837527e-01, -4.14275615e-01, -7.33148530e-01, -4.56676578e-01],
    [6.42870687e-01, 1.34486839e+00, 2.16026845e-01, -2.13180345e-01, 3.10866747e-01, -3.97754955e-01,
     -3.54439151e-01, -5.95938041e-04, 4.95054274e-03, 4.67013422e-02, -1.80823854e-02, 1.25808320e-01],
    [1.16780496e+00, 2.28141229e+00, -3.29418720e+00, -1.54239912e+00, 2.12372153e-01, 2.51116768e+00,
     1.84273560e+00, -4.06183916e-01, 1.19175125e+00, -9.24407446e-01, 6.85444429e-01, -6.38729005e-01],
    [2.39097414e-01, -1.13382447e-02, 3.06327342e-01, 4.68182987e-03, -1.03107607e-01, -3.17661969e-02,
     3.46533705e-02, 1.46440386e-02, 6.88291154e-02, 1.72580481e-02, -6.23970238e-03, -6.52822380e-03],
    [1.74850329e-01, -1.86077411e-01, 2.69285838e-01, 5.22452803e-02, -3.71708289e-02, -6.42874319e-02,
     -5.01920042e-03, -1.14565540e-02, -2.61300268e-03, -6.94872458e-03, 1.20157063e-02, 2.01341977e-02],
    [1.93220674e-01, 1.62738332e-01, 1.72794061e-02, 7.89933755e-02, 1.58494767e-01, 9.04541006e-04,
     -3.33177052e-02, -1.42411500e-01, -1.90471155e-02, -2.41622739e-02, -2.57382438e-02, 2.84895062e-02],
    [3.31179197e+00, -1.56765268e-01, 4.42446188e+00, 2.05496297e+00, 5.07031622e+00, -3.52663849e-02,
     -5.68337901e+00, -1.17825301e+00, 5.41756637e-01, -3.15541339e-02, -1.58404846e+00, 7.37887234e-01],
    [2.36033237e-01, -5.01380019e-01, -7.01568834e-02, -2.14474169e-01, 5.58739133e-01, -3.45340886e-01,
     2.36469930e-01, -2.51770230e-02, -4.41670143e-01, -1.73364633e-01, 9.92353986e-03, 1.01775476e-01],
    [3.13672832e+00, 1.55128891e+00, 4.60139512e+00, 9.82477544e-01, -3.87108002e-01, -1.34239667e+00,
     -3.00065797e+00, -4.41556909e-01, -7.77546208e-01, -6.59017029e-01, -1.42596356e-01, -9.78935498e-01],
    [8.50714148e-01, 2.28658856e-01, -3.65260753e+00, 2.70626948e+00, -1.90441544e-01, 5.66625676e+00,
     1.77531510e+00, 2.39978921e+00, 1.10965660e+00, 1.58484130e+00, -1.51579214e-02, 8.64324026e-01],
    [1.14302559e+00, 1.18602811e+00, -3.88130412e+00, 8.69833825e-01, -8.23003310e-01, -4.23867795e-01,
     8.56022598e-01, -1.08015106e+00, 1.74840192e-01, -1.35493558e-02, -1.17012561e+00, 1.68572940e-01],
    [3.54117814e+00, 6.12714769e-01, 7.67585243e+00, 2.50391333e+00, 1.81374399e+00, -1.46363231e+00,
     -1.74027236e+00, -5.72924078e-01, -1.20787368e+00, -4.13954661e-01, -4.62561948e-01, 6.78297871e-01],
    [8.31843044e-01, 4.41635485e-01, 7.00724425e-02, -4.72159900e-02, 3.08326493e-01, -4.47009822e-01,
     3.27806057e-01, 6.52370380e-01, 3.28490360e-01, 1.28628172e-01, -7.78065861e-02, 6.91343399e-02],
    [4.90082031e-01, -9.53180204e-01, 1.76970476e-01, 1.57256960e-01, -5.26196238e-02, -3.19264458e-01,
     3.91808304e-01, 2.19368239e-01, -2.06483291e-01, -6.25044005e-02, -1.05547224e-01, 3.18934196e-01],
    [1.49899454e+00, -4.30708817e-01, 2.43770498e+00, 7.03149621e-01, -2.28827845e+00, 2.70195855e+00,
     -4.71484280e+00, -1.18700075e+00, -1.77431396e+00, -2.23190236e+00, 8.20855264e-01, -2.35859902e-01],
    [1.20322544e-01, -3.66300816e-01, -1.25699953e-01, -1.21914056e-01, 6.93277338e-02, -1.31034684e-01,
     -1.54955924e-03, 2.48094288e-02, -3.09576314e-02, -1.66369415e-03, 1.48904987e-04, -1.42151992e-02],
    [6.52394765e-01, -6.81024464e-01, 6.36868117e-01, 3.04950208e-01, 2.62178992e-01, -3.20457080e-01,
     -1.98576098e-01, -3.02173163e-01, 2.04399765e-01, 4.44513847e-02, -9.50111498e-03, -1.14198739e-02],
    [2.06762180e-01, -2.08101829e-01, 2.61977630e-01, -1.71672300e-01, 5.61794250e-02, 2.13660185e-01,
     3.90259585e-02, 4.78176392e-02, 1.72812607e-02, 3.44052067e-02, 6.26899067e-03, 2.48544728e-02],
    [7.39717363e-01, 4.37786285e+00, 2.54995502e+00, 1.13151212e+00, -3.58509503e-01, 2.20806129e-01,
     -2.20500355e-01, -7.22409824e-02, -2.70534083e-01, 1.07942098e-03, 2.70174668e-01, 1.87279353e-01],
    [1.25593809e+00, 6.71054880e-02, 8.70352571e-01, -4.32607959e+00, 2.30652217e+00, 5.47476105e+00,
     -6.11052479e-01, 1.07955720e+00, -2.16225471e+00, -7.95770149e-01, -7.31804973e-01, 9.68935954e-01],
    [1.17233757e-01, -1.23897829e-01, -4.88625265e-01, 1.42036530e-01, -7.23286756e-03, -6.99808763e-02,
     -1.17525019e-02, 5.70221674e-02, -7.67796123e-03, 4.17505873e-02, -2.33375716e-02, 1.94121001e-02],
    [1.67511025e+00, -2.75436700e+00, 1.45345593e+00, 1.32408871e+00, -1.66172505e+00, 1.00560074e+00,
     -8.82308160e-01, -5.95708043e-01, -7.27283590e-01, -1.03975499e+00, -1.86653334e-02, 1.39449745e+00],
    [3.20587677e+00, -2.84451104e+00, 8.54849957e+00, -4.44001235e-01, 1.04202144e+00, 7.35333682e-01,
     -2.48763292e+00, 7.38931361e-01, -1.74185596e+00, -1.07581842e+00, 2.05759299e-01, -8.20483513e-01],
    [3.31279737e+00, -5.08655734e-01, 6.61530870e+00, 1.16518280e+00, 4.74499155e+00, -2.31536191e+00,
     -1.34016130e+00, -7.15381712e-01, 2.78890594e+00, 2.04189275e+00, -3.80003033e-01, 1.16034914e+00],
    [1.79522019e+00, -8.13534697e-02, 4.37167420e-01, 2.26517020e+00, 8.85377295e-01, 1.07481514e+00,
     -7.25322296e-01, -2.19309506e+00, -7.59468916e-01, -1.37191387e+00, 2.60097913e-01, 9.34596450e-01],
    [3.50400906e-01, 8.17891485e-01, -8.63487084e-01, -7.31760701e-01, 9.70320805e-02, -3.60023996e-01,
     -2.91753495e-01, -8.03073817e-02, 6.65930095e-02, 1.60093340e-01, -1.29158086e-01, -5.18806100e-02],
    [2.25922929e-01, 2.78461593e-01, 5.39661393e-02, -2.37662670e-02, -2.70343295e-02, -1.23485570e-01,
     2.31027499e-03, 5.87465112e-05, 1.86127188e-02, 2.83074747e-02, -1.87198676e-04, 1.24761782e-02],
    [4.53615634e-01, 3.18976020e+00, -8.35029351e-01, 7.84124578e+00, -4.43906795e-01, -1.78945492e+00,
     -1.14521031e+00, 1.00044304e+00, -4.04084981e-01, -4.86030348e-01, 1.05412721e-01, 5.63666445e-02],
    [3.93714086e-01, -3.07226477e-01, -4.87366619e-01, -4.57481697e-01, -2.91133171e-04, -2.39881719e-01,
     -2.15591352e-01, -1.21332941e-01, 1.42245002e-01, 5.02984582e-02, -8.05878851e-03, 1.95534173e-01],
    [1.86913010e-01, -1.61000977e-01, 5.95612425e-01, 1.87804293e-01, 2.22064227e-01, -1.09008289e-01,
     7.83845058e-02, 5.15228647e-02, -8.18113578e-02, -2.37860551e-02, 3.41013800e-03, 3.64680417e-02],
    [3.32919314e+00, -2.14341251e+00, 7.20913997e+00, 1.76143734e+00, 1.64091808e+00, -2.66887649e+00,
     -9.26748006e-01, -2.78599285e-01, -7.39434005e-01, -3.87363085e-01, 8.00557250e-01, 1.15628886e+00],
    [4.76496444e-01, -1.19334793e-01, 3.09037235e-01, -3.45545294e-01, 1.30114716e-01, 5.06895559e-01,
     2.12176840e-01, -4.14296750e-03, 4.52439064e-02, -1.62163990e-02, 6.93683152e-02, -5.77607592e-03],
    [3.00019324e-01, 5.43432074e-02, -7.72732930e-01, 1.47263806e+00, -2.79012581e-02, -2.47864869e-01,
     -2.10011388e-01, 2.78202425e-01, 6.16957205e-02, -1.66924986e-01, -1.80102286e-01, -3.78872162e-03]]

TIMBRE_MEANS = [np.mean(t) for t in TIMBRE_CLUSTERS]
TIMBRE_STDEVS = [np.std(t) for t in TIMBRE_CLUSTERS]

'''helper methods to process raw msd data'''

def normalize_pitches(h5):
    key = int(hdf5_getters.get_key(h5))
    segments_pitches = hdf5_getters.get_segments_pitches(h5)
    segments_pitches_new = [transpose_by_key(pitch_seg, key) for pitch_seg in segments_pitches]
    return segments_pitches_new

def transpose_by_key(pitch_seg, key):
    pitch_seg_new = []
    for i in range(0, 12):
        idx = (i + key) % 12
        pitch_seg_new.append(pitch_seg[idx])
    return pitch_seg_new

# the four chord families, indexed by quality: 1 = major, 2 = minor,
# 3 = dominant 7th, 4 = minor 7th
CHORD_FAMILIES = [
    (1, CHORD_TEMPLATE_MAJOR, CHORD_TEMPLATE_MAJOR_means, CHORD_TEMPLATE_MAJOR_stdevs),
    (2, CHORD_TEMPLATE_MINOR, CHORD_TEMPLATE_MINOR_means, CHORD_TEMPLATE_MINOR_stdevs),
    (3, CHORD_TEMPLATE_DOM7, CHORD_TEMPLATE_DOM7_means, CHORD_TEMPLATE_DOM7_stdevs),
    (4, CHORD_TEMPLATE_MIN7, CHORD_TEMPLATE_MIN7_means, CHORD_TEMPLATE_MIN7_stdevs),
]

def find_most_likely_chord(pitch_vector):
    '''given a time segment with distributions of the 12 pitches, find the
    most likely chord played, returned as (quality, root index)'''
    rho_max = 0.0
    most_likely_chord = (1, 1)
    pv_mean = np.mean(pitch_vector)
    pv_std = np.std(pitch_vector)
    # the four near-identical loops of the printed listing are folded into one
    for quality, templates, t_means, t_stdevs in CHORD_FAMILIES:
        for idx, (chord, t_mean, t_stdev) in enumerate(zip(templates, t_means, t_stdevs)):
            rho = 0.0
            for i in range(0, 12):
                rho += (chord[i] - t_mean) * (pitch_vector[i] - pv_mean) / \
                       ((t_stdev + 0.01) * (pv_std + 0.01))
            if abs(rho) > abs(rho_max):
                rho_max = rho
                most_likely_chord = (quality, idx)
    return most_likely_chord

def find_most_likely_timbre_category(timbre_vector):
    most_likely_timbre_cat = 0
    rho_max = 0.0
    for idx, (seg, t_mean, t_stdev) in enumerate(zip(TIMBRE_CLUSTERS, TIMBRE_MEANS, TIMBRE_STDEVS)):
        rho = 0.0
        for i in range(0, 12):
            rho += (seg[i] - t_mean) * (timbre_vector[i] - np.mean(seg)) / \
                   ((t_stdev + 0.01) * (np.std(timbre_vector) + 0.01))
        if abs(rho) > abs(rho_max):
            rho_max = rho
            most_likely_timbre_cat = idx
    return most_likely_timbre_cat

Bibliography

[1] Marck Bailey. Mark J. Butler publishes scholarly work on dance music. http://www.music.northwestern.edu/about/news/2012/mark-j-butler-publishes-scholarly-work-on-dance-music.html, March 2012.

[2] Kenneth Taylor. Ishkur's guide to EDM. http://techno.org/electronic-music-guide.

[3] Deal further strengthens Spotify's music discovery expertise. http://the.echonest.com/pressreleases/spotify-acquires-echo-nest, March 2014.

[4] Josh Constine. Inside the Spotify-Echo Nest skunkworks. http://techcrunch.com/2014/10/19/the-sonic-mad-scientists, October 2014.

[5] The future of music genres is here. http://blog.echonest.com/post/73516217273/the-future-of-music-genres-is-here, January 2014.

[6] About the Music Genome Project. http://www.pandora.com/about/mgp.

[7] Joan Serrà, Álvaro Corral, Marián Boguñá, Martín Haro, and Josep Ll. Arcos. Measuring the evolution of contemporary western popular music. Scientific Reports, 2, July 2012.

[8] Matthias Mauch, Robert M. MacCallum, Mark Levy, and Armand M. Leroi. The evolution of popular music: USA 1960-2010. Royal Society Open Science, 2(5), 2015.

[9] Thierry Bertin-Mahieux, Daniel P.W. Ellis, Brian Whitman, and Paul Lamere. The Million Song Dataset. In Proceedings of the 12th International Conference on Music Information Retrieval (ISMIR 2011), 2011.

[10] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay. Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12:2825-2830, 2011.

[11] Lars Buitinck, Gilles Louppe, Mathieu Blondel, Fabian Pedregosa, Andreas Mueller, Olivier Grisel, Vlad Niculae, Peter Prettenhofer, Alexandre Gramfort, Jaques Grobler, Robert Layton, Jake VanderPlas, Arnaud Joly, Brian Holt, and Gaël Varoquaux. API design for machine learning software: experiences from the scikit-learn project. In ECML PKDD Workshop: Languages for Data Mining and Machine Learning, pages 108-122, 2013.

[12] Edwin Chen. Infinite mixture models with nonparametric Bayes and the Dirichlet process. http://blog.echen.me/2012/03/20/infinite-mixture-models-with-nonparametric-bayes-and-the-dirichlet-process, March 2012.

[13] Graham Massey. Roland TR-808: The drum machine that changed music. http://www.bbc.com/news/entertainment-arts-26682781, March 2014.

[14] Jean-Michel Jarre and Agencja Artystyczna MTJ. Jean Michel Jarre. Disques Dreyfus, 1999.

[15] François Pachet, Jean-Julien Aucouturier, and Mark Sandler. The way it sounds: Timbre models for analysis and retrieval of music signals. IEEE Transactions on Multimedia, 7(6):1028-1035, December 2005.



Chapter 2

Mathematical Modeling

2.1 Determining Novelty of Songs

Finding a logical and implementable mathematical model was, and continues to be, an important aspect of my research. My problem, how to mathematically determine which songs were unique for their time, requires an algorithm in which each song is introduced in chronological order, either joining an existing category or starting a new category based on its musical similarity to songs already introduced. Clustering algorithms like k-means or Gaussian Mixture Models (GMMs), which optimize the partitioning of a dataset into a predetermined number of clusters, assume that number is fixed. While this process would work if we knew exactly how many genres of EM existed, if we guess wrong, our end results may contain clusters that are wrongly merged or split. It is much better to apply a clustering algorithm that does not make any assumptions about this number.

One particularly promising process that addresses the issue of the number of clusters is a family of algorithms known as Dirichlet Processes (DPs). DPs are useful for this particular application because (1) they assign clusters to a dataset with only an upper bound on the number of clusters, and (2) by sorting the songs in chronological order before running the algorithm and keeping track of which songs are categorized under each cluster, we can observe the earliest songs in each cluster and consequently infer which songs were responsible for creating new clusters. The DP is controlled by a concentration parameter α. The expected number of clusters formed is directly proportional to the value of α, so the higher the value of α, the more likely new clusters will be formed [10]. Regardless of the value of α, as the number of data points introduced increases, the probability of a new group being formed decreases. That is, a "rich get richer" policy is in place, and existing clusters tend to grow in size. Tweaking the value of the tunable parameter α is an important part of the study, since it determines the flexibility given to forming a new cluster. If the value of α is too small, then the criteria for forming clusters will be too strict, and data that should be in different clusters will be assigned to the same cluster. On the other hand, if α is too large, the algorithm will be too sensitive and assign similar songs to different clusters.
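To make the "rich get richer" behavior concrete, the following small simulation (my own illustration, not part of the thesis pipeline) draws sequential cluster assignments from a Chinese Restaurant Process, the song-by-song view of the DP. The number of clusters it produces grows with α, yet much more slowly than the number of songs:

import random

def crp_cluster_sizes(n_songs, alpha, seed=0):
    # simulate CRP seating; returns the final size of each cluster
    random.seed(seed)
    sizes = []
    for n in range(n_songs):
        # song n starts a new cluster with probability alpha / (n + alpha),
        # or joins existing cluster i with probability sizes[i] / (n + alpha)
        r = random.uniform(0, n + alpha)
        if r < alpha:
            sizes.append(1)
        else:
            acc = alpha
            for i, s in enumerate(sizes):
                acc += s
                if r < acc:
                    sizes[i] += 1
                    break
    return sizes

for a in (0.5, 5.0, 50.0):
    print('alpha = %-5s -> %d clusters' % (a, len(crp_cluster_sizes(10000, a))))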

The implementation of the DP was achieved using scikit-learn's library and API for the Dirichlet Process Gaussian Mixture Model (DPGMM), the formal name of the Dirichlet Process model used to cluster the data. More specifically, scikit-learn's implementation of the DPGMM uses the stick-breaking method, one of several equally valid methods to assign songs to clusters [11]. While the mathematical details of this algorithm can be found in [12], the most important aspects of the DPGMM are the arguments that the user can specify and tune. The first of these tunable parameters is the value α, the same parameter as the α discussed in the previous paragraph. As seen on the right side of Figure 2.1, properly tuning α is key to obtaining meaningful clusters. The center image has α set to 0.01, which is too small and results in all of the data being placed in one cluster. On the other hand, the bottom-right image has the same data set and α set to 100, which does a better job of clustering. On a related note, the figure also demonstrates the effectiveness of the DPGMM over the GMM. On the left side, the dataset clearly contains 2 clusters, but the GMM in the top-left image assumes 5 clusters as a prior and consequently clusters the data incorrectly, while the DPGMM manages to limit the data to 2 clusters.

Figure 2.1: scikit-learn example of GMM vs. DPGMM and the tuning of α

The second argument that the user inputs for the DPGMM is the data that will be clustered. The scikit-learn implementation takes the data in the format of a nested list (N lists, each of length m), where N is the number of data points and m the number of features. While the format of the data structure is relatively straightforward, choosing which numbers should be in the data was a challenge I faced. Selecting the relevant features of each song to be used in the algorithm will be expounded upon in the next section, "Feature Selection."

The last argument that a user inputs for the scikit-learn DPGMM implementation is an upper bound for the number of clusters. The Dirichlet Process then determines the best number of clusters for the data between 1 and that upper bound. Since the DPGMM is flexible enough to find the best value, I set an arbitrary upper bound of 50 clusters and focused on the tuning of α to modify the number of clusters formed.
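To make these three arguments concrete, below is a minimal sketch of the call described above, written against the sklearn.mixture.DPGMM class as it existed around scikit-learn 0.17 (it has since been deprecated); the random matrix stands in for the real nested feature lists, and covariance_type and n_iter are incidental choices rather than values prescribed in this chapter:

import numpy as np
from sklearn import mixture

X = np.random.rand(500, 238)  # stand-in for the N x m nested feature lists

dpgmm = mixture.DPGMM(n_components=50,        # third argument: upper bound on clusters
                      alpha=0.1,              # first argument: concentration parameter
                      covariance_type='diag',
                      n_iter=100)
dpgmm.fit(X)                                  # second argument: the data to cluster
labels = dpgmm.predict(X)                     # cluster assignment for each song
print(np.unique(labels))                      # clusters actually used, typically far fewer than 50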

2.2 Feature Selection

One of the most difficult aspects of the Dirichlet method is choosing the features to be used for clustering. In other words, when we organize the songs into clusters, we need to ensure that each cluster is distinct in a way that is statistically and intuitively logical. In the Million Song Dataset [9], each song is represented as a JSON object containing several fields. These fields are candidate features to be used in the Dirichlet algorithm. Below is an example song, "Never Gonna Give You Up" by Rick Astley, and the corresponding features:

artist_mbid: db92a151-1ac2-438b-bc43-b82e149ddd50 (the musicbrainz.org ID for this artist)
artist_mbtags: shape = (4,) (this artist received 4 tags on musicbrainz.org)
artist_mbtags_count: shape = (4,) (raw tag count of the 4 tags this artist received on musicbrainz.org)
artist_name: Rick Astley (artist name)
artist_playmeid: 1338 (the ID of that artist on the service playme.com)
artist_terms: shape = (12,) (this artist has 12 terms (tags) from The Echo Nest)
artist_terms_freq: shape = (12,) (frequency of the 12 terms from The Echo Nest (number between 0 and 1))
artist_terms_weight: shape = (12,) (weight of the 12 terms from The Echo Nest (number between 0 and 1))
audio_md5: bf53f8113508a466cd2d3fda18b06368 (hash code of the audio used for the analysis by The Echo Nest)
bars_confidence: shape = (99,) (confidence value (between 0 and 1) associated with each bar by The Echo Nest)
bars_start: shape = (99,) (start time of each bar according to The Echo Nest; this song has 99 bars)
beats_confidence: shape = (397,) (confidence value (between 0 and 1) associated with each beat by The Echo Nest)
beats_start: shape = (397,) (start time of each beat according to The Echo Nest; this song has 397 beats)
danceability: 0.0 (danceability measure of this song according to The Echo Nest (between 0 and 1; 0 = not analyzed))
duration: 211.69587 (duration of the track in seconds)
end_of_fade_in: 0.139 (time of the end of the fade-in at the beginning of the song, according to The Echo Nest)
energy: 0.0 (energy measure (not in the signal processing sense) according to The Echo Nest (between 0 and 1; 0 = not analyzed))
key: 1 (estimation of the key the song is in by The Echo Nest)
key_confidence: 0.324 (confidence of the key estimation)
loudness: -7.75 (general loudness of the track)
mode: 1 (estimation of the mode the song is in by The Echo Nest)
mode_confidence: 0.434 (confidence of the mode estimation)
release: Big Tunes - Back 2 The 80s (album name from which the track was taken; some songs/tracks can come from many albums, we give only one)
release_7digitalid: 786795 (the ID of the release (album) on the service 7digital.com)
sections_confidence: shape = (10,) (confidence value (between 0 and 1) associated with each section by The Echo Nest)
sections_start: shape = (10,) (start time of each section according to The Echo Nest; this song has 10 sections)
segments_confidence: shape = (935,) (confidence value (between 0 and 1) associated with each segment by The Echo Nest)
segments_loudness_max: shape = (935,) (max loudness during each segment)
segments_loudness_max_time: shape = (935,) (time of the max loudness during each segment)
segments_loudness_start: shape = (935,) (loudness at the beginning of each segment)
segments_pitches: shape = (935, 12) (chroma features for each segment (normalized so max is 1))
segments_start: shape = (935,) (start time of each segment (musical event, or onset) according to The Echo Nest; this song has 935 segments)
segments_timbre: shape = (935, 12) (MFCC-like features for each segment)
similar_artists: shape = (100,) (a list of 100 artists (their Echo Nest IDs) similar to Rick Astley, according to The Echo Nest)
song_hotttnesss: 0.864248830588 (according to The Echo Nest, when downloaded (in December 2010) this song had a 'hotttnesss' of 0.8 (on a scale of 0 to 1))
song_id: SOCWJDB12A58A776AF (The Echo Nest song ID; note that a song can be associated with many tracks (with very slight audio differences))
start_of_fade_out: 198.536 (start time of the fade-out, in seconds, at the end of the song, according to The Echo Nest)
tatums_confidence: shape = (794,) (confidence value (between 0 and 1) associated with each tatum by The Echo Nest)
tatums_start: shape = (794,) (start time of each tatum according to The Echo Nest; this song has 794 tatums)
tempo: 113.359 (tempo in BPM according to The Echo Nest)
time_signature: 4 (time signature of the song according to The Echo Nest, i.e., usual number of beats per bar)
time_signature_confidence: 0.634 (confidence of the time signature estimation)
title: Never Gonna Give You Up (song title)
track_7digitalid: 8707738 (the ID of this song on the service 7digital.com)
track_id: TRAXLZU12903D05F94 (The Echo Nest ID of this particular track, on which the analysis was done)
year: 1987 (year when this song was released, according to musicbrainz.org)

When choosing features, my main goal was to use features that would most likely yield meaningful results, yet also be simple and make sense to the average person. The definition of "meaningful" results is subjective, as every music listener has his or her own opinions about what constitutes different types of music, but some common features most people tend to differentiate songs by are pitch, rhythm, and the types of instruments used. The following specific fields provided in each song object fall under these three terms:

Pitch

• segments_pitches: a matrix of values indicating the strength of each pitch (or note) at each discernible time interval

Rhythm

• beats_start: a vector of values indicating the start time of each beat
• time_signature: the time signature of the song
• tempo: the speed of the song in beats per minute (BPM)

Instruments

• segments_timbre: a matrix of values indicating the distribution of MFCC-like features (different types of tones) for each segment

The segments_pitches feature is a clear candidate for a differentiating factor between songs, since it reveals the patterns of notes that occur. Additionally, other research papers that quantitatively examine songs, like Mauch's, look at pitch and employ a procedure that allows all songs to be compared with the same metric. Likewise, timbre is intuitively a reliable differentiating feature, since it captures the presence of different tones, that is, sounds that sound different despite having the same pitch. Therefore segments_timbre is another feature that is considered in each song.

Finally, we look at the candidate features for rhythm. At first glance, all of these features appear to be useful, as they indicate the rhythm of a song in one way or another. However, none of these features are as useful as the pitch and timbre features. While tempo is one factor in differentiating genres of EDM and music in general, tempo alone is not a driving force of musical innovation. Certain genres of EDM, like drum 'n' bass and happycore, stand out for having very fast tempos, but the tempo is supplemented with a sound unique to the genre. Conceiving new arrangements of pitches, combining instruments in new ways, and inventing new types of sounds are novel; speeding up or slowing down existing sounds is not. Including tempo as a feature could actually add noise to the model, since many genres overlap in their tempos. And finally, tempo is measured indirectly when the pitch and timbre features are normalized for each song: everything is measured in units of "per second," so faster songs will have higher quantities of pitch and timbre features each second. Time signature can be dismissed from the candidate features for the same reason as tempo: many genres share the same time signature, and including it in the feature set would only add more noise. beats_start looks like a more promising feature since, like segments_pitches and segments_timbre, it consists of a vector of values. However, difficulties arise when we begin to think about how exactly we can utilize this information. Since each song varies in length, we need a way to compare songs of different durations on the same level. One approach could be to perform basic statistics on the distance between each beat, for example calculating the mean and standard deviation of that distance. However, the normalized pitch and timbre information already captures this data. Another possibility is detecting certain patterns of beats, which could differentiate the syncopated dubstep or glitch music beats from the steady pulse of electro-house. But once again, every beat is accompanied by a sound with a specific timbre and pitch, so this feature would not add any significantly new information.

2.3 Collecting Data and Preprocessing Selected Features

2.3.1 Collecting the Data

Upon deciding the features I wanted to use in my research, I first needed to collect all of the electronic songs in the Million Song Dataset. The easiest reliable way to achieve this was to iterate through each song in the database and save the information for the songs where any of the artist genre tags in artist_mbtags matched an electronic music genre. While this measure was not fully accurate, because it looks at the genre of the artist rather than the song, specific genre information for each song was not as easily accessible, so this indicator was nearly as good a substitute. To generate a list of the genres that electronic songs would fall under, I manually searched through a subset of the MSD to find all genres that seemed to be related to electronic music. In the case of genres that were sometimes, but not always, electronic in nature, such as disco or pop, I erred on the side of caution and did not include them in the list of electronic genres. In these cases, false positives, such as primarily rock songs that happen to have the disco label attached to the artist, could inadvertently be included in the dataset. The final list of genres is as follows:

target_genres = ['house', 'techno', 'drum and bass', 'drum n bass',
    "drum'n'bass", 'drumnbass', "drum 'n' bass", 'jungle', 'breakbeat',
    'trance', 'dubstep', 'trap', 'downtempo', 'industrial', 'synthpop',
    'idm', 'idm - intelligent dance music', '8-bit', 'ambient',
    'dance and electronica', 'electronic']

2.3.2 Pitch Preprocessing

A study conducted by music researcher Matthias Mauch [8] analyzes pitch in a musically informed manner. The study first takes the raw sound data and converts it into a distribution over each pitch, where 0 is no detection of the pitch and 1 the strongest amount. Then it computes the most likely chord by comparing the 4 most common types of chords in popular music (major, minor, dominant 7, and minor 7) to the observed chord. The most common chords are represented as "template chords" containing 0's and 1's, where the 1's represent the notes played in the chord. For example, using the note C as the first index, the C major chord is represented as

$$CT_{CM} = (1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0)$$

For a given chroma frame c observed in the song, the Spearman's rho coefficient is computed against every template chord:

$$\rho_{CT,c} = \sum_{i=1}^{12} \frac{(CT_i - \overline{CT})(c_i - \bar{c})}{\sigma_{CT}\,\sigma_c}$$

where $\overline{CT}$ is the mean of the values in the template chord, $\sigma_{CT}$ is the standard deviation of the values in the chord, and the operations on c are analogous. Note that the summation is over each individual pitch in the 12 pitch classes. The chord template with the highest value of ρ is selected as the chord for the time frame.

After this is performed for each time frame, the values are smoothed, and then the change between adjacent chords is observed. The reasoning behind this step is that by measuring the relative distance between chords rather than the chords themselves, all songs can be compared in the same manner even though they may have different key signatures. Finally, the study takes the types of chord changes and classifies them under 8 possible categories called "H-topics." These topics are more abstracted versions of the chord changes that make more sense to a human, such as "changes involving dominant 7th chords."

In my preliminary implementation of this method on an electronic dance music corpus, I made a few modifications to Mauch's study. First, I smoothed out time frames before computing the most probable chords, rather than smoothing the most probable chords. I did this to save time and to reduce volatility in the chord measurements. Using Rick Astley's "Never Gonna Give You Up" as a reference, which contains 935 time frames and lasts 212 seconds, 5 time frames is slightly under 1 second, and for preliminary testing this appeared to be a good interval for each time block. Second, as mentioned in the literature section, I did not abstract the chord changes into H-topics. This decision also stemmed from time constraints, since deriving semantic chord meaning from EDM songs would require careful research into the types of harmonies and sounds common in that genre of music. Below I include a high-level visualization of the pitch metadata found in a sample song, "Firestarter" by The Prodigy, and how I converted the metadata into a chord change vector that I could then feed into the Dirichlet Process algorithm.

[Figure: pitch-preprocessing pipeline for "Firestarter" by The Prodigy. Start with the raw pitch data, an N×12 vector, where N is the number of time frames in the song and 12 the number of pitch classes (the figure shows the first 5 time frames). Average the distribution of pitches over every block of 5 time frames, then calculate the most likely chord for each block using Spearman's rho (here, F major). For two adjacent chords (here, F major to G major, a major-to-major step of size 2, chord shift code 6), calculate the change between them and increment the count in a table of chord change frequencies (192 possible chord changes): chord_changes[6] += 1. The result is a final 192-element vector, chord_changes, where chord_changes[i] is the number of times the chord change with code i occurred in the song.]

A final step I took to normalize the chord change data was to divide the counts by the length of the song, so that each song's number of chord changes was measured per second.
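Putting these steps together, here is a sketch of what the most-likely-chord computation (the msd_utils.find_most_likely_chord helper called in Appendix A.2) could look like under my reading of the procedure; only the major template is spelled out in the text above, so the minor, dominant 7, and minor 7 rows below are the standard chord spellings, and the block size of 5 follows the smoothing choice described earlier:

import numpy as np
from scipy.stats import spearmanr

# C-rooted templates; index 0 is the note C
BASE_TEMPLATES = {
    'maj':  [1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0],   # C, E, G
    'min':  [1, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0],   # C, Eb, G
    'dom7': [1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0],   # C, E, G, Bb
    'min7': [1, 0, 0, 1, 0, 0, 0, 1, 0, 0, 1, 0],   # C, Eb, G, Bb
}

def all_templates():
    # rotate each base template through the 12 possible roots
    for name, t in BASE_TEMPLATES.items():
        for root in range(12):
            yield (name, root), np.roll(t, root)

def find_most_likely_chord(chroma_frame):
    # pick the (chord type, root) whose template has the highest Spearman's rho
    return max(all_templates(),
               key=lambda kv: spearmanr(kv[1], chroma_frame)[0])[0]

def smooth(frames, block=5):
    # average the chroma distribution over blocks of 5 time frames
    frames = np.asarray(frames)
    return [frames[i * block:(i + 1) * block].mean(axis=0)
            for i in range(len(frames) // block)]

chroma = np.random.rand(935, 12)  # stand-in for segments_pitches
chords = [find_most_likely_chord(f) for f in smooth(chroma)]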

2.3.3 Timbre Preprocessing

Figure 2.2: Number of Electronic Music Songs in the Million Song Dataset from Each Year

For timbre, I also used Mauch's model to find a meaningful way to compare timbre uniformly across all songs [8]. After collecting all song metadata, I took a random sample of 20 songs from each year, starting at 1970. The reason I forced the sampling to 20 randomly sampled songs from each year, and did not take a random sample of songs from all years at once, was to prevent bias towards any type of sounds. As seen in Figure 2.2, there are significantly more songs from 2000–2011 than before 2000. The mean year is x̄ = 2001.052, the median year is 2003, and the standard deviation of the years is σ = 7.060. A "random sample" over all songs would almost certainly include a disproportionate number of more recent songs. In order not to miss sounds that may be more prevalent in older songs, I required a set number of songs from each year. Next, from each randomly selected song, I selected 20 random timbre frames, in order to prevent any biases in data collection within each song. In total, there were 42 · 20 · 20 = 16,800 timbre frames collected. Next, I clustered the timbre frames using a Gaussian Mixture Model (GMM), varying the number of clusters from 10 to 100 and selecting the number of clusters with the lowest Bayes Information Criterion (BIC), a statistical measure commonly used to select the best-fitting model. The BIC was minimized at 46 timbre clusters. I then re-ran the GMM with 46 clusters and saved the mean values of each of the 12 timbre segments for each cluster formed. In the same way that every song had the same 192 chord changes whose frequencies could be compared between songs, each song now had the same 46 timbre clusters, with different frequencies in each song. When reading in the metadata from each song, I calculated the most likely timbre cluster each timbre frame belonged to and kept a frequency count of all of the possible timbre clusters observed in a song. Finally, as with the pitch data, I divided all observed counts by the duration of the song in order to normalize each song's timbre counts.
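The model selection step just described can be sketched as follows, using the old sklearn.mixture.GMM interface (circa version 0.17) and its built-in bic() method; the random matrix stands in for the 16,800 sampled 12-dimensional timbre frames, and the granularity of the sweep is my own shortcut rather than the exact grid I searched:

import numpy as np
from sklearn import mixture

timbre_frames = np.random.rand(16800, 12)  # stand-in for the sampled timbre frames

best_model, best_bic = None, np.inf
for k in range(10, 101, 10):               # sweep candidate cluster counts
    gmm = mixture.GMM(n_components=k, covariance_type='diag', random_state=0)
    gmm.fit(timbre_frames)
    bic = gmm.bic(timbre_frames)
    if bic < best_bic:
        best_model, best_bic = gmm, bic

cluster_ids = best_model.predict(timbre_frames)  # most likely cluster per frame
cluster_means = best_model.means_                # mean timbre vector per cluster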


Chapter 3

Results

3.1 Methodology

After the pitch and timbre data was processed, I ran the Dirichlet Process on the data. For each song, I concatenated the 192-element chord change frequency list and the 46-element timbre category frequency list, giving each song a total of 238 features. However, there is a problem with this setup: the pitch data will inherently dominate the clustering process, since it contains roughly four times as many features as timbre. While there is no built-in function in scikit-learn's DPGMM process to give different weights to each feature, I considered another possibility to remedy this discrepancy: duplicating the timbre vector a certain number of times and concatenating that to the feature set of each song. While this strategy runs the risk of corrupting the feature set and turning it into something that does not accurately represent each song, it is important to keep in mind that even without duplicating the timbre vector, the feature set consists of two separate feature sets concatenated to each other. Therefore, timbre duplication appears to be a reasonable strategy to weigh pitch and timbre more evenly.
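A minimal sketch of this duplication strategy, assuming the per-second feature lists from Chapter 2; the number of copies (here 4, which brings the 46 timbre features up to 184, close to the 192 pitch features) is my illustrative choice, since the text does not fix the multiplier:

import numpy as np

def build_feature_vector(chord_changes, timbre_counts, n_copies=4):
    # 192 pitch features followed by n_copies x 46 timbre features
    return np.concatenate([chord_changes] + [timbre_counts] * n_copies)

chords = np.random.rand(192)  # stand-in for per-second chord change frequencies
timbre = np.random.rand(46)   # stand-in for per-second timbre category frequencies
x = build_feature_vector(chords, timbre)
print(x.shape)                # (376,)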


After this modification, I tweaked a few more parameters before obtaining my final results. Dividing the pitch and timbre frequencies by the duration of the song normalized every song to frequency per second, but it also had the undesired effect of making the data too small. Timbre and pitch frequencies per second were almost always less than 10 and often hovered as low as 0.002 for nonzero values. Because all of the values were very close to each other, using common values of α in the range of 0.1 to 1000–2000 was insufficient to push the songs into different clusters. As a result, every song fell into the same cluster. Increasing the value of α by several orders of magnitude, to well over 10 million, fixed the problem, but this solution presented two issues. First, tuning α to experiment with different ways to cluster the music would be problematic, since I would have to work with an enormous range of possible values for α. Second, pushing α to such high values is not appropriate for the Dirichlet Process. Extremely high values of α indicate a Dirichlet Process that will try to disperse the data into different clusters, but a value of α that high is, in principle, always assigning each new song to a new cluster. On the other hand, varying α between 0.1 and 1000, for example, presents a much wider range of flexibility when assigning clusters. While it may be possible to cluster the data as it currently is by varying α an extreme amount, we would be using the Dirichlet Process in a way it should mathematically not be used. Therefore, multiplying all of the data by a constant value, so that we can work in the appropriate range of α, is the ideal approach. After some experimentation, I found that k = 10 was an appropriate scaling factor. After initial runs of the Dirichlet Process, I found that there was a slight issue with some of the earlier songs. Since I had only artist genre tags, not specific song tags, I chose songs based on whether any of the tags associated with the artist fell under any electronic music genre, including the generic term 'electronic'. There were some bands, mostly older ones from the 1960s and 1970s like Electric Light Orchestra, which had some electronic music but mostly featured rock, funk, disco, or another genre. Given that these artists featured mostly non-electronic songs, I decided to exclude them from my study and generated a blacklist of these music artists. While it was infeasible to look through every single song and determine whether it was electronic or not, I was able to look over the earliest songs in each cluster. These songs were the most important to verify as electronic, because early non-electronic songs could end up forming new clusters and inadvertently create clusters with non-electronic sounds that I was not looking for.

The goal of this thesis is to identify different groups in which EM songs are clustered and to identify the most novel artists and genres. While the second task is very simple, because it requires looking at the earliest songs in each cluster, the effectiveness of the first is difficult to gauge. While I can look at the average chord change and timbre category frequencies in each cluster, as well as other metadata, attaching semantic interpretations to what the music actually sounds like and determining whether the music is clustered properly is a very subjective process. For this reason, I ran the Dirichlet Process on the feature set with values of α = (0.05, 0.1, 0.2) and compared the clustering for each value, examining similarities and differences in the clusters formed in each scenario in the Discussion section. For each value of α, I set the upper limit of components, or clusters allowed, to 50. The values of α I used resulted in 9, 14, and 19 clusters, respectively.
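This experimental loop can be sketched as follows (with hypothetical variable names and the deprecated sklearn.mixture.DPGMM API noted earlier): fit the model at each α, group the songs by assigned cluster, and read off each cluster's earliest members as candidate trendsetters:

import numpy as np
from collections import defaultdict
from sklearn import mixture

features = np.random.rand(1000, 238)  # stand-in for the scaled feature matrix
songs = [('title %d' % i, 'artist %d' % i, 1970 + i % 41)  # (title, artist, year)
         for i in range(1000)]

def earliest_per_cluster(features, songs, alpha, n_first=10):
    dpgmm = mixture.DPGMM(n_components=50, alpha=alpha,
                          covariance_type='diag', n_iter=100)
    labels = dpgmm.fit(features).predict(features)
    clusters = defaultdict(list)
    for label, song in zip(labels, songs):
        clusters[label].append(song)
    # sort each cluster's members by year and keep the oldest few
    return {c: sorted(members, key=lambda s: s[2])[:n_first]
            for c, members in clusters.items()}

for alpha in (0.05, 0.1, 0.2):
    oldest = earliest_per_cluster(features, songs, alpha)
    print('alpha = %s -> %d clusters' % (alpha, len(oldest)))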

3.2 Findings

3.2.1 α = 0.05

When I set α to 0.05, the Dirichlet Process split the songs into 9 clusters. Below are the distributions of the years of the songs in each cluster (note that the Dirichlet Process does not number the clusters exactly sequentially, so cluster numbers 5, 7, and 10 are skipped).

Figure 3.1: Song year distributions for α = 0.05

For each value of α, I also calculated the average frequency of each chord change category and timbre category for each cluster and plotted the results. The green lines correspond to timbre and the blue lines to pitch.


Figure 3.2: Timbre and pitch distributions for α = 0.05

A table of each cluster formed, the number of songs in that cluster, and descriptions of the pitch, timbre, and rhythmic qualities characteristic of songs in that cluster is shown below.


Cluster   Song Count   Characteristic Sounds
0         6481         Minimalist, industrial, space sounds, dissonant chords
1         5482         Soft, New Age, ethereal
2         2405         Defined sounds; electronic and non-electronic instruments played in standard rock rhythms
3         360          Very dense and complex synths, slightly darker tone
4         4550         Heavily distorted rock and synthesizer
6         2854         Faster-paced 80s synth rock, acid house
8         798          Aggressive beats, dense house music
9         1464         Ambient house, trancelike, strong beats, mysterious tone
11        1597         Melancholy tones; new wave rock in the 80s, then, starting in the 90s, downtempo, trip-hop, nu-metal

Table 3.1: Song cluster descriptions for α = 0.05

3.2.2 α = 0.1

A total of 14 clusters were formed (16 were formed, but 2 clusters contained only one song each; I listened to both of these songs, and since they did not sound unique, I discarded them from the clusters). Again, the song year distributions, timbre and pitch distributions, and cluster descriptions are shown below.


Figure 3.3: Song year distributions for α = 0.1


Figure 3.4: Timbre and pitch distributions for α = 0.1


Cluster   Song Count   Characteristic Sounds
0         1339         Instrumental and disco with 80s synth
1         2109         Simultaneous quarter-note and sixteenth-note rhythms
2         4048         Upbeat, chill; simultaneous quarter-note and eighth-note rhythms
3         1353         Strong repetitive beats, ambient
4         2446         Strong simultaneous beat and synths; synths defined but with echo
5         2672         Calm, New Age
6         542          Hi-hat cymbals, dissonant chord progressions
7         2725         Aggressive punk and alternative rock
9         1647         Latin, rhythmic emphasis on first and third beats
11        835          Standard medium-fast rock instruments/chords
16        1152         Orchestral, especially violins
18        40           "Martian alien" sounds, no vocals
20        1590         Alternating strong kick and strong high-pitched clap
28        528          Roland TR-like beats; kick and clap stand out but fuzzy

Table 3.2: Song cluster descriptions for α = 0.1

3.2.3 α = 0.2

With α set to 0.2, there were a total of 22 clusters formed. 3 of the clusters consisted of 1 song each, none of which were particularly unique-sounding, so I discarded them, for a total of 19 significant clusters. Again, the song year distributions, timbre and pitch distributions, and cluster descriptions are shown below.


Figure 3.5: Song year distributions for α = 0.2


Figure 3.6: Timbre and pitch distributions for α = 0.2


Cluster   Song Count   Characteristic Sounds
0         4075         Nostalgic and sad-sounding synths and string instruments
1         2068         Intense, sad, cavernous (mix of industrial metal and ambient)
2         1546         Jazz/funk tones
3         1691         Orchestral with heavy 80s synths, atmospheric
4         343          Arpeggios
5         304          Electro, ambient
6         2405         Alien synths, eerie
7         1264         Punchy kicks and claps, 80s/90s tilt
8         1561         Medium tempo, 4/4 time signature, synths with intense guitar
9         1796         Disco rhythms and instruments
10        2158         Standard rock with few (if any) synths added on
12        791          Cavernous, minimalist, ambient (non-electronic instruments)
14        765          Downtempo, classic guitar riffs, fewer synths
16        865          Classic acid house sounds and beats
17        682          Heavy Roland TR sounds
22        14           Fast, ambient, classic orchestral
23        578          Acid house with funk tones
30        31           Very repetitive rhythms, one or two tones
34        88           Very dense sound (strong vocals and synths)

Table 3.3: Song cluster descriptions for α = 0.2


3.3 Analysis

Each of the different values of α revealed different insights about the Dirichlet Process applied to the Million Song Dataset, mainly which artists and songs were unique and how traditional groupings of EM genres could be thought of differently. Not surprisingly, the distributions of the years of songs in most of the clusters were skewed to the left, because the distribution of all of the EM songs in the Million Song Dataset is left-skewed (see Figure 2.2). However, some of the distributions vary significantly for individual clusters, and these differences provide important insights into which types of music styles were popular at certain points in time and how unique the earliest artists and songs in the clusters were. For example, for α = 0.1, cluster 28's musical style (with sounds characteristic of the Roland TR-808 and TR-909, two programmable drum machines and synthesizers that became extremely popular in 1980s and 90s dance tracks [13]) coincides with when the instruments were first manufactured, in 1980. Not surprisingly, this cluster contained mostly songs from the 80s and 90s and declined slightly in the 2000s. However, there were a few songs in that cluster that came out before 1980. While these songs did not clearly use the Roland TR machines, they may have contained similar sounds that predated the machines and were truly novel.

First, looking at α = 0.05, we see that all of the clusters contain a significant number of songs, although clusters 3 and 8 are notably smaller. Cluster 3 has a heavier left tail, indicating a larger number of songs from the 70s, 80s, and 90s. Inside the cluster, the genres of music varied significantly through a traditional music lens. That is, the cluster contained some songs with nearly all traditional rock instruments, others with purely synths, and others somewhere in between, all of which would normally be classified as different EM genres. However, under the Dirichlet Process these songs were lumped together, with the common theme of dense, melodic sounds (as opposed to minimalistic, repetitive, or dissonant ones). The most prominent artists among the earlier songs are Ashra and Jon Hassell, who composed several melodic songs combining traditional instruments with synthesizers for a modern feel. The other small cluster, number 8, has a more normal year distribution relative to the entire MSD distribution and consists of denser beats. Another artist, Cabaret Voltaire, leads this cluster. Cluster 9 sticks out significantly because it contains virtually no songs before 1990 but then increases rapidly in popularity. This cluster contains songs with hypnotically repetitive rhythm, strong and ethereal synths, and an equally strong drum-like beat. Given the emergence of trance in the 1990s, and the fact that house music in the 1980s contained more minimalistic synths than house music in the 1990s, this distribution of years makes sense. Looking at the earliest artists in this cluster, one that accurately predates the later music in the cluster is Jean-Michel Jarre. A French composer pioneering in ambient and electronic music [14], one of his songs, Les Chants Magnétiques IV, contains very sharp and modulated synths, along with a repetitive hi-hat cymbal rhythm and more drawn-out, ethereal synths. While the song sounds ambient at its normal speed, playing it at 1.5 times normal speed results in a thumping, fast-paced 16th-note rhythm that, combined with the ethereal synths and their chord progressions, sounds very similar to trance music. In fact, I found that, stylistically, trance music was comparable to house and ambient music increased in speed. Trance music was a term not used extensively until the early 1990s, but ambient and house music were already mainstream by the 1980s, so it would make sense that trance evolved in this manner. However, this insight could serve as an argument that trance is not an innovative genre in and of itself, but is rather a clever combination of two older genres. Lastly, we look at the timbre category and chord change distributions for each cluster. In theory, these clusters should have significantly different peaks of chord changes and timbre categories, reflecting different pitch arrangements and instruments in each cluster. The type 0 chord change corresponds to major → major with no note change, type 60 to minor → minor with no note change, type 120 to dominant 7th major → dominant 7th major with no note change, and type 180 to dominant 7th minor → dominant 7th minor with no note change. It makes sense that type 0, 60, 120, and 180 chord changes are frequently observed, because it implies that adjacent chords in the song remain in the same key for the majority of the song.

for the majority of the song The timbre categories on the other hand are more

difficult to intuitively interpret Mauchrsquos study addresses this issue by sampling

songs and sounds that are the closest to each timbre category and then playing the

sounds and attaching user-based interpretations based on several listeners [8] While

this strategy worked in Mauchrsquos study given the time and resources at my disposal

this strategy was not practical in my study I ended up comparing my subjective

summaries of each cluster and comparing the charts to see whether certain peaks

in the timbre categories corresponded to specific tones Strangely for α = 005 the

timbre and chord change data is very similar for each cluster This problem does not

occur for when α = 01 or 02 where the graphs vary significantly and correspond

to some of the observed differences in the music In summary below are the most

influential artists I found in the clusters formed and the types of music they created

that were novel for their time

• Jean-Michel Jarre: ambient and house music, complicated synthesizer arrangements
• Cabaret Voltaire: orchestral electronic music
• Paul Horn: new age
• Brian Eno: ambient music
• Manuel Göttsching (Ashra): synth-heavy ambient music
• Killing Joke: industrial metal
• John Foxx: minimalist and dark electronic music
• Fad Gadget: house and industrial music

While these conclusions were formed mainly from the data used in the MSD and the resulting clusters, I checked outside sources and biographies of these artists to see whether they were groundbreaking contributors to electronic music. Some research revealed that these artists were indeed groundbreaking for their time, so my findings are consistent with existing literature. The difference, however, between existing accounts and mine is that, from a quantitatively computed perspective, I found some new connections (like observing that one of Jarre's works, when sped up, sounded very similar to trance music).

For larger values of α, it is worth not only looking at interesting phenomena in the clusters formed for that specific value, but also comparing those clusters to the ones formed at other values of α. Since we are increasing the value of α, more clusters will be formed, and the distinctions between each cluster will be more nuanced. With α = 0.1, the Dirichlet Process formed 16 clusters; 2 of these clusters consisted of only one song each, and upon listening, neither of these songs sounded particularly unique, so I threw those two clusters out and analyzed the remaining 14. Comparing these clusters to the ones formed with α = 0.05, I found that some of the clusters mapped over nicely, while others were more difficult to interpret. For example, cluster 3 under α = 0.1 contained a similar number of songs, and a similar distribution of release years, to cluster 9 under α = 0.05. Both contain virtually no songs before the 1990s and then steadily rise in popularity through the 2000s. Both clusters also contain similar types of music: house beats and ethereal synths reminiscent of ambient or trance music. However, when I looked at the earliest artists in cluster 3 under α = 0.1, they were different from the earliest artists in cluster 9 under α = 0.05. One particular artist, Bill Nelson, stood out for having a particularly novel song, "Birds of Tin," for the year it was released (1980). This song features a sharp and twangy synth beat that, when sped up, sounds like minimalist acid house music. While the α = 0.05 grouping differentiated mostly on general moods and classes of instruments (like rock vs. non-electronic vs. electronic), the α = 0.1 grouping picked up more nuanced differences in instrumentation and mood. For example, cluster 16 under α = 0.1 contained songs that featured orchestral string instruments, especially violin. The songs themselves varied significantly according to traditional genres, from Brian Eno arrangements with classical orchestra to a remix of a song by Linkin Park, a nu-metal band, which contained violin interludes. This clustering raises an interesting point: music that sounds very different based on traditional genres can be grouped together on certain instruments or sounds. Another cluster, 28, features 90s sounds characteristic of the Roland TR-909 drum machine (which would explain why the cluster's songs increase drastically starting in the 1990s and steadily decline through the 2000s). Yet another cluster, 6, has a particularly heavy left tail, indicating a style more popular in the 1980s, and its characteristic sound, hi-hat cymbals, is also a specialized instrument. This specialization does not match up particularly strongly with the clusters from α = 0.05. That is, a single cluster from α = 0.05 does not map easily to one or more clusters in the α = 0.1 run, although many of the clusters appear to share characteristics based on the qualitative descriptions in the tables. The timbre/chord change charts for each cluster appeared to at least somewhat corroborate the general characteristics I attached to each cluster. For example, the last timbre category is significantly pronounced for clusters 5 and 18, and especially so for 18. Cluster 18 was vocal-free, ethereal, space-synth sounds, so it would make sense that cluster 5, which was mainly calm New Age, also contained vocal-free, ethereal, and space-y sounds. It was also interesting to note that certain clusters, like 28, contained one timbre category that completely dominated all the others. Many songs in this cluster were marked by strong and repetitive beats reminiscent of the Roland TR drum machines, which matches the graph. Likewise, clusters 3, 7, 9, and 20, which appear to contain the same peak timbre category, were noted for containing strong and repetitive beats. From this run, I added the following artists and their contributions to the general list of novel artists:

• Bill Nelson: minimalist house music
• Vangelis: orchestral compositions with electronic notes
• Rick Wakeman: rock compositions with spacey-sounding synths
• Kraftwerk: synth-based pop music

Finally, we look at α = 0.2. With this parameter value, the Dirichlet Process resulted in 22 clusters. 3 of these clusters contained only one song each, and upon listening to each of these songs, I determined they were not particularly unique and discarded them, for a total of 19 remaining clusters. Unlike with the previous two values of α, where the clusters were relatively easy to subjectively differentiate, this run was quite difficult. Slightly more than half of the clusters, 10 out of 19, contained under 1,000 songs, and 3 contained under 100. Some of the clusters were easily mapped to clusters from the other two α values, like cluster 17 under α = 0.2, which contains Roland TR drum machine sounds and is comparable to cluster 28 under α = 0.1. However, many of the other classifications seemed more dubious. Not only did the songs within each cluster often vary significantly, but the differences between many clusters appeared nearly indistinguishable. The chord change and timbre charts also reflect the difficulty in distinguishing different clusters: the y-axis scales are quite small, implying that many of the timbre values averaged out because the songs in each cluster were quite different. Essentially, the observations are quite noisy and do not have features that stand out as saliently as those of cluster 28 under α = 0.1, for example. The only exceptions were clusters 30 and 34, but there were so few songs in each of these clusters that they represent only a small portion of the dataset. Therefore, I concluded that the Dirichlet Process with α = 0.2 did an insufficient job of adequately clustering the songs. Overall, the clusters formed when α = 0.1 were the most meaningful in terms of picking up nuanced moods and instruments without splitting hairs and producing clusters that a minimally trained ear could not differentiate. From this analysis, the most appropriate genre classifications of the electronic music from the MSD are the clusters described in the table for α = 0.1, and the most novel artists, along with their contributions, are summarized in the findings for α = 0.05 and α = 0.1.


Chapter 4

Conclusion

In this chapter, I first address weaknesses in my experiment and strategies to address those weaknesses; I then offer potential paths for researchers to build upon my experiment and give closing words regarding this thesis.

4.1 Design Flaws in Experiment

While I made every effort possible to ensure the integrity of this experiment, there were various complicating factors, some beyond my control and others within my control but unrealistic to address given the time and resources I had. The largest issue was the dataset I was working with. While the MSD contained roughly 23,000 electronic music songs according to my classifications, these songs did not come close to covering all of the electronic music available. From looking through the tracks, I did see many important artists, meaning that there was some credibility to the dataset. However, there were several other artists I was surprised to see missing, and the artists included were represented by only a limited number of popular songs. Some traditionally defined genres, like dubstep, were missing entirely from the dataset, and the most recent songs came from the year 2010, which meant that the past 5 years of rapid expansion in EM were not accounted for. Building a sufficient corpus of EM data is very difficult, arguably more so than for other genres, because songs may be remixed by multiple artists, further blurring the line between original content and modifications. For this reason, I considered my thesis to be a proof of concept. Although the data I used may not be ideal, I was able to show that the Dirichlet Process could be used with some amount of success to cluster songs based on their metadata.

With respect to how I implemented the Dirichlet Process and constructed the features, my methodology could have been more extensive with additional time and resources. Interpreting the sounds in each song and establishing common threads is a difficult task, and unlike Pandora, which used trained music theory experts to analyze each song, I relied on my own ears and anecdotal knowledge of EM. Given the lack of formal literature quantitatively analyzing EM and the resources I had, this was my best realistic option, but it was also not ideal. The second notable weakness, which was more controllable, was determining what exactly constitutes an EM song. My criteria involved iterating through every song and selecting those whose artist contained a tag that fell inside a list of predetermined EM genres. However, this strategy is not always effective, since some artists have only a small selection of EM songs and have produced much more music involving rock or other non-EM genres. To prevent these songs from appearing in the dataset, I would need to load another dataset, from a group called Last.fm, which contains user-generated tags at the song level. Another, more addressable, weakness in my experiment was graphically analyzing the timbre categories. While the average chord changes were easy to interpret on the graphs for each cluster and had easy semantic interpretations, the timbre categories were never formally defined. That is, while I knew the Bayes Information Criterion was lowest when there were 46 categories, I did not associate each timbre category with a sound. Mauch's study addressed this issue by randomly selecting songs with sounds that fell in each timbre category and asking users to listen to the sounds and classify what they heard. Implementing this system would be an additional way of ensuring that the clusters formed were nontrivial: I could not only eyeball the measurements on each graph for timbre, as I did in this thesis, but also use them to confirm the sounds I observed for each cluster. Finally, while my feature selection involved careful preprocessing, based on other studies, that normalized measurements between all songs, there are additional ways I could have improved the feature set. For example, one study looks at more advanced ways to isolate specific timbre segments in a song, identify repeating patterns, and compare songs to each other in terms of the similarity of their timbres [15]. More advanced methods like these would allow me to analyze more quantitatively how successfully the Dirichlet Process clusters songs into distinct categories.

4.2 Future Work

Future work in this area, quantitatively analyzing EM metadata to determine what constitutes different genres and novel artists, would involve tighter definitions and procedures, evaluations of whether clustering was effective, and closer musical scrutiny. All of the weaknesses mentioned in the previous section, barring perhaps the songs available in the Million Song Dataset, can be addressed with extensions and modifications to the code base I created. Addressing the greater issue of building an effective corpus of music data for the MSD, and constantly updating it, might be accomplished by soliciting such data from an organization like Spotify, but such an endeavor is very ambitious and beyond the scope of any individual or small-group research project without extensive funding and influence. Once these problems are resolved, and the dataset, access to songs from the dataset, and methods for comparing songs to each other are in place, the next steps would be to further analyze the results. How do the most unique artists for their time compare to the most popular artists? Is there considerable overlap? How long does it take for a style to grow in popularity, if it even does? And lastly, how can these findings be used to compose new genres of music and envision who and what will become popular in the future? All of these questions may require supplementary information sources, with respect to the popularity of songs and artists for example, and many of these additional pieces of information can be found on the website of the MSD.

4.3 Closing Remarks

While this thesis is an ambitious endeavor and can be improved in many respects, it does show that the methods implemented yield nontrivial results and could serve as a foundation for future quantitative analysis of electronic music. As data analytics grows even more, and groups such as Spotify amass greater amounts of information and deeper insights on that information, this relatively new field of study will hopefully grow. EM is a dynamic, energizing, and incredibly expressive type of music, and understanding it from a quantitative perspective pays respect to what has, up until now, been mostly analyzed from a curious outsider's perspective: qualitatively described, but not examined as thoroughly from a mathematical angle.


Appendix A

Code

A.1 Pulling Data from the Million Song Dataset

from __future__ import division
import os
import re
import sys
import time
import glob
import numpy as np
import hdf5_getters  # not on adroit

# prevents output from showing ellipses when printed
np.set_printoptions(threshold=np.nan)

'''This code pulls the metadata for each electronic song so that the
frequency of chord changes can be computed and the dirichlet process run on it.'''

basedir = '/scratch/network/mssilver/mssilver/msd_data_full/data/' + str(sys.argv[1])
ext = '.h5'

target_genres = ['house', 'techno', 'drum and bass', 'drum n bass',
    "drum'n'bass", 'drumnbass', "drum 'n' bass", 'jungle', 'breakbeat',
    'trance', 'dubstep', 'trap', 'downtempo', 'industrial', 'synthpop',
    'idm', 'idm - intelligent dance music', '8-bit', 'ambient',
    'dance and electronica', 'electronic']

# relevant metadata for all EM songs found in the MSD
all_song_data = {}
pitch_segs_data = []
count = 0
start_time = time.time()

for root, dirs, files in os.walk(basedir):
    files = glob.glob(os.path.join(root, '*' + ext))
    for f in files:
        h5 = hdf5_getters.open_h5_file_read(f)
        # if year unknown, throw out sample
        if hdf5_getters.get_year(h5) == 0:
            h5.close()
            continue
        if any(tag in str(hdf5_getters.get_artist_mbtags(h5)) for tag in target_genres):
            print 'found electronic music song at {0} seconds'.format(time.time() - start_time)
            count += 1
            print 'song count: {0}'.format(count + 1)
            h5_subdict = dict()
            h5_subdict['title'] = hdf5_getters.get_title(h5).item()
            h5_subdict['artist_name'] = hdf5_getters.get_artist_name(h5).item()
            h5_subdict['year'] = hdf5_getters.get_year(h5).item()
            h5_subdict['duration'] = hdf5_getters.get_duration(h5).item()
            h5_subdict['timbre'] = hdf5_getters.get_segments_timbre(h5).tolist()
            h5_subdict['pitches'] = hdf5_getters.get_segments_pitches(h5).tolist()
            track_id = hdf5_getters.get_track_id(h5).item()
            all_song_data[track_id] = h5_subdict
            print 'Song {0} finished processing. Total time elapsed: {1} seconds'.format(count, str(time.time() - start_time))
        h5.close()

# sort songs chronologically before writing them out
all_song_data_sorted = dict(sorted(all_song_data.items(), key=lambda k: k[1]['year']))
sortedpitchdata = '/scratch/network/mssilver/mssilver/msd_data/raw_' + re.sub('/', '', sys.argv[1]) + '.txt'
with open(sortedpitchdata, 'w') as text_file:
    text_file.write(str(all_song_data_sorted))

A.2 Calculating Most Likely Chords and Timbre Categories

from __future__ import division
import os
import sys
import re
import time
import json
import glob
import hdf5_getters  # not on adroit
import sklearn.mixture
import msd_utils  # not on adroit
import math
import numpy as np
import collections
import ast

# prevents output from showing ellipses when printed
np.set_printoptions(threshold=np.nan)

# column-wise mean of a list of lists (applied below via zip)
def mean(a):
    return sum(a) / len(a)

'''This code computes the frequency of chord changes and timbre categories
in each electronic song so the dirichlet process can be run on them.'''

basedir = '/scratch/network/mssilver/mssilver/'
input_file = basedir + 'msd_data/raw_' + str(sys.argv[1]) + '.txt'
output_file = basedir + 'msd_data/preprocessed_' + str(sys.argv[1]) + '.txt'

json_contents = open(input_file, 'r').read()

all_song_data = []
time_start = time.time()
count = 0
# match each song's dict literal in the raw text dump
for json_object_str in re.finditer(r"\{'title'.*?\}", json_contents):
    json_object_str = str(json_object_str.group(0))
    json_object = ast.literal_eval(json_object_str)
    json_object_new = {}

    json_object_new['title'] = json_object['title']
    json_object_new['artist_name'] = json_object['artist_name']
    json_object_new['year'] = json_object['year']
    json_object_new['duration'] = json_object['duration']

    segments_pitches_old = json_object['pitches']
    segments_timbre_old = json_object['timbre']
    segments_pitches_old_smoothed = []
    segments_timbre_old_smoothed = []
    chord_changes = [0 for i in range(0, 192)]
    smoothing_factor = 5
    for i in range(0, int(math.floor(len(segments_pitches_old) / smoothing_factor))):
        segments = segments_pitches_old[(smoothing_factor * i):(smoothing_factor * i + smoothing_factor)]
        # calculate mean frequency of each note over a block of 5 time segments
        segments_mean = map(mean, zip(*segments))
        segments_pitches_old_smoothed.append(segments_mean)
    most_likely_chords = [msd_utils.find_most_likely_chord(seg) for seg in segments_pitches_old_smoothed]
    print 'found most likely chords at {0} seconds'.format(time.time() - time_start)
    # calculate chord changes
    for i in range(0, len(most_likely_chords) - 1):
        c1 = most_likely_chords[i]
        c2 = most_likely_chords[i + 1]
        if c1[1] == c2[1]:
            note_shift = 0
        elif c1[1] < c2[1]:
            note_shift = c2[1] - c1[1]
        else:
            note_shift = 12 - c1[1] + c2[1]
        key_shift = 4 * (c1[0] - 1) + c2[0]
        # convert note_shift (0 through 11) and key_shift (1 to 16)
        # to one of 192 categories for a chord shift
        chord_shift = 12 * (key_shift - 1) + note_shift
        chord_changes[chord_shift] += 1
    json_object_new['chord_changes'] = [c / json_object['duration'] for c in chord_changes]
    print 'calculated chord changes at {0} seconds'.format(time.time() - time_start)

    for i in range(0, int(math.floor(len(segments_timbre_old) / smoothing_factor))):
        segments = segments_timbre_old[(smoothing_factor * i):(smoothing_factor * i + smoothing_factor)]
        # calculate mean value of each timbre dimension over a block of 5 time segments
        segments_mean = map(mean, zip(*segments))
        segments_timbre_old_smoothed.append(segments_mean)
    print 'found most likely timbre categories at {0} seconds'.format(time.time() - time_start)
    timbre_cats = [msd_utils.find_most_likely_timbre_category(seg) for seg in segments_timbre_old_smoothed]
    timbre_cat_counts = [timbre_cats.count(i) for i in xrange(0, 30)]
    json_object_new['timbre_cat_counts'] = [t / json_object['duration'] for t in timbre_cat_counts]
    all_song_data.append(json_object_new)
    count += 1

print 'preprocessing finished, writing results to file at time {0}'.format(time.time() - time_start)
with open(output_file, 'w') as f:
    f.write(str(all_song_data))

print 'file merging complete at time {0}'.format(time.time() - time_start)

A.3 Code to Compute Timbre Categories

from __future__ import division
import os
import sys
import re
import time
import json
import glob
import hdf5_getters  # not on adroit
import sklearn.mixture
import msd_utils  # not on adroit
import math
import numpy as np
import collections
from string import ascii_uppercase
import ast
import matplotlib.pyplot as plt
import operator
from collections import defaultdict
import random

timbre_all = []
N = 20  # number of samples to get from each year
year_counts = {1956: 2, 1965: 4, 1968: 3, 1969: 5, 1970: 23, 1971: 25,
               1972: 26, 1973: 37, 1974: 35, 1975: 29, 1976: 28, 1977: 64, 1978: 77,
               1979: 111, 1980: 131, 1981: 171, 1982: 199, 1983: 272, 1984: 190, 1985: 189,
               1986: 200, 1987: 224, 1988: 205, 1989: 272, 1990: 358, 1991: 348,
               1992: 538, 1993: 610, 1994: 658, 1995: 764, 1996: 809, 1997: 930, 1998: 872,
               1999: 983, 2000: 1031, 2001: 1230, 2002: 1323, 2003: 1563, 2004: 1508,
               2005: 1995, 2006: 1892, 2007: 2175, 2008: 1950, 2009: 1782, 2010: 742}

time_start = time.time()
year_count = defaultdict(int)
orig_dir = '/scratch/network/mssilver/mssilver/'
orig_dir = ''
json_pattern = re.compile(r"\{'title'.*?\}", re.DOTALL)
N = 20  # number of songs to sample from each year
k = 20  # number of frames to select from each song
for l1 in ascii_uppercase:
    for l2 in ascii_uppercase:
        edm_textfile = orig_dir + 'msd_data/raw_' + l1 + l2 + '.txt'
        json_contents = open(edm_textfile, 'r').read()
        for json_object_str in re.findall(json_pattern, json_contents):
            json_object = ast.literal_eval(json_object_str)
            year = int(json_object['year'])
            prob = 1.0 if 1.0 * N / year_counts[year] > 1.0 else 1.0 * N / year_counts[year]
            if random.random() < prob:
                print 'getting timbre frames for song in directory {0}, {1} seconds after start of program'.format(edm_textfile, time.time() - time_start)
                duration = float(json_object['duration'])
                timbre = [[t / duration for t in l] for l in json_object['timbre']]
                try:
                    indices = random.sample(xrange(0, len(timbre)), k)
                except:
                    indices = xrange(0, len(timbre))
                timbre_frames = [timbre[i] for i in indices]
                appended_timbre = [timbre_all.append(l) for l in timbre_frames]
        print 'finished file {0}, {1} seconds after start of program'.format(edm_textfile, time.time() - time_start)

with open('timbre_frames_all.txt', 'w') as f:
    f.write(str(timbre_all))

A.4 Helper Methods for Calculations

import os
import re
import json
import glob
import hdf5_getters
import time
import numpy as np

''' some static data used in conjunction with the helper methods '''

# each 12-element vector corresponds to the 12 pitches, starting with C natural
# and going up to B natural

CHORD_TEMPLATE_MAJOR = [[1,0,0,0,1,0,0,1,0,0,0,0], [0,1,0,0,0,1,0,0,1,0,0,0],
                        [0,0,1,0,0,0,1,0,0,1,0,0], [0,0,0,1,0,0,0,1,0,0,1,0],
                        [0,0,0,0,1,0,0,0,1,0,0,1], [1,0,0,0,0,1,0,0,0,1,0,0],
                        [0,1,0,0,0,0,1,0,0,0,1,0], [0,0,1,0,0,0,0,1,0,0,0,1],
                        [1,0,0,1,0,0,0,0,1,0,0,0], [0,1,0,0,1,0,0,0,0,1,0,0],
                        [0,0,1,0,0,1,0,0,0,0,1,0], [0,0,0,1,0,0,1,0,0,0,0,1]]
CHORD_TEMPLATE_MINOR = [[1,0,0,1,0,0,0,1,0,0,0,0], [0,1,0,0,1,0,0,0,1,0,0,0],
                        [0,0,1,0,0,1,0,0,0,1,0,0], [0,0,0,1,0,0,1,0,0,0,1,0],
                        [0,0,0,0,1,0,0,1,0,0,0,1], [1,0,0,0,0,1,0,0,1,0,0,0],
                        [0,1,0,0,0,0,1,0,0,1,0,0], [0,0,1,0,0,0,0,1,0,0,1,0],
                        [0,0,0,1,0,0,0,0,1,0,0,1], [1,0,0,0,1,0,0,0,0,1,0,0],
                        [0,1,0,0,0,1,0,0,0,0,1,0], [0,0,1,0,0,0,1,0,0,0,0,1]]
CHORD_TEMPLATE_DOM7 = [[1,0,0,0,1,0,0,1,0,0,1,0], [0,1,0,0,0,1,0,0,1,0,0,1],
                       [1,0,1,0,0,0,1,0,0,1,0,0], [0,1,0,1,0,0,0,1,0,0,1,0],
                       [0,0,1,0,1,0,0,0,1,0,0,1], [1,0,0,1,0,1,0,0,0,1,0,0],
                       [0,1,0,0,1,0,1,0,0,0,1,0], [0,0,1,0,0,1,0,1,0,0,0,1],
                       [1,0,0,1,0,0,1,0,1,0,0,0], [0,1,0,0,1,0,0,1,0,1,0,0],
                       [0,0,1,0,0,1,0,0,1,0,1,0], [0,0,0,1,0,0,1,0,0,1,0,1]]
CHORD_TEMPLATE_MIN7 = [[1,0,0,1,0,0,0,1,0,0,1,0], [0,1,0,0,1,0,0,0,1,0,0,1],
                       [1,0,1,0,0,1,0,0,0,1,0,0], [0,1,0,1,0,0,1,0,0,0,1,0],
                       [0,0,1,0,1,0,0,1,0,0,0,1], [1,0,0,1,0,1,0,0,1,0,0,0],
                       [0,1,0,0,1,0,1,0,0,1,0,0], [0,0,1,0,0,1,0,1,0,0,1,0],
                       [0,0,0,1,0,0,1,0,1,0,0,1], [1,0,0,0,1,0,0,1,0,1,0,0],
                       [0,1,0,0,0,1,0,0,1,0,1,0], [0,0,1,0,0,0,1,0,0,1,0,1]]

CHORD_TEMPLATE_MAJOR_means = [np.mean(chord) for chord in CHORD_TEMPLATE_MAJOR]
CHORD_TEMPLATE_MINOR_means = [np.mean(chord) for chord in CHORD_TEMPLATE_MINOR]
CHORD_TEMPLATE_DOM7_means = [np.mean(chord) for chord in CHORD_TEMPLATE_DOM7]
CHORD_TEMPLATE_MIN7_means = [np.mean(chord) for chord in CHORD_TEMPLATE_MIN7]

CHORD_TEMPLATE_MAJOR_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_MAJOR]
CHORD_TEMPLATE_MINOR_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_MINOR]
CHORD_TEMPLATE_DOM7_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_DOM7]
CHORD_TEMPLATE_MIN7_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_MIN7]

# means of the 46 timbre clusters found by the GMM; each row is one cluster's
# 12-dimensional mean timbre vector
TIMBRE_CLUSTERS = [
    [1.38679881e-01, 3.95702571e-02, 2.65410235e-02, 7.38301998e-03, -1.75014636e-02, -5.51147732e-02, 8.71851698e-03, -1.17595855e-02, 1.07227900e-02, 8.75951680e-03, 5.40391877e-03, 6.17638908e-03],
    [3.14344510e+00, 1.17405599e-01, 4.08053561e+00, -1.77934450e+00, 2.93367968e+00, -1.35597928e+00, -1.55129489e+00, 7.75743158e-01, 6.42796685e-01, 1.40794256e-01, 3.37716831e-01, -3.27103815e-01],
    [3.56548165e-01, 2.73288705e+00, 1.94355982e+00, 1.06892477e+00, 9.89739475e-01, -8.97330631e-02, 8.73234495e-01, -2.00747009e-03, 3.44488367e-01, 9.93117800e-02, -2.43471766e-01, -1.90521726e-01],
    [4.22442037e-01, 4.14115783e-01, 1.43926557e-01, -1.16143322e-01, -5.95186216e-02, -2.36927188e-01, -6.83151409e-02, 9.86816882e-02, 2.43219098e-02, 6.93558977e-02, 6.80121418e-03, 3.97485360e-02],
    [1.94727799e-01, -1.39027782e+00, -2.39875671e-01, -2.84583677e-01, 1.92334219e-01, -2.83421048e-01, 2.15787541e-01, 1.14840341e-01, -2.15631833e-01, -4.09496877e-02, -6.90838017e-03, -7.24394810e-03],
    [1.96565167e-01, 4.98702717e-02, -3.43697282e-01, 2.54170701e-01, 1.12441266e-02, 1.54740401e-01, -4.70447408e-02, 8.10868802e-02, 3.03736697e-03, 1.43974944e-03, -2.75044913e-02, 1.48634678e-02],
    [2.21364497e-01, -2.96205105e-01, 1.57754028e-01, -5.57641279e-02, -9.25625566e-02, -6.15316168e-02, -1.38139882e-01, -5.54936599e-02, 1.66886836e-01, 6.46238260e-02, 1.24093863e-02, -2.09274345e-02],
    [2.12823455e-01, -9.32652720e-02, -4.39611467e-01, -2.02814479e-01, 4.98638770e-02, -1.26572488e-01, -1.11181799e-01, 3.25075635e-02, 2.01416694e-02, -5.69216463e-02, 2.61922912e-02, 8.30817468e-02],
    [1.62304042e-01, -7.34813956e-03, -2.02552550e-01, 1.80106705e-01, -5.72110826e-02, -9.17148244e-02, -6.20429191e-03, -6.08892354e-02, 1.02883628e-02, 3.84878478e-02, -8.72920419e-03, 2.37291230e-02],
    [1.69023095e-01, 6.81311168e-02, -3.71039856e-02, -2.13139780e-02, -4.18752028e-03, 1.36407740e-01, 2.58515825e-02, -4.10328777e-04, 2.93149920e-02, -1.97874734e-02, 2.01177066e-02, 4.29260690e-03],
    [4.16829358e-01, -1.28384095e+00, 8.86081556e-01, 9.13717416e-02, -3.19420208e-01, -1.82003637e-01, -3.19865507e-02, -1.71517045e-02, 3.47472066e-02, -3.53047665e-02, 5.58354602e-02, -5.06222122e-02],
    [3.83948137e-01, 1.06020034e-01, 4.01191058e-01, 1.49470482e-01, -9.58422411e-02, -4.94473336e-02, 2.27589858e-02, -5.67352733e-02, 3.84666644e-02, -2.15828055e-02, -1.67817151e-02, 1.15426241e-01],
    [9.07946444e-01, 3.26120397e+00, 2.98472002e+00, -1.42615404e-01, 1.29886103e+00, -4.53380431e-01, 1.54008478e-01, -3.55297093e-02, -2.95809181e-01, 1.57037690e-01, -7.29692046e-02, 1.15180285e-01],
    [1.60870896e+00, -2.32038235e+00, -7.96211044e-01, 1.55058968e+00, -2.19377663e+00, 5.01030526e-01, -1.71767279e+00, -1.36642470e+00, -2.42837527e-01, -4.14275615e-01, -7.33148530e-01, -4.56676578e-01],
    [6.42870687e-01, 1.34486839e+00, 2.16026845e-01, -2.13180345e-01, 3.10866747e-01, -3.97754955e-01, -3.54439151e-01, -5.95938041e-04, 4.95054274e-03, 4.67013422e-02, -1.80823854e-02, 1.25808320e-01],
    [1.16780496e+00, 2.28141229e+00, -3.29418720e+00, -1.54239912e+00, 2.12372153e-01, 2.51116768e+00, 1.84273560e+00, -4.06183916e-01, 1.19175125e+00, -9.24407446e-01, 6.85444429e-01, -6.38729005e-01],
    [2.39097414e-01, -1.13382447e-02, 3.06327342e-01, 4.68182987e-03, -1.03107607e-01, -3.17661969e-02, 3.46533705e-02, 1.46440386e-02, 6.88291154e-02, 1.72580481e-02, -6.23970238e-03, -6.52822380e-03],
    [1.74850329e-01, -1.86077411e-01, 2.69285838e-01, 5.22452803e-02, -3.71708289e-02, -6.42874319e-02, -5.01920042e-03, -1.14565540e-02, -2.61300268e-03, -6.94872458e-03, 1.20157063e-02, 2.01341977e-02],
    [1.93220674e-01, 1.62738332e-01, 1.72794061e-02, 7.89933755e-02, 1.58494767e-01, 9.04541006e-04, -3.33177052e-02, -1.42411500e-01, -1.90471155e-02, -2.41622739e-02, -2.57382438e-02, 2.84895062e-02],
    [3.31179197e+00, -1.56765268e-01, 4.42446188e+00, 2.05496297e+00, 5.07031622e+00, -3.52663849e-02, -5.68337901e+00, -1.17825301e+00, 5.41756637e-01, -3.15541339e-02, -1.58404846e+00, 7.37887234e-01],
    [2.36033237e-01, -5.01380019e-01, -7.01568834e-02, -2.14474169e-01, 5.58739133e-01, -3.45340886e-01, 2.36469930e-01, -2.51770230e-02, -4.41670143e-01, -1.73364633e-01, 9.92353986e-03, 1.01775476e-01],
    [3.13672832e+00, 1.55128891e+00, 4.60139512e+00, 9.82477544e-01, -3.87108002e-01, -1.34239667e+00, -3.00065797e+00, -4.41556909e-01, -7.77546208e-01, -6.59017029e-01, -1.42596356e-01, -9.78935498e-01],
    [8.50714148e-01, 2.28658856e-01, -3.65260753e+00, 2.70626948e+00, -1.90441544e-01, 5.66625676e+00, 1.77531510e+00, 2.39978921e+00, 1.10965660e+00, 1.58484130e+00, -1.51579214e-02, 8.64324026e-01],
    [1.14302559e+00, 1.18602811e+00, -3.88130412e+00, 8.69833825e-01, -8.23003310e-01, -4.23867795e-01, 8.56022598e-01, -1.08015106e+00, 1.74840192e-01, -1.35493558e-02, -1.17012561e+00, 1.68572940e-01],
    [3.54117814e+00, 6.12714769e-01, 7.67585243e+00, 2.50391333e+00, 1.81374399e+00, -1.46363231e+00, -1.74027236e+00, -5.72924078e-01, -1.20787368e+00, -4.13954661e-01, -4.62561948e-01, 6.78297871e-01],
    [8.31843044e-01, 4.41635485e-01, 7.00724425e-02, -4.72159900e-02, 3.08326493e-01, -4.47009822e-01, 3.27806057e-01, 6.52370380e-01, 3.28490360e-01, 1.28628172e-01, -7.78065861e-02, 6.91343399e-02],
    [4.90082031e-01, -9.53180204e-01, 1.76970476e-01, 1.57256960e-01, -5.26196238e-02, -3.19264458e-01, 3.91808304e-01, 2.19368239e-01, -2.06483291e-01, -6.25044005e-02, -1.05547224e-01, 3.18934196e-01],
    [1.49899454e+00, -4.30708817e-01, 2.43770498e+00, 7.03149621e-01, -2.28827845e+00, 2.70195855e+00, -4.71484280e+00, -1.18700075e+00, -1.77431396e+00, -2.23190236e+00, 8.20855264e-01, -2.35859902e-01],
    [1.20322544e-01, -3.66300816e-01, -1.25699953e-01, -1.21914056e-01, 6.93277338e-02, -1.31034684e-01, -1.54955924e-03, 2.48094288e-02, -3.09576314e-02, -1.66369415e-03, 1.48904987e-04, -1.42151992e-02],
    [6.52394765e-01, -6.81024464e-01, 6.36868117e-01, 3.04950208e-01, 2.62178992e-01, -3.20457080e-01, -1.98576098e-01, -3.02173163e-01, 2.04399765e-01, 4.44513847e-02, -9.50111498e-03, -1.14198739e-02],
    [2.06762180e-01, -2.08101829e-01, 2.61977630e-01, -1.71672300e-01, 5.61794250e-02, 2.13660185e-01, 3.90259585e-02, 4.78176392e-02, 1.72812607e-02, 3.44052067e-02, 6.26899067e-03, 2.48544728e-02],
    [7.39717363e-01, 4.37786285e+00, 2.54995502e+00, 1.13151212e+00, -3.58509503e-01, 2.20806129e-01, -2.20500355e-01, -7.22409824e-02, -2.70534083e-01, 1.07942098e-03, 2.70174668e-01, 1.87279353e-01],
    [1.25593809e+00, 6.71054880e-02, 8.70352571e-01, -4.32607959e+00, 2.30652217e+00, 5.47476105e+00, -6.11052479e-01, 1.07955720e+00, -2.16225471e+00, -7.95770149e-01, -7.31804973e-01, 9.68935954e-01],
    [1.17233757e-01, -1.23897829e-01, -4.88625265e-01, 1.42036530e-01, -7.23286756e-02, -6.99808763e-02, -1.17525019e-02, 5.70221674e-02, -7.67796123e-03, 4.17505873e-02, -2.33375716e-02, 1.94121001e-02],
    [1.67511025e+00, -2.75436700e+00, 1.45345593e+00, 1.32408871e+00, -1.66172505e+00, 1.00560074e+00, -8.82308160e-01, -5.95708043e-01, -7.27283590e-01, -1.03975499e+00, -1.86653334e-02, 1.39449745e+00],
    [3.20587677e+00, -2.84451104e+00, 8.54849957e+00, -4.44001235e-01, 1.04202144e+00, 7.35333682e-01, -2.48763292e+00, 7.38931361e-01, -1.74185596e+00, -1.07581842e+00, 2.05759299e-01, -8.20483513e-01],
    [3.31279737e+00, -5.08655734e-01, 6.61530870e+00, 1.16518280e+00, 4.74499155e+00, -2.31536191e+00, -1.34016130e+00, -7.15381712e-01, 2.78890594e+00, 2.04189275e+00, -3.80003033e-01, 1.16034914e+00],
    [1.79522019e+00, -8.13534697e-02, 4.37167420e-01, 2.26517020e+00, 8.85377295e-01, 1.07481514e+00, -7.25322296e-01, -2.19309506e+00, -7.59468916e-01, -1.37191387e+00, 2.60097913e-01, 9.34596450e-01],
    [3.50400906e-01, 8.17891485e-01, -8.63487084e-01, -7.31760701e-01, 9.70320805e-02, -3.60023996e-01, -2.91753495e-01, -8.03073817e-02, 6.65930095e-02, 1.60093340e-01, -1.29158086e-01, -5.18806100e-02],
    [2.25922929e-01, 2.78461593e-01, 5.39661393e-02, -2.37662670e-02, -2.70343295e-02, -1.23485570e-01, 2.31027499e-03, 5.87465112e-05, 1.86127188e-02, 2.83074747e-02, -1.87198676e-04, 1.24761782e-02],
    [4.53615634e-01, 3.18976020e+00, -8.35029351e-01, 7.84124578e+00, -4.43906795e-01, -1.78945492e+00, -1.14521031e+00, 1.00044304e+00, -4.04084981e-01, -4.86030348e-01, 1.05412721e-01, 5.63666445e-02],
    [3.93714086e-01, -3.07226477e-01, -4.87366619e-01, -4.57481697e-01, -2.91133171e-04, -2.39881719e-01, -2.15591352e-01, -1.21332941e-01, 1.42245002e-01, 5.02984582e-02, -8.05878851e-03, 1.95534173e-01],
    [1.86913010e-01, -1.61000977e-01, 5.95612425e-01, 1.87804293e-01, 2.22064227e-01, -1.09008289e-01, 7.83845058e-02, 5.15228647e-02, -8.18113578e-02, -2.37860551e-02, 3.41013800e-03, 3.64680417e-02],
    [3.32919314e+00, -2.14341251e+00, 7.20913997e+00, 1.76143734e+00, 1.64091808e+00, -2.66887649e+00, -9.26748006e-01, -2.78599285e-01, -7.39434005e-01, -3.87363085e-01, 8.00557250e-01, 1.15628886e+00],
    [4.76496444e-01, -1.19334793e-01, 3.09037235e-01, -3.45545294e-01, 1.30114716e-01, 5.06895559e-01, 2.12176840e-01, -4.14296750e-03, 4.52439064e-02, -1.62163990e-02, 6.93683152e-03, -5.77607592e-03],
    [3.00019324e-01, 5.43432074e-02, -7.72732930e-01, 1.47263806e+00, -2.79012581e-02, -2.47864869e-01, -2.10011388e-01, 2.78202425e-01, 6.16957205e-02, -1.66924986e-01, -1.80102286e-01, -3.78872162e-03]]

TIMBRE_MEANS = [np.mean(t) for t in TIMBRE_CLUSTERS]
TIMBRE_STDEVS = [np.std(t) for t in TIMBRE_CLUSTERS]

'''helper methods to process raw msd data'''

def normalize_pitches(h5):
    key = int(hdf5_getters.get_key(h5))
    segments_pitches = hdf5_getters.get_segments_pitches(h5)
    segments_pitches_new = [transpose_by_key(pitch_seg, key) for pitch_seg in segments_pitches]
    return segments_pitches_new

def transpose_by_key(pitch_seg, key):
    pitch_seg_new = []
    for i in range(0, 12):
        idx = (i + key) % 12
        pitch_seg_new.append(pitch_seg[idx])
    return pitch_seg_new

''' given a time segment with distributions of the 12 pitches, find the most
likely chord played '''
def find_most_likely_chord(pitch_vector):
    rho_max = 0.0
    # index each chord
    most_likely_chord = (1, 1)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_MAJOR, CHORD_TEMPLATE_MAJOR_means, CHORD_TEMPLATE_MAJOR_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean) * (pitch_vector[i] - np.mean(pitch_vector)) / ((stdev + 0.01) * (np.std(pitch_vector) + 0.01))
        if abs(rho) > abs(rho_max):
            rho_max = rho
            most_likely_chord = (1, idx)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_MINOR, CHORD_TEMPLATE_MINOR_means, CHORD_TEMPLATE_MINOR_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean) * (pitch_vector[i] - np.mean(pitch_vector)) / ((stdev + 0.01) * (np.std(pitch_vector) + 0.01))
        if abs(rho) > abs(rho_max):
            rho_max = rho
            most_likely_chord = (2, idx)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_DOM7, CHORD_TEMPLATE_DOM7_means, CHORD_TEMPLATE_DOM7_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean) * (pitch_vector[i] - np.mean(pitch_vector)) / ((stdev + 0.01) * (np.std(pitch_vector) + 0.01))
        if abs(rho) > abs(rho_max):
            rho_max = rho
            most_likely_chord = (3, idx)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_MIN7, CHORD_TEMPLATE_MIN7_means, CHORD_TEMPLATE_MIN7_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean) * (pitch_vector[i] - np.mean(pitch_vector)) / ((stdev + 0.01) * (np.std(pitch_vector) + 0.01))
        if abs(rho) > abs(rho_max):
            rho_max = rho
            most_likely_chord = (4, idx)
    return most_likely_chord

def find_most_likely_timbre_category(timbre_vector):
    most_likely_timbre_cat = 0
    rho_max = 0.0
    for idx, (seg, mean, stdev) in enumerate(zip(TIMBRE_CLUSTERS, TIMBRE_MEANS, TIMBRE_STDEVS)):
        rho = 0.0
        for i in range(0, 12):
            rho += (seg[i] - mean) * (timbre_vector[i] - np.mean(seg)) / ((stdev + 0.01) * (np.std(timbre_vector) + 0.01))
        if abs(rho) > abs(rho_max):
            rho_max = rho
            most_likely_timbre_cat = idx
    return most_likely_timbre_cat

Bibliography

[1] Marck Bailey. Mark J. Butler publishes scholarly work on dance music. http://www.music.northwestern.edu/about/news/2012/mark-j-butler-publishes-scholarly-work-on-dance-music.html, Mar. 2012.

[2] Kenneth Taylor. Ishkur's guide to EDM. http://techno.org/electronic-music-guide/.

[3] Deal further strengthens Spotify's music discovery expertise. http://the.echonest.com/pressreleases/spotify-acquires-echo-nest/, Mar. 2014.

[4] Josh Constine. Inside the Spotify-Echo Nest skunkworks. http://techcrunch.com/2014/10/19/the-sonic-mad-scientists/, Oct. 2014.

[5] The future of music genres is here. http://blog.echonest.com/post/73516217273/the-future-of-music-genres-is-here, Jan. 2014.

[6] About the Music Genome Project. http://www.pandora.com/about/mgp.

[7] Joan Serrà, Álvaro Corral, Marián Boguñá, Martín Haro, and Josep Ll. Arcos. Measuring the evolution of contemporary western popular music. Sci. Rep., 2, Jul. 2012.

[8] Matthias Mauch, Robert M. MacCallum, Mark Levy, and Armand M. Leroi. The evolution of popular music: USA 1960-2010. Royal Society Open Science, 2(5), 2015.

[9] Thierry Bertin-Mahieux, Daniel P.W. Ellis, Brian Whitman, and Paul Lamere. The Million Song Dataset. In Proceedings of the 12th International Conference on Music Information Retrieval (ISMIR 2011), 2011.

[10] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay. Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12:2825-2830, 2011.

[11] Lars Buitinck, Gilles Louppe, Mathieu Blondel, Fabian Pedregosa, Andreas Mueller, Olivier Grisel, Vlad Niculae, Peter Prettenhofer, Alexandre Gramfort, Jaques Grobler, Robert Layton, Jake VanderPlas, Arnaud Joly, Brian Holt, and Gaël Varoquaux. API design for machine learning software: experiences from the scikit-learn project. In ECML PKDD Workshop: Languages for Data Mining and Machine Learning, pages 108-122, 2013.

[12] Edwin Chen. Infinite mixture models with nonparametric Bayes and the Dirichlet process. http://blog.echen.me/2012/03/20/infinite-mixture-models-with-nonparametric-bayes-and-the-dirichlet-process/, Mar. 2012.

[13] Graham Massey. Roland TR-808: The drum machine that changed music. http://www.bbc.com/news/entertainment-arts-26682781, Mar. 2014.

[14] Jean-Michel Jarre and Agencja Artystyczna MTJ. Jean Michel Jarre. Disques Dreyfus, 1999.


with only an upper bound on the number of clusters, and (2) by sorting the songs in chronological order before running the algorithm and keeping track of which songs are categorized under each cluster, we can observe the earliest songs in each cluster and consequently infer which songs were responsible for creating new clusters.

The arguments for the DP: The DP is controlled by a concentration parameter α. The expected number of clusters formed is directly proportional to the value of α, so the higher the value of α, the more likely new clusters will be formed [10]. Regardless of the value of α, as the number of data points introduced increases, the probability of a new group being formed decreases. That is, a "rich get richer" policy is in place, and existing clusters tend to grow in size. Tweaking the tunable parameter α is an important part of the study, since it determines the flexibility given to forming a new cluster. If the value of α is too small, then the criteria for forming clusters will be too strict, and data that should be in different clusters will be assigned to the same cluster. On the other hand, if α is too large, the algorithm will be too sensitive and assign similar songs to different clusters.
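As a toy illustration of this rich-get-richer behavior, the following sketch simulates cluster growth using the Chinese-restaurant construction of the DP (one standard view of the process, distinct from the stick-breaking implementation discussed below; the function name and parameter values are mine):

import random

def crp_cluster_sizes(n_points, alpha):
    # point n starts a new cluster with probability alpha / (n + alpha);
    # otherwise it joins an existing cluster in proportion to that
    # cluster's current size, so large clusters keep growing
    clusters = []
    for n in range(n_points):
        if random.random() < alpha / (n + alpha):
            clusters.append(1)
        else:
            r = random.uniform(0, n)
            running = 0
            for i, size in enumerate(clusters):
                running += size
                if r < running:
                    clusters[i] += 1
                    break
    return clusters

print crp_cluster_sizes(10000, 0.5)    # few, large clusters
print crp_cluster_sizes(10000, 100.0)  # many smaller clusters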

The implementation of the DP was achieved using scikit-learn's library and API for the Dirichlet Process Gaussian Mixture Model (DPGMM), the formal name of the Dirichlet Process model used to cluster the data. More specifically, scikit-learn's implementation of the DPGMM uses the stick-breaking method, one of several equally valid methods to assign songs to clusters [11]. While the mathematical details of this algorithm can be found at the following citation [12], the most important aspects of the DPGMM are the arguments that the user can specify and tune. The first of these tunable parameters is the value α, the same parameter as the α discussed in the previous paragraph. As seen on the right side of Figure 2.1, properly tuning α is key to obtaining meaningful clusters. The center image has α set to 0.01, which is too small and results in all of the data being placed under one cluster. On the other hand, the bottom-right image has the same data set and α set to 100, which does a better job of clustering. On a related note, the figure also demonstrates the effectiveness of the DPGMM over the GMM. On the left side, the dataset clearly contains 2 clusters, but the GMM in the top-left image assumes 5 clusters as a prior and consequently clusters the data incorrectly, while the DPGMM manages to limit the data to 2 clusters.

The second argument that the user inputs for the DPGMM is the data that will be clustered. The scikit-learn implementation takes the data in the format of a nested list (N lists, each of length m), where N is the number of data points and m the number of features. While the format of the data structure is relatively straightforward, choosing which numbers should be in the data was a challenge I faced. Selecting the relevant features of each song to be used in the algorithm will be expounded upon in the next section, "Feature Selection."

The last argument that a user inputs for the scikit-learn DPGMM implementation indicates the upper bound for the number of clusters; the Dirichlet Process then determines the best number of clusters for the data between 1 and that bound. Since the DPGMM is flexible enough to find the best value, I set an arbitrary upper bound of 50 clusters and focused on tuning α to modify the number of clusters formed.
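Concretely, the call looks roughly like the following minimal sketch, using the DPGMM class available in the scikit-learn versions current at the time of writing (the variable `songs`, holding the N-by-m nested feature list, is assumed):

import sklearn.mixture

dpgmm = sklearn.mixture.DPGMM(n_components=50,          # upper bound on clusters
                              alpha=0.1,                # concentration parameter
                              covariance_type='diag',
                              n_iter=100)
dpgmm.fit(songs)               # songs: N lists, each of length m
labels = dpgmm.predict(songs)  # cluster assignment for each song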

2.2 Feature Selection

Figure 2.1: scikit-learn example of GMM vs. DPGMM and tuning of α

One of the most difficult aspects of the Dirichlet method is choosing the features to be used for clustering. In other words, when we organize the songs into clusters, we need to ensure that each cluster is distinct in a way that is statistically and intuitively logical. In the Million Song Dataset [9], each song is represented as a JSON object containing several fields. These fields are candidate features for the Dirichlet algorithm. Below is an example song, "Never Gonna Give You Up" by Rick Astley, and the corresponding features.

artist_mbid: db92a151-1ac2-438b-bc43-b82e149ddd50 (the musicbrainz.org ID for this artist)
artist_mbtags: shape = (4,) (this artist received 4 tags on musicbrainz.org)
artist_mbtags_count: shape = (4,) (raw tag count of the 4 tags this artist received on musicbrainz.org)
artist_name: Rick Astley (artist name)
artist_playmeid: 1338 (the ID of that artist on the service playme.com)
artist_terms: shape = (12,) (this artist has 12 terms (tags) from The Echo Nest)
artist_terms_freq: shape = (12,) (frequency of the 12 terms from The Echo Nest (number between 0 and 1))
artist_terms_weight: shape = (12,) (weight of the 12 terms from The Echo Nest (number between 0 and 1))
audio_md5: bf53f8113508a466cd2d3fda18b06368 (hash code of the audio used for the analysis by The Echo Nest)
bars_confidence: shape = (99,) (confidence value (between 0 and 1) associated with each bar by The Echo Nest)
bars_start: shape = (99,) (start time of each bar according to The Echo Nest; this song has 99 bars)
beats_confidence: shape = (397,) (confidence value (between 0 and 1) associated with each beat by The Echo Nest)
beats_start: shape = (397,) (start time of each beat according to The Echo Nest; this song has 397 beats)
danceability: 0.0 (danceability measure of this song according to The Echo Nest (between 0 and 1; 0 => not analyzed))
duration: 211.69587 (duration of the track in seconds)
end_of_fade_in: 0.139 (time of the end of the fade-in at the beginning of the song, according to The Echo Nest)
energy: 0.0 (energy measure (not in the signal processing sense) according to The Echo Nest (between 0 and 1; 0 = not analyzed))
key: 1 (estimation of the key the song is in by The Echo Nest)
key_confidence: 0.324 (confidence of the key estimation)
loudness: -7.75 (general loudness of the track)
mode: 1 (estimation of the mode the song is in by The Echo Nest)
mode_confidence: 0.434 (confidence of the mode estimation)
release: Big Tunes - Back 2 The 80s (album name from which the track was taken; some songs/tracks can come from many albums, we give only one)
release_7digitalid: 786795 (the ID of the release (album) on the service 7digital.com)
sections_confidence: shape = (10,) (confidence value (between 0 and 1) associated with each section by The Echo Nest)
sections_start: shape = (10,) (start time of each section according to The Echo Nest; this song has 10 sections)
segments_confidence: shape = (935,) (confidence value (between 0 and 1) associated with each segment by The Echo Nest)
segments_loudness_max: shape = (935,) (max loudness during each segment)
segments_loudness_max_time: shape = (935,) (time of the max loudness during each segment)
segments_loudness_start: shape = (935,) (loudness at the beginning of each segment)
segments_pitches: shape = (935, 12) (chroma features for each segment (normalized so max is 1))
segments_start: shape = (935,) (start time of each segment (~ musical event, or onset) according to The Echo Nest; this song has 935 segments)
segments_timbre: shape = (935, 12) (MFCC-like features for each segment)
similar_artists: shape = (100,) (a list of 100 artists (their Echo Nest IDs) similar to Rick Astley, according to The Echo Nest)
song_hotttnesss: 0.864248830588 (according to The Echo Nest, when downloaded (in December 2010) this song had a 'hotttnesss' of 0.8 (on a scale of 0 to 1))
song_id: SOCWJDB12A58A776AF (The Echo Nest song ID; note that a song can be associated with many tracks (with very slight audio differences))
start_of_fade_out: 198.536 (start time of the fade-out, in seconds, at the end of the song, according to The Echo Nest)
tatums_confidence: shape = (794,) (confidence value (between 0 and 1) associated with each tatum by The Echo Nest)
tatums_start: shape = (794,) (start time of each tatum according to The Echo Nest; this song has 794 tatums)
tempo: 113.359 (tempo in BPM according to The Echo Nest)
time_signature: 4 (time signature of the song according to The Echo Nest, i.e. usual number of beats per bar)
time_signature_confidence: 0.634 (confidence of the time signature estimation)
title: Never Gonna Give You Up (song title)
track_7digitalid: 8707738 (the ID of this song on the service 7digital.com)
track_id: TRAXLZU12903D05F94 (The Echo Nest ID of this particular track, on which the analysis was done)
year: 1987 (year when this song was released, according to musicbrainz.org)
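For reference, each of these fields can be read straight out of a track's HDF5 file with the dataset's hdf5_getters module; a small sketch (the filename here is illustrative):

import hdf5_getters

h5 = hdf5_getters.open_h5_file_read('TRAXLZU12903D05F94.h5')
print hdf5_getters.get_title(h5)                   # Never Gonna Give You Up
print hdf5_getters.get_tempo(h5)                   # 113.359
print hdf5_getters.get_segments_pitches(h5).shape  # (935, 12)
h5.close()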

When choosing features, my main goal was to use features that would most likely yield meaningful results yet also be simple and make sense to the average person. The definition of "meaningful" results is arbitrary, as every music listener has his or her own opinions about what constitutes different types of music, but some common features by which most people tend to differentiate songs are pitch, rhythm, and the types of instruments used. The following fields provided in each song object fall under these three terms:

Pitch

• segments_pitches: a matrix of values indicating the strength of each pitch (or note) at each discernible time interval

Rhythm

• beats_start: a vector of values indicating the start time of each beat
• time_signature: the time signature of the song
• tempo: the speed of the song in Beats Per Minute (BPM)

Instruments

• segments_timbre: a matrix of values indicating the distribution of MFCC-like features (different types of tones) for each segment

The segments_pitches feature is a clear candidate for a differentiating factor between songs, since it reveals the patterns of notes that occur. Additionally, other research papers that quantitatively examine songs, like Mauch's, look at pitch and employ a procedure that allows all songs to be compared with the same metric. Likewise, timbre is intuitively a reliable differentiating feature, since it reveals the amounts of different tones: sounds that sound different despite having the same pitch. Therefore, segments_timbre is another feature that is considered for each song.

Finally, we look at the candidate features for rhythm. At first glance, all of these features appear to be useful, as they indicate the rhythm of a song in one way or another. However, none of them are as useful as the pitch and timbre features. While tempo is one factor in differentiating genres of EDM and music in general, tempo alone is not a driving force of musical innovation. Certain genres of EDM, like drum 'n' bass and happycore, stand out for having very fast tempos, but the tempo is supplemented with a sound unique to the genre. Conceiving new arrangements of pitches, combining instruments in new ways, and inventing new types of sounds are novel; speeding up or slowing down existing sounds is not. Including tempo as a feature could actually add noise to the model, since many genres overlap in their tempos. And finally, tempo is measured indirectly when the pitch and timbre features are normalized for each song: everything is measured in units of "per second," so faster songs will have higher quantities of pitch and timbre features each second. Time signature can be dismissed from the candidate features for the same reason as tempo: many genres share the same time signature, and including it in the feature set would only add more noise. beats_start looks like a more promising feature since, like segments_pitches and segments_timbre, it consists of a vector of values. However, difficulties arise when we begin to think about how exactly we can utilize this information. Since each song varies in length, we need a way to compare songs of different durations on the same level. One approach could be to perform basic statistics on the distance between each beat, for example calculating the mean and standard deviation of this distance. However, the normalized pitch and timbre information already captures this data. Another possibility is detecting certain patterns of beats, which could differentiate the syncopated dubstep or glitch music beats from the steady pulse of electro-house. But once again, every beat is accompanied by a sound with a specific timbre and pitch, so this feature would not add any significantly new information.

2.3 Collecting Data and Preprocessing Selected Features

2.3.1 Collecting the Data

Upon deciding the features I wanted to use in my research, I first needed to collect all of the electronic songs in the Million Song Dataset. The easiest reliable way to achieve this was to iterate through each song in the database and save the information for the songs where any of the artist genre tags in artist_mbtags matched an electronic music genre. This measure is not fully accurate, because it looks at the genre of the artist, not the song; but specific genre information for each song was not as easily accessible, so this indicator was nearly as good a substitute. To generate a list of the genres that electronic songs would fall under, I manually searched through a subset of the MSD to find all genres that seemed to be related to electronic music. In the case of genres that are sometimes but not always electronic in nature, such as disco or pop, I erred on the side of caution and did not include them in the list of electronic genres. In these cases, false positives, such as primarily rock songs that happen to have the disco label attached to the artist, could inadvertently be included in the dataset. The final list of genres is as follows:

target_genres = ['house', 'techno', 'drum and bass', 'drum n bass',
                 "drum'n'bass", 'drumnbass', "drum 'n' bass", 'jungle', 'breakbeat',
                 'trance', 'dubstep', 'trap', 'downtempo', 'industrial', 'synthpop',
                 'idm', 'idm - intelligent dance music', '8-bit', 'ambient',
                 'dance and electronica', 'electronic']

2.3.2 Pitch Preprocessing

A study conducted by music researcher Matthias Mauch [8] analyzes pitch in a musically informed manner. The study first takes the raw sound data and converts it into a distribution over each pitch, where 0 is no detection of the pitch and 1 the strongest amount. It then computes the most likely chord by comparing the 4 most common types of chords in popular music (major, minor, dominant 7, and minor 7) to the observed chord. The most common chords are represented as "template chords" containing 0's and 1's, where the 1's represent the notes played in the chord. For example, using the note C as the first index, the C major chord is represented as

CT_{CM} = (1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0)

For a given chroma frame c observed in the song, the Spearman's rho coefficient is computed against every template chord:

\rho_{CT,c} = \sum_{i=1}^{12} \frac{(CT_i - \overline{CT})(c_i - \bar{c})}{\sigma_{CT}\,\sigma_c}


where \overline{CT} is the mean of the values in the template chord, \sigma_{CT} is the standard deviation of the values in the chord, and the operations on c are analogous. Note that the summation is over each individual pitch in the 12 pitch classes. The chord template with the highest value of ρ is selected as the chord for the time frame. After this is performed for each time frame, the values are smoothed, and then the change between adjacent chords is observed. The reasoning behind this step is that by measuring the relative distance between chords, rather than the chords themselves, all songs can be compared in the same manner even though they may have different key signatures. Finally, the study takes the types of chord changes and classifies them under 8 possible categories called "H-topics." These topics are more abstracted versions of the chord changes that make more sense to a human, such as "changes involving dominant 7th chords."

In my preliminary implementation of this method on an electronic dance music corpus, I made a few modifications to Mauch's study. First, I smoothed out time frames before computing the most probable chords, rather than smoothing the most probable chords. I did this to save time and to reduce volatility in the chord measurements. Using Rick Astley's "Never Gonna Give You Up" as a reference, which contains 935 time frames and lasts 212 seconds, 5 time frames is slightly under 1 second and, for preliminary testing, appeared to be a good interval for each time block. Second, as mentioned in the literature section, I did not abstract the chord changes into H-topics. This decision also stemmed from time constraints, since deriving semantic chord meaning from EDM songs would require careful research into the types of harmonies and sounds common in that genre of music. Below is a high-level visualization of the pitch metadata found in a sample song, "Firestarter" by The Prodigy, and how I converted the metadata into a chord change vector that I could then feed into the Dirichlet Process algorithm.

[Figure: high-level walkthrough of the pitch pipeline, illustrated on "Firestarter" by The Prodigy. Recovered panel descriptions: (1) start with the raw pitch data, an N×12 matrix, where N is the number of time frames in the song and 12 the number of pitch classes (shown: the first 5 time frames of the song); (2) average the distribution of pitches over every block of 5 time frames; (3) calculate the most likely chord for each smoothed block using Spearman's rho (shown: F# major, (0,1,0,0,0,0,1,0,0,0,1,0)); (4) for two adjacent chords (shown: F# major to G major, a major-to-major shift with chord shift code 6), calculate the change between them and increment the count in a table of chord change frequencies (192 possible chord changes): chord_changes[6] += 1. The final result is a 192-element vector in which chord_changes[i] is the number of times the chord change with code i occurred in the song.]

As a final step to normalize the chord change data, I divided the counts by the length of the song, so that each song's number of chord changes was measured per second.
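Condensed into code, the encoding and normalization steps look like the following sketch (a compressed restatement of the logic in Appendix A.2; each chord is a (quality, root) pair with quality 1-4 for major/minor/dom7/min7 and root 0-11):

def chord_shift_code(c1, c2):
    note_shift = (c2[1] - c1[1]) % 12         # semitone shift between roots (0-11)
    key_shift = 4 * (c1[0] - 1) + c2[0]       # quality transition (1-16)
    return 12 * (key_shift - 1) + note_shift  # one of 16 * 12 = 192 categories

chords = [(1, 6), (1, 7), (1, 7)]             # e.g. F# major, G major, G major
chord_changes = [0] * 192
for c1, c2 in zip(chords, chords[1:]):
    chord_changes[chord_shift_code(c1, c2)] += 1
duration = 212.0                              # song length in seconds
chord_changes_per_second = [c / duration for c in chord_changes]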

2.3.3 Timbre Preprocessing

For timbre, I also used Mauch's model to find a meaningful way to compare timbre uniformly across all songs [8]. After collecting all song metadata, I took a random sample of 20 songs from each year, starting at 1970. The reason I forced the sampling to 20 randomly sampled songs from each year, rather than taking a random sample of songs from all years at once, was to prevent bias towards any type of sound. As seen in Figure 2.2, there are significantly more songs from 2000-2011 than from before 2000: the mean year is x̄ = 2001.052, the median year is 2003, and the standard deviation of the years is σ = 7.060. A "random sample" over all songs would almost certainly include a disproportionate number of more recent songs; in order not to miss sounds that may be more prevalent in older songs, I required a set number of songs from each year. Next, from each randomly selected song, I selected 20 random timbre frames, in order to prevent any biases in data collection within each song. In total, there were 42·20·20 = 16,800 timbre frames collected. Next, I clustered the timbre frames using a Gaussian Mixture Model (GMM), varying the number of clusters from 10 to 100 and selecting the number of clusters with the lowest Bayesian Information Criterion (BIC), a statistical measure commonly used to select the best-fitting model. The BIC was minimized at 46 timbre clusters. I then re-ran the GMM with 46 clusters and saved the mean values of each of the 12 timbre components for each cluster formed. In the same way that every song had the same 192 chord changes whose frequencies could be compared between songs, each song now had the same 46 timbre clusters but different frequencies for each. When reading in the metadata from each song, I calculated the most likely timbre cluster each timbre frame belonged to and kept a frequency count of all of the possible timbre clusters observed in the song. Finally, as with the pitch data, I divided all observed counts by the duration of the song in order to normalize each song's timbre counts.

Figure 2.2: Number of Electronic Music Songs in the Million Song Dataset from Each Year
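The model-selection loop can be sketched as follows (the function name is mine, and `frames` is assumed to hold the 16,800 sampled 12-dimensional timbre frames from Appendix A.3; sklearn.mixture.GMM is the pre-0.18 scikit-learn API used elsewhere in this thesis):

import numpy as np
import sklearn.mixture

def best_timbre_gmm(frames, lo=10, hi=100):
    X = np.asarray(frames)        # 16,800 x 12 array of timbre frames
    best_model, best_bic = None, np.inf
    for n in range(lo, hi + 1):
        model = sklearn.mixture.GMM(n_components=n, covariance_type='diag')
        model.fit(X)
        bic = model.bic(X)        # lower BIC = better fit/complexity tradeoff
        if bic < best_bic:
            best_model, best_bic = model, bic
    return best_model

# best_timbre_gmm(frames).means_ gives the cluster means (46 x 12 here),
# stored as TIMBRE_CLUSTERS in Appendix A.4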

Chapter 3

Results

3.1 Methodology

After the pitch and timbre data was processed, I ran the Dirichlet Process on the data. For each song, I concatenated the 192-element chord change frequency list and the 46-element timbre category frequency list, giving each song a total of 238 features. However, there is a problem with this setup: the pitch data will inherently dominate the clustering process, since it contains almost 3 times as many features as the timbre data. While there is no built-in way in scikit-learn's DPGMM to give different weights to each feature, I considered another possibility to remedy this discrepancy: duplicating the timbre vector a certain number of times and concatenating the copies to the feature set of each song. While this strategy runs the risk of corrupting the feature set and turning it into something that does not accurately represent each song, it is important to keep in mind that even without duplication, the feature set consists of two separate feature sets concatenated to each other. Therefore, timbre duplication appears to be a reasonable strategy to weigh pitch and timbre more evenly.
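In code, the assembly is just list concatenation, with the timbre block tiled to carry more weight; a sketch (the multiplier shown is illustrative, not the exact value used):

TIMBRE_COPIES = 4  # illustrative weighting factor

def feature_vector(song):
    # 192 chord-change frequencies plus duplicated timbre-category frequencies
    return song['chord_changes'] + song['timbre_cat_counts'] * TIMBRE_COPIES

X = [feature_vector(s) for s in all_song_data]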


After this modification, I tweaked a few more parameters before obtaining my final results. Dividing the pitch and timbre frequencies by the duration of the song normalized every song to frequencies per second, but it also had the undesired effect of making the data too small. Timbre and pitch frequencies per second were almost always less than 1.0, and often hovered as low as 0.002 for nonzero values. Because all of the values were very close to each other, using common values of α in the range of 0.1 to 1000-2000 was insufficient to push the songs into different clusters; as a result, every song fell into the same cluster. Increasing the value of α by several orders of magnitude, to well over 10 million, fixed this, but presented two problems. First, tuning α to experiment with different ways to cluster the music would be impractical, since I would have to work with an enormous range of possible values for α. Second, pushing α so high is not appropriate for the Dirichlet Process: extremely high values of α describe a process that tries to disperse the data into different clusters, and a value of α that high is in principle always assigning each new song to a new cluster. On the other hand, varying α between 0.1 and 1000, for example, presents a much wider range of flexibility when assigning clusters. While it may be possible to work with the data as it currently is by varying α an extreme amount, we would be using the Dirichlet Process in a way it should mathematically not be used. Therefore, multiplying all of the data by a constant, so that we can work in the appropriate range of α, is the better approach. After some experimentation, I found that k = 10 was an appropriate scaling factor.

After initial runs of the Dirichlet Process, I found a slight issue with some of the earlier songs. Since I had only artist genre tags, not tags specific to each song, I chose songs based on whether any of the tags associated with the artist fell under any electronic music genre, including the generic term 'electronic'. There were some bands, mostly older ones from the 1960s and 1970s like Electric Light Orchestra, which had some electronic music but mostly featured rock, funk, disco, or another genre. Given that these artists featured mostly non-electronic songs, I decided to exclude them from my study and generate a blacklist of these artists. While it was infeasible to look through every single song and determine whether it was electronic or not, I was able to look over the earliest songs in each cluster. These songs were the most important to verify as electronic, because early non-electronic songs could end up forming new clusters and inadvertently create clusters with non-electronic sounds that I was not looking for.
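In code, this is a uniform rescaling of the feature matrix `X` from the sketch above:

k = 10  # scaling factor found by experimentation
X_scaled = [[k * v for v in row] for row in X]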

The goal of this thesis is to identify the different groups into which EM songs cluster and to identify the most unique artists and genres. While the second task is straightforward, because it only requires looking at the earliest songs in each cluster, the effectiveness of the first is difficult to gauge. While I can look at the average chord change and timbre category frequencies in each cluster, as well as other metadata, attaching semantic interpretations to what the music actually sounds like, and determining whether the music is clustered properly, is a very subjective process. For this reason, I ran the Dirichlet Process on the feature set with values of α = (0.05, 0.1, 0.2), and in the Discussion section I compare the clusterings, examining similarities and differences among the clusters formed in each scenario. For each value of α, I set the upper limit of components, or clusters, allowed to 50. The values of α I used resulted in 9, 14, and 19 clusters formed.

3.2 Findings

3.2.1 α = 0.05

When I set α to 0.05, the Dirichlet Process split the songs into 9 clusters. Below are the distributions of the years of the songs in each cluster (note that the Dirichlet Process does not number the clusters exactly sequentially, so cluster numbers 5, 7, and 10 are skipped).

Figure 3.1: Song year distributions for α = 0.05

For each value of α, I also calculated the average frequency of each chord change category and timbre category for each cluster and plotted the results. The green lines correspond to timbre and the blue lines to pitch.

Figure 3.2: Timbre and pitch distributions for α = 0.05

A table of each cluster formed, the number of songs in that cluster, and descriptions of the pitch, timbre, and rhythmic qualities characteristic of songs in that cluster is shown below.

Cluster   Song Count   Characteristic Sounds
0         6481         Minimalist, industrial, space sounds, dissonant chords
1         5482         Soft, New Age, ethereal
2         2405         Defined sounds; electronic and non-electronic instruments played in standard rock rhythms
3         360          Very dense and complex synths, slightly darker tone
4         4550         Heavily distorted rock and synthesizer
6         2854         Faster-paced 80s synth rock, acid house
8         798          Aggressive beats, dense house music
9         1464         Ambient house, trancelike, strong beats, mysterious tone
11        1597         Melancholy tones; new wave rock in the 80s, then, starting in the 90s, downtempo, trip-hop, nu-metal

Table 3.1: Song cluster descriptions for α = 0.05

3.2.2 α = 0.1

A total of 14 clusters were formed (16 were formed, but 2 clusters contained only one song each; I listened to both of these songs, and they did not sound unique, so I discarded them from the clusters). Again, the song year distributions, timbre and pitch distributions, and cluster descriptions are shown below.

Figure 3.3: Song year distributions for α = 0.1

Figure 3.4: Timbre and pitch distributions for α = 0.1

Cluster   Song Count   Characteristic Sounds
0         1339         Instrumental and disco with 80s synth
1         2109         Simultaneous quarter-note and sixteenth-note rhythms
2         4048         Upbeat, chill; simultaneous quarter-note and eighth-note rhythms
3         1353         Strong repetitive beats, ambient
4         2446         Strong simultaneous beat and synths; synths defined but echoing
5         2672         Calm, New Age
6         542          Hi-hat cymbals, dissonant chord progressions
7         2725         Aggressive punk and alternative rock
9         1647         Latin; rhythmic emphasis on the first and third beats
11        835          Standard medium-fast rock instruments/chords
16        1152         Orchestral, especially violins
18        40           "Martian alien" sounds, no vocals
20        1590         Alternating strong kick and strong high-pitched clap
28        528          Roland TR-like beats; kick and clap stand out but fuzzy

Table 3.2: Song cluster descriptions for α = 0.1

3.2.3 α = 0.2

With α set to 0.2, there were a total of 22 clusters formed. 3 of the clusters consisted of 1 song each, none of which were particularly unique-sounding, so I discarded them, for a total of 19 significant clusters. Again, the song year distributions, timbre and pitch distributions, and cluster descriptions are shown below.

Figure 3.5: Song year distributions for α = 0.2

Figure 3.6: Timbre and pitch distributions for α = 0.2

Cluster   Song Count   Characteristic Sounds
0         4075         Nostalgic and sad-sounding synths and string instruments
1         2068         Intense, sad, cavernous (mix of industrial metal and ambient)
2         1546         Jazz/funk tones
3         1691         Orchestral with heavy 80s synths, atmospheric
4         343          Arpeggios
5         304          Electro, ambient
6         2405         Alien synths, eerie
7         1264         Punchy kicks and claps, 80s/90s tilt
8         1561         Medium tempo, 4/4 time signature, synths with intense guitar
9         1796         Disco rhythms and instruments
10        2158         Standard rock with few (if any) synths added on
12        791          Cavernous, minimalist, ambient (non-electronic instruments)
14        765          Downtempo, classic guitar riffs, fewer synths
16        865          Classic acid house sounds and beats
17        682          Heavy Roland TR sounds
22        14           Fast, ambient, classic orchestral
23        578          Acid house with funk tones
30        31           Very repetitive rhythms, one or two tones
34        88           Very dense sound (strong vocals and synths)

Table 3.3: Song cluster descriptions for α = 0.2

3.3 Analysis

Each of the different values of α revealed different insights about the Dirichlet Process applied to the Million Song Dataset, mainly which artists and songs were unique and how traditional groupings of EM genres could be thought of differently. Not surprisingly, the distributions of the years of songs in most of the clusters were skewed to the left, because the distribution of all of the EM songs in the Million Song Dataset is left-skewed (see Figure 2.2). However, some of the distributions vary significantly for individual clusters, and these differences provide important insights into which musical styles were popular at certain points in time and how unique the earliest artists and songs in the clusters were. For example, for α = 0.1, Cluster 28's musical style (with sounds characteristic of the Roland TR-808 and TR-909, two programmable drum machines and synthesizers that became extremely popular in 1980s and 90s dance tracks [13]) coincides with when the instruments were first manufactured, in 1980. Not surprisingly, this cluster contained mostly songs from the 80s and 90s and declined slightly in the 2000s. However, there were a few songs in that cluster that came out before 1980. While these songs did not clearly use the Roland TR machines, they may have contained similar sounds that predated the machines and were truly novel.

First looking at α = 005 we see that all of the clusters contain a significant

number of songs although cluster 3 and 8 are notably smaller Cluster 3 contains

a heavier left tail indicating a larger number of songs from the 70s 80s and 90s

Inside the cluster the genres of music varied significantly from a traditional music

lens That is the cluster contained some songs with nearly all traditional rock

instruments others with purely synths and others somewhere in between all which

would normally be classified as different EM genres However under the Dirichlet

Process these songs were lumped together with the common themes of dense melodic

46

melodies (as opposed to minimalistic repetitive or dissonant sounds) The most

prominent artists from the earlier songs are Ashra and John Hassell who composed

several melodic songs combining traditional instruments with synthesizers for a

modern feel The other small cluster number 8 contains a more normal year

distribution relative to the entire MSD distribution and also consists of denser beats

Another artist Cabaret Voltaire leads this cluster Cluster 9 sticks out significantly

because it contains virtually no songs before 1990 but increases rapidly in popularity

This cluster contains songs with hypnotically repetitive rhythm strong and ethereal

synths and an equally strong drum-like beat Given the emergence of trance in

the 1990s and the fact that house music in the 1980s contained more minimalistic

synths than house music in the 1990s this distribution of years makes sense Looking

at the earliest artists in this cluster one that accurately predates the later music

in the cluster is Jean-Michel Jarre A French composer pioneering in ambient and

electronic music [14] one of his songs Les Chants Magneacutetiques IV contains very

sharp and modulated synths along with a repetitive hi-hat cymbal rhythm and more

drawn-out and ethereal synths While the song sounds ambient at its normal speed

playing the song 15 times the normal speed resulted in a thumping fast-paced 16th

note rhythm that combined with the ethereal synths that contain certain chord

progressions sounded very similar to trance music In fact I found that stylistically

trance music was comparable to house and ambient music increased in speed Trance

music was a term not used extensively until the early 1990s but ambient and house

music were already mainstream by the 1980s so it would make sense that trance

evolved in this manner However this insight could serve as an argument that trance

is not an innovative genre in and of itself but is rather a clever combination of two

older genres Lastly we look at the timbre category and chord change distributions

for each cluster In theory these clusters should have significantly different peaks

of chord changes and timbre categories reflecting different pitch arrangements and

47

The type 0 chord change corresponds to major → major with no note change; type 60, minor → minor with no note change; type 120, dominant 7th major → dominant 7th major with no note change; and type 180, dominant 7th minor → dominant 7th minor with no note change. It makes sense that type 0, 60, 120, and 180 chord changes are frequently observed, because it implies that adjacent chords in a song remain in the same key for the majority of the song.
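To make the encoding concrete, here is a minimal sketch (mirroring the scheme in Appendix A.2, with illustrative chord pairs) of how two adjacent chords map to one of the 192 chord-change codes; codes 0, 60, 120, and 180 all arise from pairs with the same chord quality and no root movement:

# Sketch of the chord-change encoding (see Appendix A.2). A chord is a
# pair (quality, root): quality 1 = major, 2 = minor, 3 = dominant 7th,
# 4 = minor 7th; root 0-11 (C through B).
def chord_change_code(c1, c2):
    # semitones the root moves upward, wrapped into 0-11
    if c1[1] == c2[1]:
        note_shift = 0
    elif c1[1] < c2[1]:
        note_shift = c2[1] - c1[1]
    else:
        note_shift = 12 - c1[1] + c2[1]
    # one of 16 ordered quality pairs (1 through 16)
    key_shift = 4 * (c1[0] - 1) + c2[0]
    return 12 * (key_shift - 1) + note_shift  # 0 through 191

print(chord_change_code((1, 0), (1, 0)))  # major -> major, same root: 0
print(chord_change_code((2, 5), (2, 5)))  # minor -> minor, same root: 60
print(chord_change_code((3, 2), (3, 2)))  # dom. 7th -> dom. 7th, same root: 120
print(chord_change_code((4, 9), (4, 9)))  # min. 7th -> min. 7th, same root: 180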

The timbre categories, on the other hand, are more difficult to interpret intuitively. Mauch's study addresses this issue by sampling songs and sounds that are the closest to each timbre category, then playing the sounds and attaching user-based interpretations based on several listeners [8]. While this strategy worked in Mauch's study, given the time and resources at my disposal it was not practical in mine. I ended up comparing my subjective summaries of each cluster with the charts to see whether certain peaks in the timbre categories corresponded to specific tones. Strangely, for α = 0.05 the timbre and chord change data are very similar for each cluster. This problem does not occur when α = 0.1 or 0.2, where the graphs vary significantly and correspond to some of the observed differences in the music. In summary, below are the most influential artists I found in the clusters formed and the types of music they created that were novel for their time:

• Jean-Michel Jarre: ambient and house music, complicated synthesizer arrangements

• Cabaret Voltaire: orchestral electronic music

• Paul Horn: new age

• Brian Eno: ambient music

• Manuel Göttsching (Ashra): synth-heavy ambient music

• Killing Joke: industrial metal

• John Foxx: minimalist and dark electronic music

• Fad Gadget: house and industrial music

While these conclusions were formed mainly from the data used in the MSD and the resulting clusters, I checked outside sources and biographies of these artists to see whether they were groundbreaking contributors to electronic music. Some research revealed that these artists were indeed groundbreaking for their time, so my findings are consistent with existing literature. The difference, however, between existing accounts and mine is that, from a quantitatively computed perspective, I found some new connections (like observing that one of Jarre's works, when sped up, sounded very similar to trance music).

For larger values of α, it is worth not only looking at interesting phenomena in the clusters formed for that specific value, but also comparing those clusters to the ones formed at other values of α. Since we are increasing the value of α, more clusters will be formed and the distinctions between each cluster will be more nuanced. With α = 0.1, the Dirichlet Process formed 16 clusters. 2 of these clusters consisted of only one song each, and upon listening, neither of these songs sounded particularly unique, so I threw those two clusters out and analyzed the remaining 14. Comparing these clusters to the ones formed with α = 0.05, I found that some of the clusters mapped over nicely while others were more difficult to interpret. For example, cluster 3_0.1 (cluster 3 when α = 0.1) contained a similar number of songs and a similar distribution of release years to cluster 9_0.05. Both contain virtually no songs before the 1990s and then steadily rise in popularity through the 2000s. Both clusters also contain similar types of music: house beats and ethereal synths reminiscent of ambient or trance music. However, when I looked at the earliest artists in cluster 3_0.1, they were different from the earliest artists in cluster 9_0.05. One particular artist, Bill Nelson, stood out for having a particularly novel song, "Birds of Tin," for the year it was released (1980). This song features a sharp and twangy synth beat that, when sped up, sounded like minimalist acid house music. While the α = 0.05 group differentiated mostly on general moods and classes of instruments (like rock vs. non-electronic vs. electronic), the α = 0.1 group picked up more nuanced instrumentation and mood differences. For example, cluster 16_0.1 contained songs that featured orchestral string instruments, especially violin. The songs themselves varied significantly according to traditional genres, from Brian Eno arrangements with classical orchestra to a remix of a song by Linkin Park, a nu-metal band, which contained violin interludes. This clustering raises an interesting point: music that sounds very different based on traditional genres could be grouped together on certain instruments or sounds. Another cluster, 28_0.1, features 90s sounds characteristic of the Roland TR-909 drum machine (which would explain why the cluster's songs increase drastically starting in the 1990s and steadily decline through the 2000s). Yet another cluster, 6_0.1, contains a particularly heavy left tail, indicating a style more popular in the 1980s, and its characteristic sound, hi-hat cymbals, is also a specialized instrument. This specialization does not match up particularly strongly with the clusters when α = 0.05. That is, a single cluster with α = 0.05 does not easily map to one or more clusters in the α = 0.1 run, although many of the clusters appear to share characteristics based on the qualitative descriptions in the tables. The timbre/chord change charts for each cluster appeared to at least somewhat corroborate the general characteristics I attached to each cluster. For example, the last timbre category is significantly pronounced for clusters 5 and 18, and especially so for 18. Cluster 18 was vocal-free, ethereal, space-synth sounds, so it would make sense that cluster 5, which was mainly calm New World, also contained vocal-free, ethereal, and space-y sounds.

It was also interesting to note that certain clusters, like 28, contained one timbre category that completely dominated all the others. Many songs in this cluster were marked by strong and repetitive beats reminiscent of the Roland TR synth drum machines, which matches the graph. Likewise, clusters 3, 7, 9, and 20, which appear to contain the same peak timbre category, were noted for containing strong and repetitive beats. For this value of α, I added the following artists and their contributions to the general list of novel artists:

• Bill Nelson: minimalist house music

• Vangelis: orchestral compositions with electronic notes

• Rick Wakeman: rock compositions with spacy-sounding synths

• Kraftwerk: synth-based pop music

Finally, we look at α = 0.2. With this parameter value, the Dirichlet Process resulted in 22 clusters. 3 of these clusters contained only one song each, and upon listening to each of these songs I determined they were not particularly unique and discarded them, for a total of 19 remaining clusters. Unlike the previous two values of α, where the clusters were relatively easy to subjectively differentiate, this one was quite difficult. Slightly more than half of the clusters, 10 out of 19, contained under 1,000 songs, and 3 contained under 100. Some of the clusters were easily mapped to clusters in the other two α values, like cluster 17_0.2, which contains Roland TR drum machine sounds and is comparable to cluster 28_0.1. However, many of the other classifications seemed more dubious. Not only did the songs within each cluster often vary significantly, but the differences between many clusters appeared nearly indistinguishable. The chord change and timbre charts also reflect the difficulty in distinguishing different clusters. The y-axes for all of the years are quite small, implying that many of the timbre values averaged out because the songs were quite different in each cluster. Essentially, the observations are quite noisy and do not have features that stand out as saliently as those of cluster 28_0.1, for example. The only exceptions to these numbers were clusters 30 and 34, but there were so few songs in each of these clusters that they represent only a small amount of the dataset. Therefore, I concluded that the Dirichlet Process with α = 0.2 did not adequately cluster the songs. Overall, the clusters formed when α = 0.1 were the most meaningful in terms of picking up nuanced moods and instruments without splitting hairs and resulting in clusters a minimally trained ear could not differentiate. From this analysis, the most appropriate genre classifications of the electronic music from the MSD are the clusters described in the table where α = 0.1, and the most novel artists, along with their contributions, are summarized in the findings where α = 0.05 and α = 0.1.

Chapter 4

Conclusion

In this chapter, I first address weaknesses in my experiment and strategies to address them; I then offer potential paths for researchers to build upon my experiment and offer closing words regarding this thesis.

4.1 Design Flaws in Experiment

While I made every effort possible to ensure the integrity of this experiment, there were various factors working against it, some beyond my control and others within my control but unrealistic to address given the time and resources I had. The largest issue was the dataset I was working with. While the MSD contained roughly 23,000 electronic music songs according to my classifications, these songs did not come close to all of the electronic music that was available. From looking through the tracks, I did see many important artists, meaning that there was some credibility to the dataset. However, there were several other artists I was surprised to see missing, and the artists included contained only a limited number of popular songs. Some traditionally defined genres, like dubstep, were missing entirely from the dataset, and the most recent songs came from the year 2010, which meant that the past 5 years of rapid expansions in EM were not accounted for. Building a sufficient corpus of EM data is very difficult, arguably more so than for other genres, because songs may be remixed by multiple artists, further blurring the line between original content and modifications. For this reason, I considered my thesis to be a proof of concept. Although the data I used may not be ideal, I was able to show that the Dirichlet Process could be used with some amount of success to cluster songs based on their metadata.

With respect to how I implemented the Dirichlet Process and constructed the features, my methodology could have been more extensive with additional time and resources. Interpreting the sounds in each song and establishing common threads is a difficult task, and unlike Pandora, which used trained music theory experts to analyze each song, I relied on my own ears and anecdotal knowledge of EM. Given the lack of formal literature quantitatively analyzing EM and the resources I had, this was my best realistic option, but it was also not ideal. The second notable weakness, which was more controllable, was determining what exactly constitutes an EM song. My criteria involved iterating through every song and selecting those whose artist contained a tag that fell inside a list of predetermined EM genres. However, this strategy is not always effective, since some artists contain only a small selection of EM songs and have produced much more music involving rock or other non-EM genres. To prevent these songs from appearing in the dataset, I would need to load another dataset from a group called Last.fm, which contains user-generated tags at the song level.
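A hypothetical sketch of such a song-level filter follows; the song_tags mapping and its loading are assumed (the actual Last.fm companion dataset ships in its own format, so the loading code would differ):

# Hypothetical sketch: filter tracks by song-level tags from the MSD's
# companion Last.fm dataset instead of artist-level musicbrainz tags.
# 'song_tags' (track_id -> list of user tags) is assumed to be loaded
# elsewhere; how it is loaded depends on the Last.fm data's format.
target_genres = set(['house', 'techno', 'trance', 'dubstep', 'ambient'])

def is_em_song(track_id, song_tags):
    tags = [t.lower() for t in song_tags.get(track_id, [])]
    return any(tag in target_genres for tag in tags)

song_tags = {'TRAXLZU12903D05F94': ['pop', '80s']}  # toy example
print(is_em_song('TRAXLZU12903D05F94', song_tags))  # -> False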

Another, more addressable weakness in my experiment was graphically analyzing the timbre categories. While the average chord changes were easy to interpret on the graphs for each cluster and had easy semantic interpretations, the timbre categories were never formally defined. That is, while I knew the Bayes Information Criterion was lowest when there were 46 categories, I did not associate each timbre category with a sound. Mauch's study addressed this issue by randomly selecting songs with sounds that fell in each timbre category and asking users to listen to the sounds and classify what they heard. Implementing this system would be an additional way of ensuring that the clusters formed for each song were nontrivial: I could not only eyeball the measurements on each graph for timbre, like I did in this thesis, but also use them to confirm the sounds I observed for each cluster. Finally, while my feature selection contained careful preprocessing, based on other studies, that normalized measurements between all songs, there are additional ways I could have improved the feature set. For example, one study looks at more advanced ways to isolate specific timbre segments in a song, identify repeating patterns, and compare songs to each other in terms of the similarity of their timbres [15]. More advanced methods like these would allow me to more quantitatively analyze how successful the Dirichlet Process is at effectively clustering songs into distinct categories.

4.2 Future Work

Future work in this area, quantitatively analyzing EM metadata to determine what constitutes different genres and novel artists, would involve tighter definitions and procedures, evaluations of whether clustering was effective, and closer musical scrutiny. All of the weaknesses mentioned in the previous section, barring perhaps the songs available in the Million Song Dataset, can be addressed with extensions and modifications to the code base I created. The greater issue of building an effective corpus of music data for the MSD and constantly updating it might be addressed by soliciting such data from an organization like Spotify, but such an endeavor is very ambitious and beyond the scope of any individual or small group research project without extensive funding and influence. Once these problems are resolved, and the dataset, the songs accessed from it, and the methods for comparing songs to each other are in place, the next steps would be to further analyze the results. How do the most unique artists for their time compare to the most popular artists? Is there considerable overlap? How long does it take for a style to grow in popularity, if it even does? And lastly, how can these findings be used to compose new genres of music and envision who and what will become popular in the future? All of these questions may require supplementary information sources, with respect to the popularity of songs and artists for example, and many of these additional pieces of information can be found on the website of the MSD.

4.3 Closing Remarks

While this thesis is an ambitious endeavor and can be improved in many respects, it does show that the methods implemented yield nontrivial results and could serve as a foundation for future quantitative analysis of electronic music. As data analytics grows even more, and groups such as Spotify amass greater amounts of information and deeper insights on that information, this relatively new field of study will hopefully grow. EM is a dynamic, energizing, and incredibly expressive type of music, and understanding it from a quantitative perspective pays respect to what has up until now been mostly analyzed from a curious outsider's perspective: qualitatively described, but not examined as thoroughly from a mathematical angle.

Appendix A

Code

A.1 Pulling Data from the Million Song Dataset

from __future__ import division
import os
import sys
import re
import time
import glob
import numpy as np
import hdf5_getters  # not on adroit

# prevents output from showing ellipses when printed
np.set_printoptions(threshold=np.nan)

'''This code computes the frequency of chord changes in each electronic song
and runs the dirichlet process on it.'''

basedir = '/scratch/network/mssilver/mssilver/msd_data_full/data/' + str(sys.argv[1])
ext = '.h5'

target_genres = ['house', 'techno', 'drum and bass', 'drum n bass',
                 "drum'n'bass", 'drumnbass', "drum 'n' bass", 'jungle',
                 'breakbeat', 'trance', 'dubstep', 'trap', 'downtempo',
                 'industrial', 'synthpop', 'idm',
                 'idm - intelligent dance music', '8-bit', 'ambient',
                 'dance and electronica', 'electronic']

# relevant metadata for all EM songs found in the MSD
all_song_data = {}
pitch_segs_data = []
count = 0
start_time = time.time()

for root, dirs, files in os.walk(basedir):
    files = glob.glob(os.path.join(root, '*' + ext))
    for f in files:
        h5 = hdf5_getters.open_h5_file_read(f)
        # if year unknown, throw out sample
        if hdf5_getters.get_year(h5) == 0:
            h5.close()
            continue
        if any(tag in str(hdf5_getters.get_artist_mbtags(h5)) for tag in target_genres):
            print 'found electronic music song at {0} seconds'.format(time.time() - start_time)
            count += 1
            print 'song count: {0}'.format(count)
            h5_subdict = dict()
            h5_subdict['title'] = hdf5_getters.get_title(h5).item()
            h5_subdict['artist_name'] = hdf5_getters.get_artist_name(h5).item()
            h5_subdict['year'] = hdf5_getters.get_year(h5).item()
            h5_subdict['duration'] = hdf5_getters.get_duration(h5).item()
            h5_subdict['timbre'] = hdf5_getters.get_segments_timbre(h5).tolist()
            h5_subdict['pitches'] = hdf5_getters.get_segments_pitches(h5).tolist()
            track_id = hdf5_getters.get_track_id(h5).item()
            all_song_data[track_id] = h5_subdict
            print 'Song {0} finished processing. Total time elapsed: {1} seconds'.format(count, str(time.time() - start_time))
        h5.close()

all_song_data_sorted = dict(sorted(all_song_data.items(), key=lambda k: k[1]['year']))
sortedpitchdata = '/scratch/network/mssilver/mssilver/msd_data/raw_' + re.sub('/', '', sys.argv[1]) + '.txt'
with open(sortedpitchdata, 'w') as text_file:
    text_file.write(str(all_song_data_sorted))

A.2 Calculating Most Likely Chords and Timbre Categories

from __future__ import division
import os
import sys
import re
import time
import json
import glob
import hdf5_getters  # not on adroit
import sklearn.mixture
import msd_utils  # not on adroit
import math
import numpy as np
import collections
import ast

# prevents output from showing ellipses when printed
np.set_printoptions(threshold=np.nan)

# column-wise mean of a list of lists
def mean(a):
    return sum(a) / len(a)

'''This code computes the frequency of chord changes in each electronic song
and runs the dirichlet process on it.'''

basedir = '/scratch/network/mssilver/mssilver/'
input_file = basedir + 'msd_data/raw_' + str(sys.argv[1]) + '.txt'
output_file = basedir + 'msd_data/preprocessed_' + str(sys.argv[1]) + '.txt'

json_contents = open(input_file, 'r').read()

all_song_data = []
time_start = time.time()
count = 0
for json_object_str in re.finditer(r"{'title'.*?}", json_contents):
    json_object_str = str(json_object_str.group(0))
    json_object = ast.literal_eval(json_object_str)
    json_object_new = {}

    json_object_new['title'] = json_object['title']
    json_object_new['artist_name'] = json_object['artist_name']
    json_object_new['year'] = json_object['year']
    json_object_new['duration'] = json_object['duration']

    segments_pitches_old = json_object['pitches']
    segments_timbre_old = json_object['timbre']
    segments_pitches_old_smoothed = []
    segments_timbre_old_smoothed = []
    chord_changes = [0 for i in range(0, 192)]
    smoothing_factor = 5
    for i in range(0, int(math.floor(len(segments_pitches_old) / smoothing_factor))):
        segments = segments_pitches_old[(smoothing_factor*i):(smoothing_factor*i + smoothing_factor)]
        # calculate mean frequency of each note over a block of 5 time segments
        segments_mean = map(mean, zip(*segments))
        segments_pitches_old_smoothed.append(segments_mean)
    most_likely_chords = [msd_utils.find_most_likely_chord(seg) for seg in segments_pitches_old_smoothed]
    print 'found most likely chords at {0} seconds'.format(time.time() - time_start)
    # calculate chord changes
    for i in range(0, len(most_likely_chords) - 1):
        c1 = most_likely_chords[i]
        c2 = most_likely_chords[i+1]
        if (c1[1] == c2[1]):
            note_shift = 0
        elif (c1[1] < c2[1]):
            note_shift = c2[1] - c1[1]
        else:
            note_shift = 12 - c1[1] + c2[1]
        key_shift = 4*(c1[0] - 1) + c2[0]
        # convert note_shift (0 through 11) and key_shift (1 to 16)
        # to one of 192 categories for a chord shift
        chord_shift = 12*(key_shift - 1) + note_shift
        chord_changes[chord_shift] += 1
    json_object_new['chord_changes'] = [c / json_object['duration'] for c in chord_changes]
    print 'calculated chord changes at {0} seconds'.format(time.time() - time_start)

    for i in range(0, int(math.floor(len(segments_timbre_old) / smoothing_factor))):
        segments = segments_timbre_old[(smoothing_factor*i):(smoothing_factor*i + smoothing_factor)]
        # calculate mean of each timbre dimension over a block of 5 time segments
        segments_mean = map(mean, zip(*segments))
        segments_timbre_old_smoothed.append(segments_mean)
    print 'found most likely timbre categories at {0} seconds'.format(time.time() - time_start)
    timbre_cats = [msd_utils.find_most_likely_timbre_category(seg) for seg in segments_timbre_old_smoothed]
    timbre_cat_counts = [timbre_cats.count(i) for i in xrange(0, 30)]
    json_object_new['timbre_cat_counts'] = [t / json_object['duration'] for t in timbre_cat_counts]
    all_song_data.append(json_object_new)
    count += 1

print 'preprocessing finished, writing results to file at time {0}'.format(time.time() - time_start)
with open(output_file, 'w') as f:
    f.write(str(all_song_data))

print 'file merging complete at time {0}'.format(time.time() - time_start)

A.3 Code to Compute Timbre Categories

from __future__ import division
import os
import sys
import re
import time
import json
import glob
import hdf5_getters  # not on adroit
import sklearn.mixture
import msd_utils  # not on adroit
import math
import numpy as np
import collections
from string import ascii_uppercase
import ast
import matplotlib.pyplot as plt
import operator
from collections import defaultdict
import random

timbre_all = []
N = 20  # number of samples to get from each year
year_counts = dict({1956: 2, 1965: 4, 1968: 3, 1969: 5, 1970: 23, 1971: 25,
                    1972: 26, 1973: 37, 1974: 35, 1975: 29, 1976: 28, 1977: 64,
                    1978: 77, 1979: 111, 1980: 131, 1981: 171, 1982: 199,
                    1983: 272, 1984: 190, 1985: 189, 1986: 200, 1987: 224,
                    1988: 205, 1989: 272, 1990: 358, 1991: 348, 1992: 538,
                    1993: 610, 1994: 658, 1995: 764, 1996: 809, 1997: 930,
                    1998: 872, 1999: 983, 2000: 1031, 2001: 1230, 2002: 1323,
                    2003: 1563, 2004: 1508, 2005: 1995, 2006: 1892, 2007: 2175,
                    2008: 1950, 2009: 1782, 2010: 742})

time_start = time.time()
year_count = defaultdict(int)
orig_dir = '/scratch/network/mssilver/mssilver/'
orig_dir = ''
json_pattern = re.compile(r"{'title'.*?}", re.DOTALL)
N = 20  # number of songs to sample from each year
k = 20  # number of frames to select from each song
for l1 in ascii_uppercase:
    for l2 in ascii_uppercase:
        edm_textfile = orig_dir + 'msd_data/raw_' + l1 + l2 + '.txt'
        json_contents = open(edm_textfile, 'r').read()
        for json_object_str in re.findall(json_pattern, json_contents):
            json_object = ast.literal_eval(json_object_str)
            year = int(json_object['year'])
            prob = 1.0 if 1.0*N/year_counts[year] > 1.0 else 1.0*N/year_counts[year]
            if random.random() < prob:
                print 'getting timbre frames for song in directory {0}, {1} seconds after start of program'.format(edm_textfile, time.time() - time_start)
                duration = float(json_object['duration'])
                timbre = [[t/duration for t in l] for l in json_object['timbre']]
                try:
                    indices = random.sample(xrange(0, len(timbre)), k)
                except:
                    indices = xrange(0, len(timbre))
                timbre_frames = [timbre[i] for i in indices]
                appended_timbre = [timbre_all.append(l) for l in timbre_frames]
        print 'finished file {0}, {1} seconds after start of program'.format(edm_textfile, time.time() - time_start)

with open('timbre_frames_all.txt', 'w') as f:
    f.write(str(timbre_all))

A.4 Helper Methods for Calculations

import os
import re
import json
import glob
import hdf5_getters
import time
import numpy as np

'''some static data used in conjunction with the helper methods'''

# each 12-element vector corresponds to the 12 pitches, starting with
# C natural and going up to B natural

CHORD_TEMPLATE_MAJOR = [[1,0,0,0,1,0,0,1,0,0,0,0], [0,1,0,0,0,1,0,0,1,0,0,0],
                        [0,0,1,0,0,0,1,0,0,1,0,0], [0,0,0,1,0,0,0,1,0,0,1,0],
                        [0,0,0,0,1,0,0,0,1,0,0,1], [1,0,0,0,0,1,0,0,0,1,0,0],
                        [0,1,0,0,0,0,1,0,0,0,1,0], [0,0,1,0,0,0,0,1,0,0,0,1],
                        [1,0,0,1,0,0,0,0,1,0,0,0], [0,1,0,0,1,0,0,0,0,1,0,0],
                        [0,0,1,0,0,1,0,0,0,0,1,0], [0,0,0,1,0,0,1,0,0,0,0,1]]
CHORD_TEMPLATE_MINOR = [[1,0,0,1,0,0,0,1,0,0,0,0], [0,1,0,0,1,0,0,0,1,0,0,0],
                        [0,0,1,0,0,1,0,0,0,1,0,0], [0,0,0,1,0,0,1,0,0,0,1,0],
                        [0,0,0,0,1,0,0,1,0,0,0,1], [1,0,0,0,0,1,0,0,1,0,0,0],
                        [0,1,0,0,0,0,1,0,0,1,0,0], [0,0,1,0,0,0,0,1,0,0,1,0],
                        [0,0,0,1,0,0,0,0,1,0,0,1], [1,0,0,0,1,0,0,0,0,1,0,0],
                        [0,1,0,0,0,1,0,0,0,0,1,0], [0,0,1,0,0,0,1,0,0,0,0,1]]
CHORD_TEMPLATE_DOM7 = [[1,0,0,0,1,0,0,1,0,0,1,0], [0,1,0,0,0,1,0,0,1,0,0,1],
                       [1,0,1,0,0,0,1,0,0,1,0,0], [0,1,0,1,0,0,0,1,0,0,1,0],
                       [0,0,1,0,1,0,0,0,1,0,0,1], [1,0,0,1,0,1,0,0,0,1,0,0],
                       [0,1,0,0,1,0,1,0,0,0,1,0], [0,0,1,0,0,1,0,1,0,0,0,1],
                       [1,0,0,1,0,0,1,0,1,0,0,0], [0,1,0,0,1,0,0,1,0,1,0,0],
                       [0,0,1,0,0,1,0,0,1,0,1,0], [0,0,0,1,0,0,1,0,0,1,0,1]]
CHORD_TEMPLATE_MIN7 = [[1,0,0,1,0,0,0,1,0,0,1,0], [0,1,0,0,1,0,0,0,1,0,0,1],
                       [1,0,1,0,0,1,0,0,0,1,0,0], [0,1,0,1,0,0,1,0,0,0,1,0],
                       [0,0,1,0,1,0,0,1,0,0,0,1], [1,0,0,1,0,1,0,0,1,0,0,0],
                       [0,1,0,0,1,0,1,0,0,1,0,0], [0,0,1,0,0,1,0,1,0,0,1,0],
                       [0,0,0,1,0,0,1,0,1,0,0,1], [1,0,0,0,1,0,0,1,0,1,0,0],
                       [0,1,0,0,0,1,0,0,1,0,1,0], [0,0,1,0,0,0,1,0,0,1,0,1]]

CHORD_TEMPLATE_MAJOR_means = [np.mean(chord) for chord in CHORD_TEMPLATE_MAJOR]
CHORD_TEMPLATE_MINOR_means = [np.mean(chord) for chord in CHORD_TEMPLATE_MINOR]
CHORD_TEMPLATE_DOM7_means = [np.mean(chord) for chord in CHORD_TEMPLATE_DOM7]
CHORD_TEMPLATE_MIN7_means = [np.mean(chord) for chord in CHORD_TEMPLATE_MIN7]

CHORD_TEMPLATE_MAJOR_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_MAJOR]
CHORD_TEMPLATE_MINOR_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_MINOR]
CHORD_TEMPLATE_DOM7_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_DOM7]
CHORD_TEMPLATE_MIN7_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_MIN7]

TIMBRE_CLUSTERS = [
    [1.38679881e-01, 3.95702571e-02, 2.65410235e-02, 7.38301998e-03, -1.75014636e-02, -5.51147732e-02,
     8.71851698e-03, -1.17595855e-02, 1.07227900e-02, 8.75951680e-03, 5.40391877e-03, 6.17638908e-03],
    [3.14344510e+00, 1.17405599e-01, 4.08053561e+00, -1.77934450e+00, 2.93367968e+00, -1.35597928e+00,
     -1.55129489e+00, 7.75743158e-01, 6.42796685e-01, 1.40794256e-01, 3.37716831e-01, -3.27103815e-01],
    [3.56548165e-01, 2.73288705e+00, 1.94355982e+00, 1.06892477e+00, 9.89739475e-01, -8.97330631e-02,
     8.73234495e-01, -2.00747009e-03, 3.44488367e-01, 9.93117800e-02, -2.43471766e-01, -1.90521726e-01],
    [4.22442037e-01, 4.14115783e-01, 1.43926557e-01, -1.16143322e-01, -5.95186216e-02, -2.36927188e-01,
     -6.83151409e-02, 9.86816882e-02, 2.43219098e-02, 6.93558977e-02, 6.80121418e-03, 3.97485360e-02],
    [1.94727799e-01, -1.39027782e+00, -2.39875671e-01, -2.84583677e-01, 1.92334219e-01, -2.83421048e-01,
     2.15787541e-01, 1.14840341e-01, -2.15631833e-01, -4.09496877e-02, -6.90838017e-03, -7.24394810e-03],
    [1.96565167e-01, 4.98702717e-02, -3.43697282e-01, 2.54170701e-01, 1.12441266e-02, 1.54740401e-01,
     -4.70447408e-02, 8.10868802e-02, 3.03736697e-03, 1.43974944e-03, -2.75044913e-02, 1.48634678e-02],
    [2.21364497e-01, -2.96205105e-01, 1.57754028e-01, -5.57641279e-02, -9.25625566e-02, -6.15316168e-02,
     -1.38139882e-01, -5.54936599e-02, 1.66886836e-01, 6.46238260e-02, 1.24093863e-02, -2.09274345e-02],
    [2.12823455e-01, -9.32652720e-02, -4.39611467e-01, -2.02814479e-01, 4.98638770e-02, -1.26572488e-01,
     -1.11181799e-01, 3.25075635e-02, 2.01416694e-02, -5.69216463e-02, 2.61922912e-02, 8.30817468e-02],
    [1.62304042e-01, -7.34813956e-03, -2.02552550e-01, 1.80106705e-01, -5.72110826e-02, -9.17148244e-02,
     -6.20429191e-03, -6.08892354e-02, 1.02883628e-02, 3.84878478e-02, -8.72920419e-03, 2.37291230e-02],
    [1.69023095e-01, 6.81311168e-02, -3.71039856e-02, -2.13139780e-02, -4.18752028e-03, 1.36407740e-01,
     2.58515825e-02, -4.10328777e-04, 2.93149920e-02, -1.97874734e-02, 2.01177066e-02, 4.29260690e-03],
    [4.16829358e-01, -1.28384095e+00, 8.86081556e-01, 9.13717416e-02, -3.19420208e-01, -1.82003637e-01,
     -3.19865507e-02, -1.71517045e-02, 3.47472066e-02, -3.53047665e-02, 5.58354602e-02, -5.06222122e-02],
    [3.83948137e-01, 1.06020034e-01, 4.01191058e-01, 1.49470482e-01, -9.58422411e-02, -4.94473336e-02,
     2.27589858e-02, -5.67352733e-02, 3.84666644e-02, -2.15828055e-02, -1.67817151e-02, 1.15426241e-01],
    [9.07946444e-01, 3.26120397e+00, 2.98472002e+00, -1.42615404e-01, 1.29886103e+00, -4.53380431e-01,
     1.54008478e-01, -3.55297093e-02, -2.95809181e-01, 1.57037690e-01, -7.29692046e-02, 1.15180285e-01],
    [1.60870896e+00, -2.32038235e+00, -7.96211044e-01, 1.55058968e+00, -2.19377663e+00, 5.01030526e-01,
     -1.71767279e+00, -1.36642470e+00, -2.42837527e-01, -4.14275615e-01, -7.33148530e-01, -4.56676578e-01],
    [6.42870687e-01, 1.34486839e+00, 2.16026845e-01, -2.13180345e-01, 3.10866747e-01, -3.97754955e-01,
     -3.54439151e-01, -5.95938041e-04, 4.95054274e-03, 4.67013422e-02, -1.80823854e-02, 1.25808320e-01],
    [1.16780496e+00, 2.28141229e+00, -3.29418720e+00, -1.54239912e+00, 2.12372153e-01, 2.51116768e+00,
     1.84273560e+00, -4.06183916e-01, 1.19175125e+00, -9.24407446e-01, 6.85444429e-01, -6.38729005e-01],
    [2.39097414e-01, -1.13382447e-02, 3.06327342e-01, 4.68182987e-03, -1.03107607e-01, -3.17661969e-02,
     3.46533705e-02, 1.46440386e-02, 6.88291154e-02, 1.72580481e-02, -6.23970238e-03, -6.52822380e-03],
    [1.74850329e-01, -1.86077411e-01, 2.69285838e-01, 5.22452803e-02, -3.71708289e-02, -6.42874319e-02,
     -5.01920042e-03, -1.14565540e-02, -2.61300268e-03, -6.94872458e-03, 1.20157063e-02, 2.01341977e-02],
    [1.93220674e-01, 1.62738332e-01, 1.72794061e-02, 7.89933755e-02, 1.58494767e-01, 9.04541006e-04,
     -3.33177052e-02, -1.42411500e-01, -1.90471155e-02, -2.41622739e-02, -2.57382438e-02, 2.84895062e-02],
    [3.31179197e+00, -1.56765268e-01, 4.42446188e+00, 2.05496297e+00, 5.07031622e+00, -3.52663849e-02,
     -5.68337901e+00, -1.17825301e+00, 5.41756637e-01, -3.15541339e-02, -1.58404846e+00, 7.37887234e-01],
    [2.36033237e-01, -5.01380019e-01, -7.01568834e-02, -2.14474169e-01, 5.58739133e-01, -3.45340886e-01,
     2.36469930e-01, -2.51770230e-02, -4.41670143e-01, -1.73364633e-01, 9.92353986e-03, 1.01775476e-01],
    [3.13672832e+00, 1.55128891e+00, 4.60139512e+00, 9.82477544e-01, -3.87108002e-01, -1.34239667e+00,
     -3.00065797e+00, -4.41556909e-01, -7.77546208e-01, -6.59017029e-01, -1.42596356e-01, -9.78935498e-01],
    [8.50714148e-01, 2.28658856e-01, -3.65260753e+00, 2.70626948e+00, -1.90441544e-01, 5.66625676e+00,
     1.77531510e+00, 2.39978921e+00, 1.10965660e+00, 1.58484130e+00, -1.51579214e-02, 8.64324026e-01],
    [1.14302559e+00, 1.18602811e+00, -3.88130412e+00, 8.69833825e-01, -8.23003310e-01, -4.23867795e-01,
     8.56022598e-01, -1.08015106e+00, 1.74840192e-01, -1.35493558e-02, -1.17012561e+00, 1.68572940e-01],
    [3.54117814e+00, 6.12714769e-01, 7.67585243e+00, 2.50391333e+00, 1.81374399e+00, -1.46363231e+00,
     -1.74027236e+00, -5.72924078e-01, -1.20787368e+00, -4.13954661e-01, -4.62561948e-01, 6.78297871e-01],
    [8.31843044e-01, 4.41635485e-01, 7.00724425e-02, -4.72159900e-02, 3.08326493e-01, -4.47009822e-01,
     3.27806057e-01, 6.52370380e-01, 3.28490360e-01, 1.28628172e-01, -7.78065861e-02, 6.91343399e-02],
    [4.90082031e-01, -9.53180204e-01, 1.76970476e-01, 1.57256960e-01, -5.26196238e-02, -3.19264458e-01,
     3.91808304e-01, 2.19368239e-01, -2.06483291e-01, -6.25044005e-02, -1.05547224e-01, 3.18934196e-01],
    [1.49899454e+00, -4.30708817e-01, 2.43770498e+00, 7.03149621e-01, -2.28827845e+00, 2.70195855e+00,
     -4.71484280e+00, -1.18700075e+00, -1.77431396e+00, -2.23190236e+00, 8.20855264e-01, -2.35859902e-01],
    [1.20322544e-01, -3.66300816e-01, -1.25699953e-01, -1.21914056e-01, 6.93277338e-02, -1.31034684e-01,
     -1.54955924e-03, 2.48094288e-02, -3.09576314e-02, -1.66369415e-03, 1.48904987e-04, -1.42151992e-02],
    [6.52394765e-01, -6.81024464e-01, 6.36868117e-01, 3.04950208e-01, 2.62178992e-01, -3.20457080e-01,
     -1.98576098e-01, -3.02173163e-01, 2.04399765e-01, 4.44513847e-02, -9.50111498e-03, -1.14198739e-02],
    [2.06762180e-01, -2.08101829e-01, 2.61977630e-01, -1.71672300e-01, 5.61794250e-02, 2.13660185e-01,
     3.90259585e-02, 4.78176392e-02, 1.72812607e-02, 3.44052067e-02, 6.26899067e-03, 2.48544728e-02],
    [7.39717363e-01, 4.37786285e+00, 2.54995502e+00, 1.13151212e+00, -3.58509503e-01, 2.20806129e-01,
     -2.20500355e-01, -7.22409824e-02, -2.70534083e-01, 1.07942098e-03, 2.70174668e-01, 1.87279353e-01],
    [1.25593809e+00, 6.71054880e-02, 8.70352571e-01, -4.32607959e+00, 2.30652217e+00, 5.47476105e+00,
     -6.11052479e-01, 1.07955720e+00, -2.16225471e+00, -7.95770149e-01, -7.31804973e-01, 9.68935954e-01],
    [1.17233757e-01, -1.23897829e-01, -4.88625265e-01, 1.42036530e-01, -7.23286756e-02, -6.99808763e-02,
     -1.17525019e-02, 5.70221674e-02, -7.67796123e-03, 4.17505873e-02, -2.33375716e-02, 1.94121001e-02],
    [1.67511025e+00, -2.75436700e+00, 1.45345593e+00, 1.32408871e+00, -1.66172505e+00, 1.00560074e+00,
     -8.82308160e-01, -5.95708043e-01, -7.27283590e-01, -1.03975499e+00, -1.86653334e-02, 1.39449745e+00],
    [3.20587677e+00, -2.84451104e+00, 8.54849957e+00, -4.44001235e-01, 1.04202144e+00, 7.35333682e-01,
     -2.48763292e+00, 7.38931361e-01, -1.74185596e+00, -1.07581842e+00, 2.05759299e-01, -8.20483513e-01],
    [3.31279737e+00, -5.08655734e-01, 6.61530870e+00, 1.16518280e+00, 4.74499155e+00, -2.31536191e+00,
     -1.34016130e+00, -7.15381712e-01, 2.78890594e+00, 2.04189275e+00, -3.80003033e-01, 1.16034914e+00],
    [1.79522019e+00, -8.13534697e-02, 4.37167420e-01, 2.26517020e+00, 8.85377295e-01, 1.07481514e+00,
     -7.25322296e-01, -2.19309506e+00, -7.59468916e-01, -1.37191387e+00, 2.60097913e-01, 9.34596450e-01],
    [3.50400906e-01, 8.17891485e-01, -8.63487084e-01, -7.31760701e-01, 9.70320805e-02, -3.60023996e-01,
     -2.91753495e-01, -8.03073817e-02, 6.65930095e-02, 1.60093340e-01, -1.29158086e-01, -5.18806100e-02],
    [2.25922929e-01, 2.78461593e-01, 5.39661393e-02, -2.37662670e-02, -2.70343295e-02, -1.23485570e-01,
     2.31027499e-03, 5.87465112e-05, 1.86127188e-02, 2.83074747e-02, -1.87198676e-04, 1.24761782e-02],
    [4.53615634e-01, 3.18976020e+00, -8.35029351e-01, 7.84124578e+00, -4.43906795e-01, -1.78945492e+00,
     -1.14521031e+00, 1.00044304e+00, -4.04084981e-01, -4.86030348e-01, 1.05412721e-01, 5.63666445e-02],
    [3.93714086e-01, -3.07226477e-01, -4.87366619e-01, -4.57481697e-01, -2.91133171e-04, -2.39881719e-01,
     -2.15591352e-01, -1.21332941e-01, 1.42245002e-01, 5.02984582e-02, -8.05878851e-03, 1.95534173e-01],
    [1.86913010e-01, -1.61000977e-01, 5.95612425e-01, 1.87804293e-01, 2.22064227e-01, -1.09008289e-01,
     7.83845058e-02, 5.15228647e-02, -8.18113578e-02, -2.37860551e-02, 3.41013800e-03, 3.64680417e-02],
    [3.32919314e+00, -2.14341251e+00, 7.20913997e+00, 1.76143734e+00, 1.64091808e+00, -2.66887649e+00,
     -9.26748006e-01, -2.78599285e-01, -7.39434005e-01, -3.87363085e-01, 8.00557250e-01, 1.15628886e+00],
    [4.76496444e-01, -1.19334793e-01, 3.09037235e-01, -3.45545294e-01, 1.30114716e-01, 5.06895559e-01,
     2.12176840e-01, -4.14296750e-03, 4.52439064e-02, -1.62163990e-02, 6.93683152e-02, -5.77607592e-03],
    [3.00019324e-01, 5.43432074e-02, -7.72732930e-01, 1.47263806e+00, -2.79012581e-02, -2.47864869e-01,
     -2.10011388e-01, 2.78202425e-01, 6.16957205e-02, -1.66924986e-01, -1.80102286e-01, -3.78872162e-03]]

TIMBRE_MEANS = [np.mean(t) for t in TIMBRE_CLUSTERS]
TIMBRE_STDEVS = [np.std(t) for t in TIMBRE_CLUSTERS]

'''helper methods to process raw msd data'''

def normalize_pitches(h5):
    key = int(hdf5_getters.get_key(h5))
    segments_pitches = hdf5_getters.get_segments_pitches(h5)
    segments_pitches_new = [transpose_by_key(pitch_seg, key) for pitch_seg in segments_pitches]
    return segments_pitches_new

def transpose_by_key(pitch_seg, key):
    pitch_seg_new = []
    for i in range(0, 12):
        idx = (i + key) % 12
        pitch_seg_new.append(pitch_seg[idx])
    return pitch_seg_new

'''given a time segment with distributions of the 12 pitches, find the most
likely chord played'''
def find_most_likely_chord(pitch_vector):
    rho_max = 0.0
    # chords are indexed as (type, root): type 1 = major, 2 = minor,
    # 3 = dominant 7th, 4 = minor 7th
    most_likely_chord = (1, 1)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_MAJOR, CHORD_TEMPLATE_MAJOR_means, CHORD_TEMPLATE_MAJOR_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean) * (pitch_vector[i] - np.mean(pitch_vector)) / ((stdev + 0.01) * (np.std(pitch_vector) + 0.01))
        if (abs(rho) > abs(rho_max)):
            rho_max = rho
            most_likely_chord = (1, idx)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_MINOR, CHORD_TEMPLATE_MINOR_means, CHORD_TEMPLATE_MINOR_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean) * (pitch_vector[i] - np.mean(pitch_vector)) / ((stdev + 0.01) * (np.std(pitch_vector) + 0.01))
        if (abs(rho) > abs(rho_max)):
            rho_max = rho
            most_likely_chord = (2, idx)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_DOM7, CHORD_TEMPLATE_DOM7_means, CHORD_TEMPLATE_DOM7_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean) * (pitch_vector[i] - np.mean(pitch_vector)) / ((stdev + 0.01) * (np.std(pitch_vector) + 0.01))
        if (abs(rho) > abs(rho_max)):
            rho_max = rho
            most_likely_chord = (3, idx)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_MIN7, CHORD_TEMPLATE_MIN7_means, CHORD_TEMPLATE_MIN7_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean) * (pitch_vector[i] - np.mean(pitch_vector)) / ((stdev + 0.01) * (np.std(pitch_vector) + 0.01))
        if (abs(rho) > abs(rho_max)):
            rho_max = rho
            most_likely_chord = (4, idx)
    return most_likely_chord

def find_most_likely_timbre_category(timbre_vector):
    most_likely_timbre_cat = 0
    rho_max = 0.0
    for idx, (seg, mean, stdev) in enumerate(zip(TIMBRE_CLUSTERS, TIMBRE_MEANS, TIMBRE_STDEVS)):
        rho = 0.0
        for i in range(0, 12):
            rho += (seg[i] - mean) * (timbre_vector[i] - np.mean(seg)) / ((stdev + 0.01) * (np.std(timbre_vector) + 0.01))
        if (abs(rho) > abs(rho_max)):
            rho_max = rho
            most_likely_timbre_cat = idx
    return most_likely_timbre_cat

Bibliography

[1] Marck Bailey. Mark J. Butler publishes scholarly work on dance music. http://www.music.northwestern.edu/about/news/2012/mark-j-butler-publishes-scholarly-work-on-dance-music.html, Mar 2012.

[2] Kenneth Taylor. Ishkur's guide to EDM. http://techno.org/electronic-music-guide/.

[3] Deal further strengthens Spotify's music discovery expertise. http://the.echonest.com/pressreleases/spotify-acquires-echo-nest/, Mar 2014.

[4] Josh Constine. Inside the Spotify - Echo Nest skunkworks. http://techcrunch.com/2014/10/19/the-sonic-mad-scientists/, Oct 2014.

[5] The future of music genres is here. http://blog.echonest.com/post/73516217273/the-future-of-music-genres-is-here, Jan 2014.

[6] About the Music Genome Project. http://www.pandora.com/about/mgp.

[7] Joan Serrà, Álvaro Corral, Marián Boguñá, Martín Haro, and Josep Ll. Arcos. Measuring the evolution of contemporary western popular music. Sci. Rep., 2, Jul 2012.

[8] Matthias Mauch, Robert M. MacCallum, Mark Levy, and Armand M. Leroi. The evolution of popular music: USA 1960-2010. Royal Society Open Science, 2(5), 2015.

[9] Thierry Bertin-Mahieux, Daniel P.W. Ellis, Brian Whitman, and Paul Lamere. The Million Song Dataset. In Proceedings of the 12th International Conference on Music Information Retrieval (ISMIR 2011), 2011.

[10] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay. Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12:2825-2830, 2011.

[11] Lars Buitinck, Gilles Louppe, Mathieu Blondel, Fabian Pedregosa, Andreas Mueller, Olivier Grisel, Vlad Niculae, Peter Prettenhofer, Alexandre Gramfort, Jaques Grobler, Robert Layton, Jake VanderPlas, Arnaud Joly, Brian Holt, and Gaël Varoquaux. API design for machine learning software: experiences from the scikit-learn project. In ECML PKDD Workshop: Languages for Data Mining and Machine Learning, pages 108-122, 2013.

[12] Edwin Chen. Infinite mixture models with nonparametric Bayes and the Dirichlet process. http://blog.echen.me/2012/03/20/infinite-mixture-models-with-nonparametric-bayes-and-the-dirichlet-process/, Mar 2012.

[13] Graham Massey. Roland TR-808: The drum machine that changed music. http://www.bbc.com/news/entertainment-arts-26682781, Mar 2014.

[14] Jean-Michel Jarre and Agencja Artystyczna MTJ. Jean Michel Jarre. Disques Dreyfus, 1999.

[15] Francois Pachet, Jean-Julien Aucouturier, and Mark Sandler. The way it sounds: Timbre models for analysis and retrieval of music signals. IEEE Transactions on Multimedia, 7(6):1028-1035, Dec 2005.


center image has α set to 0.01, which is too small and results in all of the data being grouped under one cluster. On the other hand, the bottom-right image has the same data set and α set to 100, which does a better job of clustering. On a related note, the figure also demonstrates the effectiveness of the DPGMM over the GMM. On the left side, clearly the dataset contains 2 clusters, but the GMM in the top-left image assumes 5 clusters as a prior and consequently clusters the data incorrectly, while the DPGMM manages to limit the data to 2 clusters.

The second argument that the user inputs for the DPGMM is the data that will be clustered. The scikit-learn implementation takes the data in the format of a nested list (N lists, each of length m), where N is the number of data points and m the number of features. While the format of the data structure is relatively straightforward, choosing which numbers should be in the data was a challenge I faced. Selecting the relevant features of each song to be used in the algorithm will be expounded upon in the next section, "Feature Selection."

The last argument that a user inputs for the scikit-learn DPGMM implementation indicates the upper bound for the number of clusters. The Dirichlet Process then determines the best number of clusters for the data between 1 and the upper bound. Since the DPGMM is flexible enough to find the best value, I set an arbitrary upper bound of 50 clusters and focused more on the tuning of α to modify the number of clusters formed.
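As a minimal sketch of this call: the thesis-era API was sklearn.mixture.DPGMM (removed in scikit-learn 0.20), and the equivalent model in current scikit-learn is BayesianGaussianMixture with a Dirichlet-process prior. Illustrative random data stands in for the real song features here:

import numpy as np
from sklearn import mixture

# Illustrative stand-in for the real feature matrix: N songs x m features.
X = np.random.rand(500, 10)

# Thesis-era call (scikit-learn <= 0.19):
#   dpgmm = mixture.DPGMM(n_components=50, alpha=0.05)
# Equivalent model in current scikit-learn:
dpgmm = mixture.BayesianGaussianMixture(
    n_components=50,  # upper bound on the number of clusters
    weight_concentration_prior_type='dirichlet_process',
    weight_concentration_prior=0.05,  # the concentration parameter alpha
)
labels = dpgmm.fit(X).predict(X)
print(np.unique(labels))  # cluster ids actually used, typically far fewer than 50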

2.2 Feature Selection

Figure 2.1: scikit-learn example of GMM vs. DPGMM and the tuning of α

One of the most difficult aspects of the Dirichlet method is choosing the features to be used for clustering. In other words, when we organize the songs into clusters, we need to ensure that each cluster is distinct in a way that is statistically and

intuitively logical. In the Million Song Dataset [9], each song is represented as a JSON object containing several fields. These fields are candidate features to be used in the Dirichlet algorithm. Below is an example song, "Never Gonna Give You Up" by Rick Astley, and the corresponding features:

artist_mbid: db92a151-1ac2-438b-bc43-b82e149ddd50 (the musicbrainz.org ID for this artist)

artist_mbtags: shape = (4,) (this artist received 4 tags on musicbrainz.org)

artist_mbtags_count: shape = (4,) (raw tag count of the 4 tags this artist received on musicbrainz.org)

artist_name: Rick Astley (artist name)

artist_playmeid: 1338 (the ID of that artist on the service playme.com)

artist_terms: shape = (12,) (this artist has 12 terms (tags) from The Echo Nest)

artist_terms_freq: shape = (12,) (frequency of the 12 terms from The Echo Nest (number between 0 and 1))

artist_terms_weight: shape = (12,) (weight of the 12 terms from The Echo Nest (number between 0 and 1))

audio_md5: bf53f8113508a466cd2d3fda18b06368 (hash code of the audio used for the analysis by The Echo Nest)

bars_confidence: shape = (99,) (confidence value (between 0 and 1) associated with each bar by The Echo Nest)

bars_start: shape = (99,) (start time of each bar according to The Echo Nest; this song has 99 bars)

beats_confidence: shape = (397,) (confidence value (between 0 and 1) associated with each beat by The Echo Nest)

beats_start: shape = (397,) (start time of each beat according to The Echo Nest; this song has 397 beats)

danceability: 0.0 (danceability measure of this song according to The Echo Nest (between 0 and 1, 0 => not analyzed))

duration: 211.69587 (duration of the track in seconds)

end_of_fade_in: 0.139 (time of the end of the fade in, at the beginning of the song, according to The Echo Nest)

energy: 0.0 (energy measure (not in the signal processing sense) according to The Echo Nest (between 0 and 1, 0 => not analyzed))

key: 1 (estimation of the key the song is in by The Echo Nest)

key_confidence: 0.324 (confidence of the key estimation)

loudness: -7.75 (general loudness of the track)

mode: 1 (estimation of the mode the song is in by The Echo Nest)

mode_confidence: 0.434 (confidence of the mode estimation)

release: Big Tunes - Back 2 The 80s (album name from which the track was taken; some songs/tracks can come from many albums, we give only one)

release_7digitalid: 786795 (the ID of the release (album) on the service 7digital.com)

sections_confidence: shape = (10,) (confidence value (between 0 and 1) associated with each section by The Echo Nest)

sections_start: shape = (10,) (start time of each section according to The Echo Nest; this song has 10 sections)

segments_confidence: shape = (935,) (confidence value (between 0 and 1) associated with each segment by The Echo Nest)

segments_loudness_max: shape = (935,) (max loudness during each segment)

segments_loudness_max_time: shape = (935,) (time of the max loudness during each segment)

segments_loudness_start: shape = (935,) (loudness at the beginning of each segment)

segments_pitches: shape = (935, 12) (chroma features for each segment (normalized so max is 1))

segments_start: shape = (935,) (start time of each segment (musical event or onset) according to The Echo Nest; this song has 935 segments)

segments_timbre: shape = (935, 12) (MFCC-like features for each segment)

similar_artists: shape = (100,) (a list of 100 artists (their Echo Nest IDs) similar to Rick Astley according to The Echo Nest)

song_hotttnesss: 0.864248830588 (according to The Echo Nest, when downloaded (in December 2010), this song had a 'hotttnesss' of 0.8 (on a scale of 0 and 1))

song_id: SOCWJDB12A58A776AF (The Echo Nest song ID; note that a song can be associated with many tracks (with very slight audio differences))

start_of_fade_out: 198.536 (start time of the fade out, in seconds, at the end of the song, according to The Echo Nest)

tatums_confidence: shape = (794,) (confidence value (between 0 and 1) associated with each tatum by The Echo Nest)

tatums_start: shape = (794,) (start time of each tatum according to The Echo Nest; this song has 794 tatums)

tempo: 113.359 (tempo in BPM according to The Echo Nest)

time_signature: 4 (time signature of the song according to The Echo Nest, i.e. usual number of beats per bar)

time_signature_confidence: 0.634 (confidence of the time signature estimation)

title: Never Gonna Give You Up (song title)

track_7digitalid: 8707738 (the ID of this song on the service 7digital.com)

track_id: TRAXLZU12903D05F94 (The Echo Nest ID of this particular track, on which the analysis was done)

year: 1987 (year when this song was released, according to musicbrainz.org)

When choosing features, my main goal was to use features that would most likely yield meaningful results yet also be simple and make sense to the average person. The definition of "meaningful" results is arbitrary, as every music listener will have his or her own opinions as to what constitutes different types of music, but some common features most people tend to differentiate songs by are pitch, rhythm, and the types of instruments used. The following specific fields provided in each song object fall under these three terms:

Pitch

• segments_pitches: a matrix of values indicating the strength of each pitch (or note) at each discernible time interval

Rhythm

• beats_start: a vector of values indicating the start time of each beat

• time_signature: the time signature of the song

• tempo: the speed of the song in Beats Per Minute (BPM)

Instruments

• segments_timbre: a matrix of values indicating the distribution of MFCC-like features (different types of tones) for each segment
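As a minimal sketch (with an illustrative file name), pulling just these candidate fields out of a single MSD track file with the dataset's hdf5_getters helpers looks like this:

import hdf5_getters

# Illustrative path to one MSD track file.
h5 = hdf5_getters.open_h5_file_read('TRAXLZU12903D05F94.h5')
pitches = hdf5_getters.get_segments_pitches(h5)  # (num_segments, 12) chroma matrix
timbre = hdf5_getters.get_segments_timbre(h5)    # (num_segments, 12) MFCC-like matrix
tempo = hdf5_getters.get_tempo(h5)               # BPM, ultimately not used as a feature
h5.close()
print(pitches.shape, timbre.shape, tempo)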

The segments_pitches feature is a clear candidate as a differentiating factor for songs, since it reveals patterns of notes that occur. Additionally, other research papers that quantitatively examine songs, like Mauch's, look at pitch and employ a procedure that allows all songs to be compared with the same metric. Likewise, timbre is intuitively a reliable differentiating feature, since it captures the presence of different tones: sounds that sound different despite having the same pitch. Therefore, segments_timbre is another feature that is considered in each song.

Finally, we look at the candidate features for rhythm. At first glance, all of these features appear to be useful, as they indicate the rhythm of a song in one way or another. However, none of these features are as useful as the pitch and timbre features. While tempo is one factor in differentiating genres of EDM, and music in general, tempo alone is not a driving force of musical innovation. Certain genres of EDM, like drum 'n' bass and happycore, stand out for having very fast tempos, but the tempo is supplemented with a sound unique to the genre. Conceiving new arrangements of pitches, combining instruments in new ways, and inventing new types of sounds are novel, but speeding up or slowing down existing sounds is not. Including tempo as a feature could actually add noise to the model, since many genres overlap in their tempos. And finally, tempo is measured indirectly when the pitch and timbre features are normalized for each song: everything is measured in units of "per second," so faster songs will have higher quantities of pitch and timbre features each second. Time signature can be dismissed from the candidate features for the same reason as tempo: many genres share the same time signature, and including it in the feature set would only add more noise. beats_start looks like a more promising feature since, like segments_pitches and segments_timbre, it consists of a vector of values. However, difficulties arise when we begin to think how exactly we can utilize this information. Since each song varies in length, we need a way to compare songs of different durations on the same level. One approach could be to perform basic statistics on the distance between each beat, for example calculating the mean and standard deviation of this distance. However, the normalized pitch and timbre information already capture this data. Another possibility is detecting certain patterns of beats, which could differentiate the syncopated dubstep or glitch music beats from the steady pulse of electro-house. But once again, every beat is accompanied by a sound with a specific timbre and pitch, so this feature would not add any significantly new information.

2.3 Collecting Data and Preprocessing Selected Features

2.3.1 Collecting the Data

Upon deciding the features I wanted to use in my research, I first needed to collect all of the electronic songs in the Million Song Dataset. The easiest reliable way to achieve this was to iterate through each song in the database and save the information for the songs where any of the artist genre tags in artist_mbtags matched an electronic music genre. While this measure was not fully accurate, because it looks at the genre of the artist, not the song, specific genre information for each song was not as easily accessible, so this indicator was nearly as good a substitute. To generate a list of the genres that electronic songs would fall under, I manually searched through a subset of the MSD to find all genres that seemed to be related to electronic music. In the case of genres that were sometimes, but not always, electronic in nature, such as disco or pop, I erred on the side of caution and did not include them in the list of electronic genres. In these cases, false positives, such as primarily rock songs that happen to have the disco label attached to the artist, could inadvertently be included in the dataset. The final list of genres is as follows:

target_genres = ['house', 'techno', 'drum and bass', 'drum n bass',
                 "drum'n'bass", 'drumnbass', "drum 'n' bass", 'jungle',
                 'breakbeat', 'trance', 'dubstep', 'trap', 'downtempo',
                 'industrial', 'synthpop', 'idm',
                 'idm - intelligent dance music', '8-bit', 'ambient',
                 'dance and electronica', 'electronic']

2.3.2 Pitch Preprocessing

A study conducted by music researcher Matthias Mauch [8] analyzes pitch in a musically informed manner. The study first takes the raw sound data and converts it into a distribution of each pitch, where 0 is no detection of the pitch and 1 the strongest amount. Then it computes the most likely chord by comparing the 4 most common types of chords in popular music (major, minor, dominant 7, and minor 7) to the observed chord. The most common chords are represented as "template chords" and contain 0's and 1's, where the 1's represent the notes played in the chord. For example, using the note C as the first index, the C major chord is represented as

CT_{CM} = (1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0)

For a given chroma frame c observed in the song, Spearman's rho coefficient is computed against every template chord:

\rho_{CT,c} = \sum_{i=1}^{12} \frac{(CT_i - \overline{CT})(c_i - \bar{c})}{\sigma_{CT}\,\sigma_c}

where \overline{CT} is the mean of the values in the template chord, \sigma_{CT} is the standard deviation of the values in the chord, and the operations on c are analogous. Note that the summation is over each individual pitch in the 12 pitch classes. The chord template with the highest value of ρ is selected as the chord for the time frame.
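A minimal sketch of this template-matching step on a single chroma frame follows (illustrative values; the full version over all four template sets, with the small stabilizing constants used in the actual code, is in Appendix A.4):

import numpy as np

# One chroma frame: strength of each pitch class C..B (illustrative values).
c = np.array([0.9, 0.1, 0.2, 0.1, 0.8, 0.1, 0.2, 0.7, 0.1, 0.2, 0.1, 0.3])

# Template chords, e.g. C major and A minor.
templates = {
    'C major': np.array([1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0]),
    'A minor': np.array([1, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0]),
}

best, best_rho = None, 0.0
for name, ct in templates.items():
    # correlation of the observed frame with the template
    rho = np.sum((ct - ct.mean()) * (c - c.mean())) / (ct.std() * c.std())
    if abs(rho) > abs(best_rho):
        best, best_rho = name, rho
print(best)  # -> 'C major' for this frame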

After this is performed for each time frame, the values are smoothed, and then the change between adjacent chords is observed. The reasoning behind this step is that by measuring the relative distance between chords, rather than the chords themselves, all songs can be compared in the same manner even though they may have different key signatures. Finally, the study takes the types of chord changes and classifies them under 8 possible categories called "H-topics." These topics are more abstracted versions of the chord changes that make more sense to a human, such as "changes involving dominant 7th chords."

In my preliminary implementation of this method on an electronic dance music corpus, I made a few modifications to Mauch's study. First, I smoothed out time frames before computing the most probable chords, rather than smoothing the most probable chords. I did this to save time and to reduce volatility in the chord measurements. Using Rick Astley's "Never Gonna Give You Up" as a reference, which contains 935 time frames and lasts 212 seconds, 5 time frames is slightly under 1 second and, for preliminary testing, appeared to be a good interval for each time block. Second, as mentioned in the literature section, I did not abstract the chord changes into H-topics. This decision also stemmed from time constraints, since deriving semantic chord meaning from EDM songs would require careful research into the types of harmonies and sounds common in that genre of music. Below, I included a high-level visualization of the pitch metadata found in a sample song, "Firestarter" by The Prodigy, and how I converted the metadata into a chord change vector that I could then feed into the Dirichlet Process algorithm.

[Figure: converting the raw pitch metadata of "Firestarter" by The Prodigy into a chord change vector. The pipeline shown: (1) start with the raw pitch data, an N x 12 matrix, where N is the number of time frames in the song and 12 the number of pitch classes; (2) average the distribution of pitches over every block of 5 time frames; (3) calculate the most likely chord for each block using Spearman's rho (e.g., F# major = (0,1,0,0,0,0,1,0,0,0,1,0)); (4) for each pair of adjacent chords, compute the code of the chord change between them and increment its count in a table of the 192 possible chord changes; (5) the result is a 192-element vector where chord_changes[i] is the number of times the chord change with code i occurred in the song.]

A final step I took to normalize the chord change data was to divide the numbers by the length of the song, so that each song's number of chord changes was measured per second. The smoothing and normalization steps are sketched below.
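A short sketch of these steps, under stated assumptions: best_chord is the toy detector sketched earlier, and change_code, which maps a pair of chords to one of the 192 chord-change codes, is a hypothetical helper here (the actual encoding appears in Appendix A.2).

import numpy as np

def chord_change_features(pitch_frames, duration, block=5):
    # average the N x 12 chroma matrix over blocks of 5 time frames
    pitch_frames = np.asarray(pitch_frames)
    n_blocks = len(pitch_frames) // block
    blocks = [pitch_frames[i*block:(i+1)*block].mean(axis=0) for i in range(n_blocks)]
    # detect the most likely chord in each block, then count changes
    chords = [best_chord(b) for b in blocks]
    changes = np.zeros(192)
    for c1, c2 in zip(chords, chords[1:]):
        changes[change_code(c1, c2)] += 1  # change_code: hypothetical encoder
    return changes / duration  # chord changes per second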

2.3.3 Timbre Preprocessing

For timbre, I also used Mauch's model to find a meaningful way to compare timbre uniformly across all songs [8]. After collecting all song metadata, I took a random sample of 20 songs from each year, starting at 1970. The reason I forced the sampling to 20 randomly sampled songs from each year, and did not take a random sample of songs from all years at once, was to prevent bias towards any type of sounds. As seen in Figure 2.2, there are significantly more songs from 2000-2011 than before 2000. The mean year is x̄ = 2001.052, the median year is 2003, and the standard deviation of the years is σ = 7.060. A "random sample" over all songs would almost definitely include a disproportionate amount of more recent songs. In order to not miss out on sounds that may be more prevalent in older songs, I required a set number of songs from each year. Next, from each randomly selected song, I selected 20 random timbre frames, in order to prevent any biases in data collection within each song. In total, there were 42 × 20 × 20 = 16,800 timbre frames collected. This year-stratified sampling is sketched below.
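A minimal sketch of the stratified sampling, patterned on the code in Appendix A.3 (year_counts maps each year to its number of EM songs in the dataset):

import random

def keep_song(year, year_counts, n_per_year=20):
    # accept each song with probability n_per_year / (songs in its year),
    # capped at 1, so roughly 20 songs are drawn per year regardless of year size
    prob = min(1.0, float(n_per_year) / year_counts[year])
    return random.random() < prob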

Figure 2.2: Number of Electronic Music Songs in Million Song Dataset from Each Year

Next, I clustered the timbre frames using a Gaussian Mixture Model (GMM), varying the number of clusters from 10 to 100 and selecting the number of clusters with the lowest Bayes Information Criterion (BIC), a statistical measure commonly used to select the best-fitting model. The BIC was minimized at 46 timbre clusters. I then re-ran the GMM with 46 clusters and saved the mean values of each of the 12 timbre segments for each cluster formed. In the same way that every song had the same 192 chord changes whose frequencies could be compared between songs, each song now had the same 46 timbre clusters but different frequencies in each song. When reading in the metadata from each song, I calculated the most likely timbre cluster each timbre frame belonged to and kept a frequency count of all of the possible timbre clusters observed in a song. Finally, as with the pitch data, I divided all observed counts by the duration of the song in order to normalize each song's timbre counts. The GMM fitting and BIC selection are sketched below.
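A sketch of the model selection step, assuming scikit-learn's current GaussianMixture class (the successor of the mixture-model API available when this thesis was written); the exact sweep over cluster counts is my assumption:

import numpy as np
from sklearn.mixture import GaussianMixture

def fit_timbre_clusters(frames, k_values=range(10, 101)):
    # frames: the 16,800 x 12 matrix of sampled timbre frames
    frames = np.asarray(frames)
    models = [GaussianMixture(n_components=k).fit(frames) for k in k_values]
    best = min(models, key=lambda g: g.bic(frames))  # lowest BIC wins
    return best  # best.means_ holds the mean timbre vector of each cluster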


Chapter 3

Results

3.1 Methodology

After the pitch and timbre data was processed, I ran the Dirichlet Process on the data. For each song, I concatenated the 192-element chord change frequency list and the 46-element timbre category frequency list, giving each song a total of 238 features. However, there is a problem with this setup: the pitch data will inherently dominate the clustering process, since it contains almost 3 times as many features as timbre. While there is no built-in function in scikit-learn's DPGMM process to give different weights to each feature, I considered another possibility to remedy this discrepancy: duplicating the timbre vector a certain number of times and concatenating that to the feature set of each song. While this strategy runs the risk of corrupting the feature set and turning it into something that does not accurately represent each song, it is important to keep in mind that, even without duplicating the timbre vector, the feature set already consists of two separate feature sets concatenated to each other. Therefore, timbre duplication appears to be a reasonable strategy to weight pitch and timbre more evenly. This assembly is sketched below.
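A sketch of the resulting feature assembly (the duplication count of 3 is illustrative; the thesis text does not fix a specific number):

import numpy as np

def song_features(chord_changes, timbre_counts, timbre_copies=3):
    # 192 chord-change frequencies plus tiled copies of the 46
    # timbre-cluster frequencies, so both modalities carry similar weight
    return np.concatenate([np.asarray(chord_changes),
                           np.tile(np.asarray(timbre_counts), timbre_copies)])

With three copies, the timbre block contributes 138 features against 192 for pitch, which roughly evens out the two halves of the feature set.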


After this modification, I tweaked a few more parameters before obtaining my final results. Dividing the pitch and timbre frequencies by the duration of the song normalized every song to frequency per second, but it also had the undesired effect of making the data too small. Timbre and pitch frequencies per second were almost always less than 1.0 and many times hovered as low as 0.002 for nonzero values. Because all of the values were very close to each other, using common values of α, in the range of 0.1 up to 1000-2000, was insufficient to push the songs into different clusters; as a result, every song fell into the same cluster. Increasing the value of α by several orders of magnitude, to well over 10 million, fixed the problem, but this solution presented two problems of its own. First, tuning α to experiment with different ways to cluster the music would be problematic, since I would have to work with an enormous range of possible values for α. Second, pushing α to such high values is not appropriate for the Dirichlet Process. Extremely high values of α indicate a Dirichlet Process that will try to disperse the data into different clusters, but a value of α that high is in principle always assigning each new song to a new cluster. On the other hand, varying α between 0.1 and 1000, for example, presents a much wider range of flexibility when assigning clusters. While clustering may be possible by varying the values of α an extreme amount with the data as it currently is, we would be using the Dirichlet Process in a way it should mathematically not be used. Therefore, multiplying all of the data by a constant value, so that we can work in the appropriate range of α, is the ideal approach. After some experimentation, I found that k = 10 was an appropriate scaling factor.

After initial runs of the Dirichlet Process, I found that there was a slight issue with some of the earlier songs. Since I had only artist genre tags, not specific song tags for each song, I chose songs based on whether any of the tags associated with the artist fell under any electronic music genre, including the generic term 'electronic'. There were some bands, mostly older ones from the 1960s and 1970s like Electric Light Orchestra, which had some electronic music but mostly featured rock, funk, disco, or another genre. Given that these artists featured mostly non-electronic songs, I decided to exclude them from my study and generate a blacklist indicating these music artists. While it was infeasible to look through every single song and determine whether it was electronic or not, I was able to look over the earliest songs in each cluster. These songs were the most important to verify as electronic, because early non-electronic songs could end up forming new clusters and inadvertently create clusters with non-electronic sounds that I was not looking for.

The goal of this thesis is to identify different groups in which EM songs are clustered and to identify the most unique artists and genres. While the second task is very simple, because it requires looking at the earliest songs in each cluster, the first is difficult to gauge the effectiveness of. While I can look at the average chord change and timbre category frequencies in each cluster, as well as other metadata, putting more semantic interpretations on what the music actually sounds like, and determining whether the music is clustered properly, is a very subjective process. For this reason, I ran the Dirichlet Process on the feature set with values of α = (0.05, 0.1, 0.2) and compared the clustering for each value, examining similarities and differences in the clusters formed in each scenario in the Discussion section. For each value of α, I set the upper limit of components, or clusters, allowed to 50. The values of α I used resulted in 9, 14, and 19 clusters formed, respectively. A sketch of these runs appears below.
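A minimal sketch of these runs; scikit-learn's old DPGMM class has since been replaced by BayesianGaussianMixture, so the class name and arguments below reflect the current API rather than the code actually used in 2016:

import numpy as np
from sklearn.mixture import BayesianGaussianMixture

def run_dirichlet_process(X, alpha, k=10, max_components=50):
    # scale the per-second features by k, then fit a truncated
    # Dirichlet Process mixture with concentration parameter alpha
    dp = BayesianGaussianMixture(
        n_components=max_components,
        weight_concentration_prior_type='dirichlet_process',
        weight_concentration_prior=alpha)
    return dp.fit_predict(k * np.asarray(X))  # one cluster label per song

# labels = {a: run_dirichlet_process(X, a) for a in (0.05, 0.1, 0.2)}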

3.2 Findings

3.2.1 α = 0.05

When I set α to 0.05, the Dirichlet Process split the songs into 9 clusters. Below are the distributions of years of the songs in each cluster (note that the Dirichlet Process does not number the clusters exactly sequentially, so cluster numbers 5, 7, and 10 are skipped).

Figure 3.1: Song year distributions for α = 0.05

For each value of α, I also calculated the average frequency of each chord change category and timbre category for each cluster and plotted the results. The green lines correspond to timbre and the blue lines to pitch.

Figure 3.2: Timbre and pitch distributions for α = 0.05

A table of each cluster formed, the number of songs in that cluster, and descriptions of pitch, timbre, and rhythmic qualities characteristic of songs in that cluster is shown below.

Cluster   Song Count   Characteristic Sounds
0         6481         Minimalist; industrial, space sounds; dissonant chords
1         5482         Soft, New Age, ethereal
2         2405         Defined sounds; electronic and non-electronic instruments played in standard rock rhythms
3         360          Very dense and complex synths, slightly darker tone
4         4550         Heavily distorted rock and synthesizer
6         2854         Faster-paced 80s synth rock, acid house
8         798          Aggressive beats, dense house music
9         1464         Ambient house, trancelike, strong beats, mysterious tone
11        1597         Melancholy tones; New Wave rock in the 80s, then starting in the 90s downtempo, trip-hop, nu-metal

Table 3.1: Song cluster descriptions for α = 0.05

3.2.2 α = 0.1

A total of 14 clusters were formed (16 were formed, but 2 clusters contained only one song each; I listened to both of these songs, and since they did not sound unique, I discarded them from the clusters). Again, the song distributions, timbre and pitch distributions, and cluster descriptions are shown below.

Figure 3.3: Song year distributions for α = 0.1

Figure 3.4: Timbre and pitch distributions for α = 0.1

Cluster   Song Count   Characteristic Sounds
0         1339         Instrumental and disco with 80s synth
1         2109         Simultaneous quarter-note and sixteenth-note rhythms
2         4048         Upbeat, chill; simultaneous quarter-note and eighth-note rhythms
3         1353         Strong repetitive beats, ambient
4         2446         Strong simultaneous beat and synths; synths defined but echo
5         2672         Calm, New Age
6         542          Hi-hat cymbals, dissonant chord progressions
7         2725         Aggressive punk and alternative rock
9         1647         Latin; rhythmic emphasis on first and third beats
11        835          Standard medium-fast rock instruments/chords
16        1152         Orchestral, especially violins
18        40           "Martian alien" sounds, no vocals
20        1590         Alternating strong kick and strong high-pitched clap
28        528          Roland TR-like beats; kick and clap stand out but fuzzy

Table 3.2: Song cluster descriptions for α = 0.1

3.2.3 α = 0.2

With α set to 0.2, there were a total of 22 clusters formed. 3 of the clusters consisted of 1 song each, none of which were particularly unique-sounding, so I discarded them, for a total of 19 significant clusters. Again, the song distributions, timbre and pitch distributions, and cluster descriptions are shown below.

Figure 3.5: Song year distributions for α = 0.2

Figure 3.6: Timbre and pitch distributions for α = 0.2

Cluster   Song Count   Characteristic Sounds
0         4075         Nostalgic and sad-sounding synths and string instruments
1         2068         Intense, sad, cavernous (mix of industrial metal and ambient)
2         1546         Jazz/funk tones
3         1691         Orchestral with heavy 80s synths, atmospheric
4         343          Arpeggios
5         304          Electro, ambient
6         2405         Alien synths, eerie
7         1264         Punchy kicks and claps, 80s/90s tilt
8         1561         Medium tempo, 4/4 time signature, synths with intense guitar
9         1796         Disco rhythms and instruments
10        2158         Standard rock with few (if any) synths added on
12        791          Cavernous, minimalist, ambient (non-electronic instruments)
14        765          Downtempo, classic guitar riffs, fewer synths
16        865          Classic acid house sounds and beats
17        682          Heavy Roland TR sounds
22        14           Fast, ambient, classic orchestral
23        578          Acid house with funk tones
30        31           Very repetitive rhythms, one or two tones
34        88           Very dense sound (strong vocals and synths)

Table 3.3: Song cluster descriptions for α = 0.2


3.3 Analysis

Each of the different values of α revealed different insights about the Dirichlet Process applied to the Million Song Dataset, mainly which artists and songs were unique and how traditional groupings of EM genres could be thought of differently. Not surprisingly, the distributions of the years of songs in most of the clusters were skewed to the left, because the distribution of all of the EM songs in the Million Song Dataset is left-skewed (see Figure 2.2). However, some of the distributions vary significantly for individual clusters, and these differences provide important insights into which types of music styles were popular at certain points in time and how unique the earliest artists and songs in the clusters were. For example, for α = 0.1, cluster 28's musical style, with sounds characteristic of the Roland TR-808 and TR-909, two programmable drum machines and synthesizers that became extremely popular in 1980s and 90s dance tracks [13], coincides with when the instruments were first manufactured in 1980. Not surprisingly, this cluster contained mostly songs from the 80s and 90s and declined slightly in the 2000s. However, there were a few songs in that cluster that came out before 1980. While these songs did not clearly use the Roland TR machines, they may have contained similar sounds that predated the machines and were truly novel.

First, looking at α = 0.05, we see that all of the clusters contain a significant number of songs, although clusters 3 and 8 are notably smaller. Cluster 3 contains a heavier left tail, indicating a larger number of songs from the 70s, 80s, and 90s. Inside the cluster, the genres of music varied significantly from a traditional music lens. That is, the cluster contained some songs with nearly all traditional rock instruments, others with purely synths, and others somewhere in between, all of which would normally be classified as different EM genres. However, under the Dirichlet Process, these songs were lumped together with the common theme of dense, melodic arrangements (as opposed to minimalistic, repetitive, or dissonant sounds). The most prominent artists from the earlier songs are Ashra and Jon Hassell, who composed several melodic songs combining traditional instruments with synthesizers for a modern feel. The other small cluster, number 8, contains a more normal year distribution relative to the entire MSD distribution and also consists of denser beats. Another artist, Cabaret Voltaire, leads this cluster. Cluster 9 sticks out significantly because it contains virtually no songs before 1990 but increases rapidly in popularity. This cluster contains songs with hypnotically repetitive rhythm, strong and ethereal synths, and an equally strong drum-like beat. Given the emergence of trance in the 1990s, and the fact that house music in the 1980s contained more minimalistic synths than house music in the 1990s, this distribution of years makes sense. Looking at the earliest artists in this cluster, one that accurately predates the later music in the cluster is Jean-Michel Jarre. A French composer pioneering in ambient and electronic music [14], one of his songs, "Les Chants Magnétiques IV," contains very sharp and modulated synths along with a repetitive hi-hat cymbal rhythm and more drawn-out and ethereal synths. While the song sounds ambient at its normal speed, playing the song at 1.5 times the normal speed resulted in a thumping, fast-paced 16th note rhythm that, combined with the ethereal synths that contain certain chord progressions, sounded very similar to trance music. In fact, I found that, stylistically, trance music was comparable to house and ambient music increased in speed. Trance music was a term not used extensively until the early 1990s, but ambient and house music were already mainstream by the 1980s, so it would make sense that trance evolved in this manner. However, this insight could serve as an argument that trance is not an innovative genre in and of itself, but is rather a clever combination of two older genres. Lastly, we look at the timbre category and chord change distributions for each cluster. In theory, these clusters should have significantly different peaks of chord changes and timbre categories, reflecting different pitch arrangements and instruments in each cluster.

The type 0 chord change corresponds to major → major with no note change; type 60, to minor → minor with no note change; type 120, to dominant 7th major → dominant 7th major with no note change; and type 180, to dominant 7th minor → dominant 7th minor with no note change. It makes sense that type 0, 60, 120, and 180 chord changes are frequently observed, because it implies that chords occurring next to each other in the song remain in the same key for the majority of the song. A worked check of this encoding follows.
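Using the encoding from the code in Appendix A.2 (chord types are numbered 1 through 4 for major, minor, dominant 7th, and minor 7th), the same-key self-transitions land exactly on these four codes:

def change_code(type1, type2, note_shift):
    # key_shift runs from 1 to 16 over the 4 x 4 pairs of chord types
    key_shift = 4 * (type1 - 1) + type2
    return 12 * (key_shift - 1) + note_shift  # one of 192 codes

assert change_code(1, 1, 0) == 0    # major -> major
assert change_code(2, 2, 0) == 60   # minor -> minor
assert change_code(3, 3, 0) == 120  # dominant 7th -> dominant 7th
assert change_code(4, 4, 0) == 180  # minor 7th -> minor 7th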

The timbre categories, on the other hand, are more difficult to intuitively interpret. Mauch's study addresses this issue by sampling songs and sounds that are the closest to each timbre category and then playing the sounds and attaching user-based interpretations based on several listeners [8]. While this strategy worked in Mauch's study, given the time and resources at my disposal, it was not practical in mine. I ended up comparing my subjective summaries of each cluster against the charts to see whether certain peaks in the timbre categories corresponded to specific tones. Strangely, for α = 0.05, the timbre and chord change data is very similar for each cluster. This problem does not occur when α = 0.1 or 0.2, where the graphs vary significantly and correspond to some of the observed differences in the music. In summary, below are the most influential artists I found in the clusters formed and the types of music they created that were novel for their time:

• Jean-Michel Jarre: ambient and house music, complicated synthesizer arrangements
• Cabaret Voltaire: orchestral electronic music
• Paul Horn: new age
• Brian Eno: ambient music
• Manuel Göttsching (Ashra): synth-heavy ambient music
• Killing Joke: industrial metal
• John Foxx: minimalist and dark electronic music
• Fad Gadget: house and industrial music

While these conclusions were formed mainly from the data used in the MSD and the resulting clusters, I checked outside sources and biographies of these artists to see whether they were groundbreaking contributors to electronic music. Some research revealed that these artists were indeed groundbreaking for their time, so my findings are consistent with existing literature. The difference, however, between existing accounts and mine is that, from a quantitatively computed perspective, I found some new connections (like observing that one of Jarre's works, when sped up, sounded very similar to trance music).

For larger values of α, it is worth not only looking at interesting phenomena in the clusters formed for that specific value, but also comparing the clusters formed to other values of α. Since we are increasing the value of α, more clusters will be formed, and the distinctions between each cluster will be more nuanced. With α = 0.1, the Dirichlet Process formed 16 clusters. 2 of these clusters consisted of only one song each, and upon listening, neither of these songs sounded particularly unique, so I threw those two clusters out and analyzed the remaining 14. Comparing these clusters to the ones formed with α = 0.05, I found that some of the clusters mapped over nicely, while others were more difficult to interpret. For example, cluster 3_{0.1} (cluster 3 when α = 0.1) contained a similar number of songs, and a similar distribution of release years, to cluster 9_{0.05}. Both contain virtually no songs before the 1990s and then steadily rise in popularity through the 2000s. Both clusters also contain similar types of music: house beats and ethereal synths reminiscent of ambient or trance music. However, when I looked at the earliest artists in cluster 3_{0.1}, they were different from the earliest artists in cluster 9_{0.05}.

One particular artist, Bill Nelson, stood out for having a particularly novel song, "Birds of Tin," for the year it was released (1980). This song features a sharp and twangy synth beat that, when sped up, sounded like minimalist acid house music. While the α = 0.05 run differentiated mostly on general moods and classes of instruments (like rock vs. non-electronic vs. electronic), the α = 0.1 run picked up more nuanced instrumentation and mood differences. For example, cluster 16_{0.1} contained songs that featured orchestral string instruments, especially violin. The songs themselves varied significantly according to traditional genres, from Brian Eno arrangements with classical orchestra to a remix of a song by Linkin Park, a nu-metal band, which contained violin interludes. This clustering raises an interesting point: music that sounds very different based on traditional genres could be grouped together on certain instruments or sounds. Another cluster, 28_{0.1}, features 90s sounds characteristic of the Roland TR-909 drum machine (which would explain why the cluster's songs increase drastically starting in the 1990s and steadily decline through the 2000s). Yet another cluster, 6_{0.1}, contains a particularly heavy left tail, indicating a style more popular in the 1980s, and its characteristic sound, hi-hat cymbals, is also a specialized instrument. This specialization does not match up particularly strongly with the clusters when α = 0.05. That is, a single cluster with α = 0.05 does not easily map to one or more clusters in the α = 0.1 run, although many of the clusters appear to share characteristics based on the qualitative descriptions in the tables. The timbre/chord change charts for each cluster appeared to at least somewhat corroborate the general characteristics I attached to each cluster. For example, the last timbre category is significantly pronounced for clusters 5 and 18, and especially so for 18. Cluster 18 was vocal-free, ethereal, space-synth sounds, so it would make sense that cluster 5, which was mainly calm New Age, also contained vocal-free, ethereal, and space-y sounds.

It was also interesting to note that certain clusters, like 28, contained one timbre category that completely dominated all the others. Many songs in this cluster were marked by strong and repetitive beats reminiscent of the Roland TR drum machines, which matches the graph. Likewise, clusters 3, 7, 9, and 20, which appear to contain the same peak timbre category, were noted for containing strong and repetitive beats. For this run, I added the following artists and their contributions to the general list of novel artists:

• Bill Nelson: minimalist house music
• Vangelis: orchestral compositions with electronic notes
• Rick Wakeman: rock compositions with spacy-sounding synths
• Kraftwerk: synth-based pop music

Finally, we look at α = 0.2. With this parameter value, the Dirichlet Process resulted in 22 clusters formed. 3 of these clusters contained only one song each, and upon listening to each of these songs, I determined they were not particularly unique and discarded them, for a total of 19 remaining clusters. Unlike with the previous two values of α, where the clusters were relatively easy to subjectively differentiate, this run was quite difficult. Slightly more than half of the clusters, 10 out of 19, contained under 1000 songs, and 3 contained under 100. Some of the clusters were easily mapped to clusters in the other two α values, like cluster 17_{0.2}, which contains Roland TR drum machine sounds and is comparable to cluster 28_{0.1}. However, many of the other classifications seemed more dubious. Not only did the songs within each cluster often vary significantly, but the differences between many clusters appeared nearly indistinguishable. The chord change and timbre charts also support the difficulty in distinguishing different clusters. The y-axes for all of the charts are quite small, implying that many of the timbre values averaged out because the songs were quite different in each cluster. Essentially, the observations are quite noisy and do not have features that stand out as saliently as those of cluster 28_{0.1}, for example. The only exceptions to these numbers were clusters 30 and 34, but there were so few songs in each of these clusters that they represent only a small amount of the dataset. Therefore, I concluded that the Dirichlet Process with α = 0.2 performed an insufficient job of adequately clustering the songs. Overall, the clusters formed when α = 0.1 were the most meaningful in terms of picking up nuanced moods and instruments without splitting hairs and resulting in clusters a minimally trained ear could not differentiate. From this analysis, the most appropriate genre classifications of the electronic music from the MSD are the clusters described in the table where α = 0.1, and the most novel artists, along with their contributions, are summarized in the findings where α = 0.05 and α = 0.1.


Chapter 4

Conclusion

In this chapter, I first address weaknesses in my experiment and strategies to fix them; then I offer potential paths for researchers to build upon my experiment; and finally I offer closing words regarding this thesis.

4.1 Design Flaws in Experiment

While I made every effort possible to ensure the integrity of this experiment, there were various limiting factors, some beyond my control and others within my control but unrealistic to address given the time and resources I had. The largest issue was the dataset I was working with. While the MSD contained roughly 23,000 electronic music songs according to my classifications, these songs did not come close to all of the electronic music that was available. From looking through the tracks, I did see many important artists, meaning that there was some credibility to the dataset. However, there were several other artists I was surprised to see missing, and the artists included contained only a limited number of popular songs. Some traditionally defined genres, like dubstep, were missing entirely from the dataset, and the most recent songs came from the year 2010, which meant that the past 5 years of rapid expansion in EM were not accounted for. Building a sufficient corpus of EM data is very difficult, arguably more so than for other genres, because songs may be remixed by multiple artists, further blurring the line between original content and modifications. For this reason, I considered my thesis to be a proof of concept. Although the data I used may not be ideal, I was able to show that the Dirichlet Process could be used with some amount of success to cluster songs based on their metadata.

With respect to how I implemented the Dirichlet Process and constructed the features, my methodology could have been more extensive with additional time and resources. Interpreting the sounds in each song and establishing common threads is a difficult task, and unlike Pandora, which used trained music theory experts to analyze each song, I relied on my own ears and anecdotal knowledge of EM. Given the lack of formal literature quantitatively analyzing EM and the resources I had, this was my best realistic option, but it was also not ideal. The second notable weakness, which was more controllable, was determining what exactly constitutes an EM song. My criterion involved iterating through every song and selecting those whose artist carried a tag that fell inside a list of predetermined EM genres. However, this strategy is not always effective, since some artists contain only a small selection of EM songs and have produced much more music involving rock or other non-EM genres. To prevent these songs from appearing in the dataset, I would need to load another dataset from a group called Last.fm, which contains user-generated tags at the song level. Another, more addressable, weakness in my experiment was graphically analyzing the timbre categories. While the average chord changes were easy to interpret on the graphs for each cluster and had easy semantic interpretations, the timbre categories were never formally defined. That is, while I knew the Bayes Information Criterion was lowest when there were 46 categories, I did not associate each timbre category with a sound. Mauch's study addressed this issue by randomly selecting songs with sounds that fell in each timbre category and asking users to listen to the sounds and classify what they heard. Implementing this system would be an additional way of ensuring that the clusters formed for each song were nontrivial: I could not only eyeball the measurements on each graph for timbre, like I did in this thesis, but also use them to confirm the sounds I observed for each cluster. Finally, while my feature selection contained careful preprocessing, based on other studies, that normalized measurements between all songs, there are additional ways I could have improved the feature set. For example, one study looks at more advanced ways to isolate specific timbre segments in a song, identify repeating patterns, and compare songs to each other in terms of the similarity of their timbres [15]. More advanced methods like these would allow me to more quantitatively analyze how successful the Dirichlet Process is at effectively clustering songs into distinct categories.

4.2 Future Work

Future work in this area, quantitatively analyzing EM metadata to determine what constitutes different genres and novel artists, would involve tighter definitions and procedures, evaluations of whether clustering was effective, and closer musical scrutiny. All of the weaknesses mentioned in the previous section, barring perhaps the songs available in the Million Song Dataset, can be addressed with extensions and modifications to the code base I created. The greater issue of building an effective corpus of music data for the MSD, and constantly updating it, might be addressed by soliciting such data from an organization like Spotify, but such an endeavor is very ambitious and beyond the scope of any individual or small-group research project without extensive funding and influence. Once these problems with the dataset, the songs accessed from it, and the methods for comparing songs to each other are resolved, the next steps would be to further analyze the results. How do the most unique artists for their time compare to the most popular artists? Is there considerable overlap? How long does it take for a style to grow in popularity, if it even does? And lastly, how can these findings be used to compose new genres of music and envision who and what will become popular in the future? All of these questions may require supplementary information sources, with respect to the popularity of songs and artists for example, and many of these additional pieces of information can be found on the website of the MSD.

4.3 Closing Remarks

While this thesis is an ambitious endeavor and can be improved in many respects, it does show that the methods implemented yield nontrivial results and could serve as a foundation for future quantitative analysis of electronic music. As data analytics grows even more, and groups such as Spotify amass greater amounts of information and deeper insights on that information, this relatively new field of study will hopefully grow. EM is a dynamic, energizing, and incredibly expressive type of music, and understanding it from a quantitative perspective pays respect to what has, up until now, been mostly analyzed from a curious outsider's perspective: qualitatively described, but not examined as thoroughly from a mathematical angle.


Appendix A

Code

A.1 Pulling Data from the Million Song Dataset

from __future__ import division
import os
import sys
import time
import glob
import re
import numpy as np
import hdf5_getters  # not on adroit

# prevents output from showing ellipses when printed
np.set_printoptions(threshold=np.nan)

'''This code computes the frequency of chord changes in each electronic song
and runs the dirichlet process on it'''

basedir = '/scratch/network/mssilver/mssilver/msd_data_full/data/' + str(sys.argv[1])
ext = '.h5'

target_genres = ['house', 'techno', 'drum and bass', 'drum n bass', "drum'n'bass",
                 'drumnbass', "drum 'n' bass", 'jungle', 'breakbeat', 'trance',
                 'dubstep', 'trap', 'downtempo', 'industrial', 'synthpop', 'idm',
                 'idm - intelligent dance music', '8-bit', 'ambient',
                 'dance and electronica', 'electronic']

# relevant metadata for all EM songs found in the MSD
all_song_data = {}
pitch_segs_data = []
count = 0
start_time = time.time()

for root, dirs, files in os.walk(basedir):
    files = glob.glob(os.path.join(root, '*' + ext))
    for f in files:
        h5 = hdf5_getters.open_h5_file_read(f)
        # if year unknown, throw out sample
        if hdf5_getters.get_year(h5) == 0:
            h5.close()
            continue
        if any(tag in str(hdf5_getters.get_artist_mbtags(h5)) for tag in target_genres):
            print 'found electronic music song at {0} seconds'.format(time.time() - start_time)
            count += 1
            print('song count: {0}'.format(count + 1))
            h5_subdict = dict()
            h5_subdict['title'] = hdf5_getters.get_title(h5).item()
            h5_subdict['artist_name'] = hdf5_getters.get_artist_name(h5).item()
            h5_subdict['year'] = hdf5_getters.get_year(h5).item()
            h5_subdict['duration'] = hdf5_getters.get_duration(h5).item()
            h5_subdict['timbre'] = hdf5_getters.get_segments_timbre(h5).tolist()
            h5_subdict['pitches'] = hdf5_getters.get_segments_pitches(h5).tolist()
            track_id = hdf5_getters.get_track_id(h5).item()
            all_song_data[track_id] = h5_subdict
            print('Song {0} finished processing. Total time elapsed: {1} seconds'.format(count, str(time.time() - start_time)))
        h5.close()

all_song_data_sorted = dict(sorted(all_song_data.items(), key=lambda k: k[1]['year']))
sortedpitchdata = '/scratch/network/mssilver/mssilver/msd_data/raw_' + re.sub('/', '', sys.argv[1]) + '.txt'
with open(sortedpitchdata, 'w') as text_file:
    text_file.write(str(all_song_data_sorted))

A.2 Calculating Most Likely Chords and Timbre Categories

from __future__ import division
import os
import sys
import re
import time
import json
import glob
import hdf5_getters  # not on adroit
import sklearn.mixture
import msd_utils  # not on adroit
import math
import numpy as np
import collections
import ast

# prevents output from showing ellipses when printed
np.set_printoptions(threshold=np.nan)

# column-wise mean of list of lists
def mean(a):
    return sum(a) / len(a)

'''This code computes the frequency of chord changes in each electronic song
and runs the dirichlet process on it'''

basedir = '/scratch/network/mssilver/mssilver/'
input_file = basedir + 'msd_data/raw_' + str(sys.argv[1]) + '.txt'
output_file = basedir + 'msd_data/preprocessed_' + str(sys.argv[1]) + '.txt'

json_contents = open(input_file, 'r').read()

all_song_data = []
time_start = time.time()
count = 0
for json_object_str in re.finditer(r"{'title'.*?}", json_contents):
    json_object_str = str(json_object_str.group(0))
    json_object = ast.literal_eval(json_object_str)
    json_object_new = {}

    json_object_new['title'] = json_object['title']
    json_object_new['artist_name'] = json_object['artist_name']
    json_object_new['year'] = json_object['year']
    json_object_new['duration'] = json_object['duration']

    segments_pitches_old = json_object['pitches']
    segments_timbre_old = json_object['timbre']
    segments_pitches_old_smoothed = []
    segments_timbre_old_smoothed = []
    chord_changes = [0 for i in range(0, 192)]
    smoothing_factor = 5
    for i in range(0, int(math.floor(len(segments_pitches_old) / smoothing_factor))):
        segments = segments_pitches_old[(smoothing_factor * i):(smoothing_factor * i + smoothing_factor)]
        # calculate mean frequency of each note over a block of 5 time segments
        segments_mean = map(mean, zip(*segments))
        segments_pitches_old_smoothed.append(segments_mean)
    most_likely_chords = [msd_utils.find_most_likely_chord(seg) for seg in segments_pitches_old_smoothed]
    print 'found most likely chords at {0} seconds'.format(time.time() - time_start)
    # calculate chord changes
    for i in range(0, len(most_likely_chords) - 1):
        c1 = most_likely_chords[i]
        c2 = most_likely_chords[i + 1]
        if (c1[1] == c2[1]):
            note_shift = 0
        elif (c1[1] < c2[1]):
            note_shift = c2[1] - c1[1]
        else:
            note_shift = 12 - c1[1] + c2[1]
        key_shift = 4 * (c1[0] - 1) + c2[0]
        # convert note_shift (0 through 11) and key_shift (1 to 16)
        # to one of 192 categories for a chord shift
        chord_shift = 12 * (key_shift - 1) + note_shift
        chord_changes[chord_shift] += 1
    json_object_new['chord_changes'] = [c / json_object['duration'] for c in chord_changes]
    print 'calculated chord changes at {0} seconds'.format(time.time() - time_start)

    for i in range(0, int(math.floor(len(segments_timbre_old) / smoothing_factor))):
        segments = segments_timbre_old[(smoothing_factor * i):(smoothing_factor * i + smoothing_factor)]
        # calculate mean frequency of each note over a block of 5 time segments
        segments_mean = map(mean, zip(*segments))
        segments_timbre_old_smoothed.append(segments_mean)
    print 'found most likely timbre categories at {0} seconds'.format(time.time() - time_start)
    timbre_cats = [msd_utils.find_most_likely_timbre_category(seg) for seg in segments_timbre_old_smoothed]
    timbre_cat_counts = [timbre_cats.count(i) for i in xrange(0, 30)]
    json_object_new['timbre_cat_counts'] = [t / json_object['duration'] for t in timbre_cat_counts]
    all_song_data.append(json_object_new)
    count += 1

print 'preprocessing finished, writing results to file at time {0}'.format(time.time() - time_start)
with open(output_file, 'w') as f:
    f.write(str(all_song_data))

print 'file merging complete at time {0}'.format(time.time() - time_start)

A.3 Code to Compute Timbre Categories

from __future__ import division
import os
import sys
import re
import time
import json
import glob
import hdf5_getters  # not on adroit
import sklearn.mixture
import msd_utils  # not on adroit
import math
import numpy as np
import collections
from string import ascii_uppercase
import ast
import matplotlib.pyplot as plt
import operator
from collections import defaultdict
import random

timbre_all = []
N = 20  # number of samples to get from each year
year_counts = {1956: 2, 1965: 4, 1968: 3, 1969: 5, 1970: 23, 1971: 25,
               1972: 26, 1973: 37, 1974: 35, 1975: 29, 1976: 28, 1977: 64, 1978: 77,
               1979: 111, 1980: 131, 1981: 171, 1982: 199, 1983: 272, 1984: 190, 1985: 189,
               1986: 200, 1987: 224, 1988: 205, 1989: 272, 1990: 358, 1991: 348,
               1992: 538, 1993: 610, 1994: 658, 1995: 764, 1996: 809, 1997: 930, 1998: 872,
               1999: 983, 2000: 1031, 2001: 1230, 2002: 1323, 2003: 1563, 2004: 1508,
               2005: 1995, 2006: 1892, 2007: 2175, 2008: 1950, 2009: 1782, 2010: 742}

time_start = time.time()
year_count = defaultdict(int)
orig_dir = '/scratch/network/mssilver/mssilver/'
orig_dir = ''
json_pattern = re.compile(r"{'title'.*?}", re.DOTALL)
N = 20  # number of songs to sample from each year
k = 20  # number of frames to select from each song
for l1 in ascii_uppercase:
    for l2 in ascii_uppercase:
        edm_textfile = orig_dir + 'msd_data/raw_' + l1 + l2 + '.txt'
        json_contents = open(edm_textfile, 'r').read()
        for json_object_str in re.findall(json_pattern, json_contents):
            json_object = ast.literal_eval(json_object_str)
            year = int(json_object['year'])
            prob = 1.0 if 1.0 * N / year_counts[year] > 1.0 else 1.0 * N / year_counts[year]
            if random.random() < prob:
                print 'getting timbre frames for song in directory {0}, {1} seconds after start of program'.format(edm_textfile, time.time() - time_start)
                duration = float(json_object['duration'])
                timbre = [[t / duration for t in l] for l in json_object['timbre']]
                try:
                    indices = random.sample(xrange(0, len(timbre)), k)
                except:
                    indices = xrange(0, len(timbre))
                timbre_frames = [timbre[i] for i in indices]
                appended_timbre = [timbre_all.append(l) for l in timbre_frames]
        print 'finished file {0}, {1} seconds after start of program'.format(edm_textfile, time.time() - time_start)

with(open('timbre_frames_all.txt', 'w')) as f:
    f.write(str(timbre_all))

A.4 Helper Methods for Calculations

import os
import re
import json
import glob
import hdf5_getters
import time
import numpy as np

'''some static data used in conjunction with the helper methods'''

# each 12-element vector corresponds to the 12 pitches, starting with C
# natural and going up to B natural

CHORD_TEMPLATE_MAJOR = [[1,0,0,0,1,0,0,1,0,0,0,0], [0,1,0,0,0,1,0,0,1,0,0,0],
                        [0,0,1,0,0,0,1,0,0,1,0,0], [0,0,0,1,0,0,0,1,0,0,1,0],
                        [0,0,0,0,1,0,0,0,1,0,0,1], [1,0,0,0,0,1,0,0,0,1,0,0],
                        [0,1,0,0,0,0,1,0,0,0,1,0], [0,0,1,0,0,0,0,1,0,0,0,1],
                        [1,0,0,1,0,0,0,0,1,0,0,0], [0,1,0,0,1,0,0,0,0,1,0,0],
                        [0,0,1,0,0,1,0,0,0,0,1,0], [0,0,0,1,0,0,1,0,0,0,0,1]]
CHORD_TEMPLATE_MINOR = [[1,0,0,1,0,0,0,1,0,0,0,0], [0,1,0,0,1,0,0,0,1,0,0,0],
                        [0,0,1,0,0,1,0,0,0,1,0,0], [0,0,0,1,0,0,1,0,0,0,1,0],
                        [0,0,0,0,1,0,0,1,0,0,0,1], [1,0,0,0,0,1,0,0,1,0,0,0],
                        [0,1,0,0,0,0,1,0,0,1,0,0], [0,0,1,0,0,0,0,1,0,0,1,0],
                        [0,0,0,1,0,0,0,0,1,0,0,1], [1,0,0,0,1,0,0,0,0,1,0,0],
                        [0,1,0,0,0,1,0,0,0,0,1,0], [0,0,1,0,0,0,1,0,0,0,0,1]]
CHORD_TEMPLATE_DOM7 = [[1,0,0,0,1,0,0,1,0,0,1,0], [0,1,0,0,0,1,0,0,1,0,0,1],
                       [1,0,1,0,0,0,1,0,0,1,0,0], [0,1,0,1,0,0,0,1,0,0,1,0],
                       [0,0,1,0,1,0,0,0,1,0,0,1], [1,0,0,1,0,1,0,0,0,1,0,0],
                       [0,1,0,0,1,0,1,0,0,0,1,0], [0,0,1,0,0,1,0,1,0,0,0,1],
                       [1,0,0,1,0,0,1,0,1,0,0,0], [0,1,0,0,1,0,0,1,0,1,0,0],
                       [0,0,1,0,0,1,0,0,1,0,1,0], [0,0,0,1,0,0,1,0,0,1,0,1]]
CHORD_TEMPLATE_MIN7 = [[1,0,0,1,0,0,0,1,0,0,1,0], [0,1,0,0,1,0,0,0,1,0,0,1],
                       [1,0,1,0,0,1,0,0,0,1,0,0], [0,1,0,1,0,0,1,0,0,0,1,0],
                       [0,0,1,0,1,0,0,1,0,0,0,1], [1,0,0,1,0,1,0,0,1,0,0,0],
                       [0,1,0,0,1,0,1,0,0,1,0,0], [0,0,1,0,0,1,0,1,0,0,1,0],
                       [0,0,0,1,0,0,1,0,1,0,0,1], [1,0,0,0,1,0,0,1,0,1,0,0],
                       [0,1,0,0,0,1,0,0,1,0,1,0], [0,0,1,0,0,0,1,0,0,1,0,1]]

CHORD_TEMPLATE_MAJOR_means = [np.mean(chord) for chord in CHORD_TEMPLATE_MAJOR]
CHORD_TEMPLATE_MINOR_means = [np.mean(chord) for chord in CHORD_TEMPLATE_MINOR]
CHORD_TEMPLATE_DOM7_means = [np.mean(chord) for chord in CHORD_TEMPLATE_DOM7]
CHORD_TEMPLATE_MIN7_means = [np.mean(chord) for chord in CHORD_TEMPLATE_MIN7]

CHORD_TEMPLATE_MAJOR_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_MAJOR]
CHORD_TEMPLATE_MINOR_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_MINOR]
CHORD_TEMPLATE_DOM7_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_DOM7]
CHORD_TEMPLATE_MIN7_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_MIN7]

48 TIMBRE_CLUSTERS = [[ 138679881e-01 395702571e-02 265410235e-0249 738301998e-03 -175014636e-02 -551147732e-0250 871851698e-03 -117595855e-02 107227900e-0251 875951680e-03 540391877e-03 617638908e-03]52 [ 314344510e+00 117405599e-01 408053561e+0053 -177934450e+00 293367968e+00 -135597928e+0054 -155129489e+00 775743158e-01 642796685e-0155 140794256e-01 337716831e-01 -327103815e-01]56 [ 356548165e-01 273288705e+00 194355982e+0057 106892477e+00 989739475e-01 -897330631e-0258 873234495e-01 -200747009e-03 344488367e-0159 993117800e-02 -243471766e-01 -190521726e-01]60 [ 422442037e-01 414115783e-01 143926557e-0161 -116143322e-01 -595186216e-02 -236927188e-0162 -683151409e-02 986816882e-02 243219098e-0263 693558977e-02 680121418e-03 397485360e-02]64 [ 194727799e-01 -139027782e+00 -239875671e-01


65 -284583677e-01 192334219e-01 -283421048e-0166 215787541e-01 114840341e-01 -215631833e-0167 -409496877e-02 -690838017e-03 -724394810e-03]68 [ 196565167e-01 498702717e-02 -343697282e-0169 254170701e-01 112441266e-02 154740401e-0170 -470447408e-02 810868802e-02 303736697e-0371 143974944e-03 -275044913e-02 148634678e-02]72 [ 221364497e-01 -296205105e-01 157754028e-0173 -557641279e-02 -925625566e-02 -615316168e-0274 -138139882e-01 -554936599e-02 166886836e-0175 646238260e-02 124093863e-02 -209274345e-02]76 [ 212823455e-01 -932652720e-02 -439611467e-0177 -202814479e-01 498638770e-02 -126572488e-0178 -111181799e-01 325075635e-02 201416694e-0279 -569216463e-02 261922912e-02 830817468e-02]80 [ 162304042e-01 -734813956e-03 -202552550e-0181 180106705e-01 -572110826e-02 -917148244e-0282 -620429191e-03 -608892354e-02 102883628e-0283 384878478e-02 -872920419e-03 237291230e-02]84 [ 169023095e-01 681311168e-02 -371039856e-0285 -213139780e-02 -418752028e-03 136407740e-0186 258515825e-02 -410328777e-04 293149920e-0287 -197874734e-02 201177066e-02 429260690e-03]88 [ 416829358e-01 -128384095e+00 886081556e-0189 913717416e-02 -319420208e-01 -182003637e-0190 -319865507e-02 -171517045e-02 347472066e-0291 -353047665e-02 558354602e-02 -506222122e-02]92 [ 383948137e-01 106020034e-01 401191058e-0193 149470482e-01 -958422411e-02 -494473336e-0294 227589858e-02 -567352733e-02 384666644e-0295 -215828055e-02 -167817151e-02 115426241e-01]96 [ 907946444e-01 326120397e+00 298472002e+0097 -142615404e-01 129886103e+00 -453380431e-0198 154008478e-01 -355297093e-02 -295809181e-0199 157037690e-01 -729692046e-02 115180285e-01]

100 [ 160870896e+00 -232038235e+00 -796211044e-01101 155058968e+00 -219377663e+00 501030526e-01102 -171767279e+00 -136642470e+00 -242837527e-01103 -414275615e-01 -733148530e-01 -456676578e-01]104 [ 642870687e-01 134486839e+00 216026845e-01105 -213180345e-01 310866747e-01 -397754955e-01106 -354439151e-01 -595938041e-04 495054274e-03107 467013422e-02 -180823854e-02 125808320e-01]108 [ 116780496e+00 228141229e+00 -329418720e+00109 -154239912e+00 212372153e-01 251116768e+00110 184273560e+00 -406183916e-01 119175125e+00111 -924407446e-01 685444429e-01 -638729005e-01]112 [ 239097414e-01 -113382447e-02 306327342e-01113 468182987e-03 -103107607e-01 -317661969e-02114 346533705e-02 146440386e-02 688291154e-02115 172580481e-02 -623970238e-03 -652822380e-03]116 [ 174850329e-01 -186077411e-01 269285838e-01117 522452803e-02 -371708289e-02 -642874319e-02118 -501920042e-03 -114565540e-02 -261300268e-03119 -694872458e-03 120157063e-02 201341977e-02]120 [ 193220674e-01 162738332e-01 172794061e-02121 789933755e-02 158494767e-01 904541006e-04122 -333177052e-02 -142411500e-01 -190471155e-02123 -241622739e-02 -257382438e-02 284895062e-02]124 [ 331179197e+00 -156765268e-01 442446188e+00


125 205496297e+00 507031622e+00 -352663849e-02126 -568337901e+00 -117825301e+00 541756637e-01127 -315541339e-02 -158404846e+00 737887234e-01]128 [ 236033237e-01 -501380019e-01 -701568834e-02129 -214474169e-01 558739133e-01 -345340886e-01130 236469930e-01 -251770230e-02 -441670143e-01131 -173364633e-01 992353986e-03 101775476e-01]132 [ 313672832e+00 155128891e+00 460139512e+00133 982477544e-01 -387108002e-01 -134239667e+00134 -300065797e+00 -441556909e-01 -777546208e-01135 -659017029e-01 -142596356e-01 -978935498e-01]136 [ 850714148e-01 228658856e-01 -365260753e+00137 270626948e+00 -190441544e-01 566625676e+00138 177531510e+00 239978921e+00 110965660e+00139 158484130e+00 -151579214e-02 864324026e-01]140 [ 114302559e+00 118602811e+00 -388130412e+00141 869833825e-01 -823003310e-01 -423867795e-01142 856022598e-01 -108015106e+00 174840192e-01143 -135493558e-02 -117012561e+00 168572940e-01]144 [ 354117814e+00 612714769e-01 767585243e+00145 250391333e+00 181374399e+00 -146363231e+00146 -174027236e+00 -572924078e-01 -120787368e+00147 -413954661e-01 -462561948e-01 678297871e-01]148 [ 831843044e-01 441635485e-01 700724425e-02149 -472159900e-02 308326493e-01 -447009822e-01150 327806057e-01 652370380e-01 328490360e-01151 128628172e-01 -778065861e-02 691343399e-02]152 [ 490082031e-01 -953180204e-01 176970476e-01153 157256960e-01 -526196238e-02 -319264458e-01154 391808304e-01 219368239e-01 -206483291e-01155 -625044005e-02 -105547224e-01 318934196e-01]156 [ 149899454e+00 -430708817e-01 243770498e+00157 703149621e-01 -228827845e+00 270195855e+00158 -471484280e+00 -118700075e+00 -177431396e+00159 -223190236e+00 820855264e-01 -235859902e-01]160 [ 120322544e-01 -366300816e-01 -125699953e-01161 -121914056e-01 693277338e-02 -131034684e-01162 -154955924e-03 248094288e-02 -309576314e-02163 -166369415e-03 148904987e-04 -142151992e-02]164 [ 652394765e-01 -681024464e-01 636868117e-01165 304950208e-01 262178992e-01 -320457080e-01166 -198576098e-01 -302173163e-01 204399765e-01167 444513847e-02 -950111498e-02 -114198739e-02]168 [ 206762180e-01 -208101829e-01 261977630e-01169 -171672300e-01 561794250e-02 213660185e-01170 390259585e-02 478176392e-02 172812607e-02171 344052067e-02 626899067e-03 248544728e-02]172 [ 739717363e-01 437786285e+00 254995502e+00173 113151212e+00 -358509503e-01 220806129e-01174 -220500355e-01 -722409824e-02 -270534083e-01175 107942098e-03 270174668e-01 187279353e-01]176 [ 125593809e+00 671054880e-02 870352571e-01177 -432607959e+00 230652217e+00 547476105e+00178 -611052479e-01 107955720e+00 -216225471e+00179 -795770149e-01 -731804973e-01 968935954e-01]180 [ 117233757e-01 -123897829e-01 -488625265e-01181 142036530e-01 -723286756e-02 -699808763e-02182 -117525019e-02 570221674e-02 -767796123e-03183 417505873e-02 -233375716e-02 194121001e-02]184 [ 167511025e+00 -275436700e+00 145345593e+00


185 132408871e+00 -166172505e+00 100560074e+00186 -882308160e-01 -595708043e-01 -727283590e-01187 -103975499e+00 -186653334e-02 139449745e+00]188 [ 320587677e+00 -284451104e+00 854849957e+00189 -444001235e-01 104202144e+00 735333682e-01190 -248763292e+00 738931361e-01 -174185596e+00191 -107581842e+00 205759299e-01 -820483513e-01]192 [ 331279737e+00 -508655734e-01 661530870e+00193 116518280e+00 474499155e+00 -231536191e+00194 -134016130e+00 -715381712e-01 278890594e+00195 204189275e+00 -380003033e-01 116034914e+00]196 [ 179522019e+00 -813534697e-02 437167420e-01197 226517020e+00 885377295e-01 107481514e+00198 -725322296e-01 -219309506e+00 -759468916e-01199 -137191387e+00 260097913e-01 934596450e-01]200 [ 350400906e-01 817891485e-01 -863487084e-01201 -731760701e-01 970320805e-02 -360023996e-01202 -291753495e-01 -803073817e-02 665930095e-02203 160093340e-01 -129158086e-01 -518806100e-02]204 [ 225922929e-01 278461593e-01 539661393e-02205 -237662670e-02 -270343295e-02 -123485570e-01206 231027499e-03 587465112e-05 186127188e-02207 283074747e-02 -187198676e-04 124761782e-02]208 [ 453615634e-01 318976020e+00 -835029351e-01209 784124578e+00 -443906795e-01 -178945492e+00210 -114521031e+00 100044304e+00 -404084981e-01211 -486030348e-01 105412721e-01 563666445e-02]212 [ 393714086e-01 -307226477e-01 -487366619e-01213 -457481697e-01 -291133171e-04 -239881719e-01214 -215591352e-01 -121332941e-01 142245002e-01215 502984582e-02 -805878851e-03 195534173e-01]216 [ 186913010e-01 -161000977e-01 595612425e-01217 187804293e-01 222064227e-01 -109008289e-01218 783845058e-02 515228647e-02 -818113578e-02219 -237860551e-02 341013800e-03 364680417e-02]220 [ 332919314e+00 -214341251e+00 720913997e+00221 176143734e+00 164091808e+00 -266887649e+00222 -926748006e-01 -278599285e-01 -739434005e-01223 -387363085e-01 800557250e-01 115628886e+00]224 [ 476496444e-01 -119334793e-01 309037235e-01225 -345545294e-01 130114716e-01 506895559e-01226 212176840e-01 -414296750e-03 452439064e-02227 -162163990e-02 693683152e-02 -577607592e-03]228 [ 300019324e-01 543432074e-02 -772732930e-01229 147263806e+00 -279012581e-02 -247864869e-01230 -210011388e-01 278202425e-01 616957205e-02231 -166924986e-01 -180102286e-01 -378872162e-03]]232

TIMBRE_MEANS = [np.mean(t) for t in TIMBRE_CLUSTERS]
TIMBRE_STDEVS = [np.std(t) for t in TIMBRE_CLUSTERS]

'''helper methods to process raw msd data'''

def normalize_pitches(h5):
    key = int(hdf5_getters.get_key(h5))
    segments_pitches = hdf5_getters.get_segments_pitches(h5)
    segments_pitches_new = [transpose_by_key(pitch_seg, key) for pitch_seg in segments_pitches]
    return segments_pitches_new

def transpose_by_key(pitch_seg, key):
    pitch_seg_new = []
    for i in range(0, 12):
        idx = (i + key) % 12
        pitch_seg_new.append(pitch_seg[idx])
    return pitch_seg_new

'''given a time segment with distributions of the 12 pitches, find the most
likely chord played'''
def find_most_likely_chord(pitch_vector):
    rho_max = 0.0
    # index each chord
    most_likely_chord = (1, 1)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_MAJOR, CHORD_TEMPLATE_MAJOR_means, CHORD_TEMPLATE_MAJOR_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean) * (pitch_vector[i] - np.mean(pitch_vector)) / ((stdev + 0.01) * (np.std(pitch_vector) + 0.01))
        if (abs(rho) > abs(rho_max)):
            rho_max = rho
            most_likely_chord = (1, idx)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_MINOR, CHORD_TEMPLATE_MINOR_means, CHORD_TEMPLATE_MINOR_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean) * (pitch_vector[i] - np.mean(pitch_vector)) / ((stdev + 0.01) * (np.std(pitch_vector) + 0.01))
        if (abs(rho) > abs(rho_max)):
            rho_max = rho
            most_likely_chord = (2, idx)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_DOM7, CHORD_TEMPLATE_DOM7_means, CHORD_TEMPLATE_DOM7_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean) * (pitch_vector[i] - np.mean(pitch_vector)) / ((stdev + 0.01) * (np.std(pitch_vector) + 0.01))
        if (abs(rho) > abs(rho_max)):
            rho_max = rho
            most_likely_chord = (3, idx)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_MIN7, CHORD_TEMPLATE_MIN7_means, CHORD_TEMPLATE_MIN7_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean) * (pitch_vector[i] - np.mean(pitch_vector)) / ((stdev + 0.01) * (np.std(pitch_vector) + 0.01))
        if (abs(rho) > abs(rho_max)):
            rho_max = rho
            most_likely_chord = (4, idx)
    return most_likely_chord

def find_most_likely_timbre_category(timbre_vector):
    most_likely_timbre_cat = 0
    rho_max = 0.0
    for idx, (seg, mean, stdev) in enumerate(zip(TIMBRE_CLUSTERS, TIMBRE_MEANS, TIMBRE_STDEVS)):
        rho = 0.0
        for i in range(0, 12):
            rho += (seg[i] - mean) * (timbre_vector[i] - np.mean(seg)) / ((stdev + 0.01) * (np.std(timbre_vector) + 0.01))
        if (abs(rho) > abs(rho_max)):
            rho_max = rho
            most_likely_timbre_cat = idx
    return most_likely_timbre_cat


Bibliography

[1] Marck Bailey. Mark J. Butler publishes scholarly work on dance music. http://www.music.northwestern.edu/about/news/2012/mark-j-butler-publishes-scholarly-work-on-dance-music.html, March 2012.

[2] Kenneth Taylor. Ishkur's guide to EDM. http://techno.org/electronic-music-guide.

[3] Deal further strengthens Spotify's music discovery expertise. http://the.echonest.com/pressreleases/spotify-acquires-echo-nest, March 2014.

[4] Josh Constine. Inside the Spotify - Echo Nest skunkworks. http://techcrunch.com/2014/10/19/the-sonic-mad-scientists, October 2014.

[5] The future of music genres is here. http://blog.echonest.com/post/73516217273/the-future-of-music-genres-is-here, January 2014.

[6] About the Music Genome Project. http://www.pandora.com/about/mgp.

[7] Joan Serrà, Álvaro Corral, Marián Boguñá, Martín Haro, and Josep Ll. Arcos. Measuring the evolution of contemporary western popular music. Scientific Reports, 2, July 2012.

[8] Matthias Mauch, Robert M. MacCallum, Mark Levy, and Armand M. Leroi. The evolution of popular music: USA 1960–2010. Royal Society Open Science, 2(5), 2015.

[9] Thierry Bertin-Mahieux, Daniel P.W. Ellis, Brian Whitman, and Paul Lamere. The Million Song Dataset. In Proceedings of the 12th International Conference on Music Information Retrieval (ISMIR 2011), 2011.

[10] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay. Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12:2825–2830, 2011.

[11] Lars Buitinck, Gilles Louppe, Mathieu Blondel, Fabian Pedregosa, Andreas Mueller, Olivier Grisel, Vlad Niculae, Peter Prettenhofer, Alexandre Gramfort, Jaques Grobler, Robert Layton, Jake VanderPlas, Arnaud Joly, Brian Holt, and Gaël Varoquaux. API design for machine learning software: experiences from the scikit-learn project. In ECML PKDD Workshop: Languages for Data Mining and Machine Learning, pages 108–122, 2013.

[12] Edwin Chen. Infinite mixture models with nonparametric Bayes and the Dirichlet Process. http://blog.echen.me/2012/03/20/infinite-mixture-models-with-nonparametric-bayes-and-the-dirichlet-process, March 2012.

[13] Graham Massey. Roland TR-808: The drum machine that changed music. http://www.bbc.com/news/entertainment-arts-26682781, March 2014.

[14] Jean-Michel Jarre and Agencja Artystyczna MTJ. Jean Michel Jarre. Disques Dreyfus, 1999.

[15] Francois Pachet, Jean-Julien Aucouturier, and Mark Sandler. The way it sounds: Timbre models for analysis and retrieval of music signals. IEEE Transactions on Multimedia, 7(6):1028–1035, December 2005.

69


Figure 2.1: scikit-learn example of GMM vs. DPGMM and tuning of α

We need to ensure that each cluster is distinct in a way that is statistically and intuitively logical. In the Million Song Dataset [9], each song is represented as a JSON object containing several fields. These fields are candidate features to be used in the Dirichlet algorithm. Below is an example song, "Never Gonna Give You Up" by Rick Astley, and the corresponding features:

artist_mbid: db92a151-1ac2-438b-bc43-b82e149ddd50 (the musicbrainz.org ID for this artist)
artist_mbtags: shape = (4,) (this artist received 4 tags on musicbrainz.org)
artist_mbtags_count: shape = (4,) (raw tag count of the 4 tags this artist received on musicbrainz.org)
artist_name: Rick Astley (artist name)
artist_playmeid: 1338 (the ID of that artist on the service playme.com)
artist_terms: shape = (12,) (this artist has 12 terms (tags) from The Echo Nest)
artist_terms_freq: shape = (12,) (frequency of the 12 terms from The Echo Nest (number between 0 and 1))
artist_terms_weight: shape = (12,) (weight of the 12 terms from The Echo Nest (number between 0 and 1))
audio_md5: bf53f8113508a466cd2d3fda18b06368 (hash code of the audio used for the analysis by The Echo Nest)
bars_confidence: shape = (99,) (confidence value (between 0 and 1) associated with each bar by The Echo Nest)
bars_start: shape = (99,) (start time of each bar according to The Echo Nest; this song has 99 bars)
beats_confidence: shape = (397,) (confidence value (between 0 and 1) associated with each beat by The Echo Nest)
beats_start: shape = (397,) (start time of each beat according to The Echo Nest; this song has 397 beats)
danceability: 0.0 (danceability measure of this song according to The Echo Nest (between 0 and 1; 0 => not analyzed))
duration: 211.69587 (duration of the track in seconds)
end_of_fade_in: 0.139 (time of the end of the fade-in at the beginning of the song, according to The Echo Nest)
energy: 0.0 (energy measure (not in the signal processing sense) according to The Echo Nest (between 0 and 1; 0 => not analyzed))
key: 1 (estimation of the key the song is in by The Echo Nest)
key_confidence: 0.324 (confidence of the key estimation)
loudness: -7.75 (general loudness of the track)
mode: 1 (estimation of the mode the song is in by The Echo Nest)
mode_confidence: 0.434 (confidence of the mode estimation)
release: Big Tunes - Back 2 The 80s (album name from which the track was taken; some songs/tracks can come from many albums; we give only one)
release_7digitalid: 786795 (the ID of the release (album) on the service 7digital.com)
sections_confidence: shape = (10,) (confidence value (between 0 and 1) associated with each section by The Echo Nest)
sections_start: shape = (10,) (start time of each section according to The Echo Nest; this song has 10 sections)
segments_confidence: shape = (935,) (confidence value (between 0 and 1) associated with each segment by The Echo Nest)
segments_loudness_max: shape = (935,) (max loudness during each segment)
segments_loudness_max_time: shape = (935,) (time of the max loudness during each segment)
segments_loudness_start: shape = (935,) (loudness at the beginning of each segment)
segments_pitches: shape = (935, 12) (chroma features for each segment (normalized so max is 1))
segments_start: shape = (935,) (start time of each segment (a musical event or onset) according to The Echo Nest; this song has 935 segments)
segments_timbre: shape = (935, 12) (MFCC-like features for each segment)
similar_artists: shape = (100,) (a list of 100 artists (their Echo Nest IDs) similar to Rick Astley, according to The Echo Nest)
song_hotttnesss: 0.864248830588 (according to The Echo Nest, when downloaded (in December 2010) this song had a 'hotttnesss' of 0.8 (on a scale of 0 to 1))
song_id: SOCWJDB12A58A776AF (The Echo Nest song ID; note that a song can be associated with many tracks (with very slight audio differences))
start_of_fade_out: 198.536 (start time of the fade-out, in seconds, at the end of the song, according to The Echo Nest)
tatums_confidence: shape = (794,) (confidence value (between 0 and 1) associated with each tatum by The Echo Nest)
tatums_start: shape = (794,) (start time of each tatum according to The Echo Nest; this song has 794 tatums)
tempo: 113.359 (tempo in BPM according to The Echo Nest)
time_signature: 4 (time signature of the song according to The Echo Nest, i.e., the usual number of beats per bar)
time_signature_confidence: 0.634 (confidence of the time signature estimation)
title: Never Gonna Give You Up (song title)
track_7digitalid: 8707738 (the ID of this song on the service 7digital.com)
track_id: TRAXLZU12903D05F94 (The Echo Nest ID of this particular track, on which the analysis was done)
year: 1987 (year when this song was released, according to musicbrainz.org)

When choosing features, my main goal was to use features that would most likely yield meaningful results yet also be simple and make sense to the average person. The definition of "meaningful" results is inherently subjective, as every music listener has his or her own opinion of what constitutes different types of music, but some common features most people tend to differentiate songs by are pitch, rhythm, and the types of instruments used. The following specific fields provided in each song object fall under these three terms:

Pitch
• segments_pitches: a matrix of values indicating the strength of each pitch (or note) at each discernible time interval

Rhythm
• beats_start: a vector of values indicating the start time of each beat
• time_signature: the time signature of the song
• tempo: the speed of the song in Beats Per Minute (BPM)

Instruments
• segments_timbre: a matrix of values indicating the distribution of MFCC-like features (different types of tones) for each segment

The segments_pitches feature is a clear candidate for a differentiating factor between songs, since it reveals the patterns of notes that occur. Additionally, other research papers that quantitatively examine songs, like Mauch's [8], look at pitch and employ a procedure that allows all songs to be compared with the same metric. Likewise, timbre is intuitively a reliable differentiating feature, since it captures the presence of different tones: sounds that sound different despite having the same pitch. Therefore, segments_timbre is another feature that is considered in each song.

Finally, we look at the candidate features for rhythm. At first glance, all of these features appear to be useful, as they indicate the rhythm of a song in one way or another. However, none of these features are as useful as the pitch and timbre features. While tempo is one factor in differentiating genres of EDM, and music in general, tempo alone is not a driving force of musical innovation. Certain genres of EDM, like drum 'n' bass and happycore, stand out for having very fast tempos, but the tempo is supplemented with a sound unique to the genre. Conceiving new arrangements of pitches, combining instruments in new ways, and inventing new types of sounds are novel; merely speeding up or slowing down existing sounds is not. Including tempo as a feature could actually add noise to the model, since many genres overlap in their tempos. And finally, tempo is measured indirectly when the pitch and timbre features are normalized for each song: everything is measured in units of "per second," so faster songs will have higher quantities of pitch and timbre features each second. Time signature can be dismissed from the candidate features for the same reason as tempo: many genres share the same time signature, and including it in the feature set would only add more noise. beats_start looks like a more promising feature since, like segments_pitches and segments_timbre, it consists of a vector of values. However, difficulties arise when we begin to think how exactly we can utilize this information. Since each song varies in length, we need a way to compare songs of different durations on the same level. One approach could be to perform basic statistics on the distance between each beat, for example calculating the mean and standard deviation of this distance (see the sketch below). However, the normalized pitch and timbre information already capture this data. Another possibility is detecting certain patterns of beats, which could differentiate the syncopated dubstep or glitch beats from the steady pulse of electro-house. But once again, every beat is accompanied by a sound with a specific timbre and pitch, so this feature would not add any significantly new information.
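For concreteness, the rejected inter-beat statistics could be computed as in the following minimal sketch, assuming beats_start is the vector of beat start times from the song object and NumPy is available:

import numpy as np

# A minimal sketch of the inter-beat statistics considered (and rejected)
# above; beats_start is the vector of beat start times for one song.
def beat_interval_stats(beats_start):
    intervals = np.diff(beats_start)  # seconds between consecutive beats
    return np.mean(intervals), np.std(intervals)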

2.3 Collecting Data and Preprocessing Selected Features

2.3.1 Collecting the Data

Upon deciding the features I wanted to use in my research, I first needed to collect all of the electronic songs in the Million Song Dataset. The easiest reliable way to achieve this was to iterate through each song in the database and save the information for the songs where any of the artist genre tags in artist_mbtags matched an electronic music genre. While this measure was not fully accurate, because it looks at the genre of the artist, not the song, specific genre information for each song was not as easily accessible, so this indicator was nearly as good a substitute. To generate a list of the genres that electronic songs would fall under, I manually searched through a subset of the MSD to find all genres that seemed to be related to electronic music. In the case of genres that were sometimes, but not always, electronic in nature, such as disco or pop, I erred on the side of caution and did not include them in the list of electronic genres. In these cases, false positives, such as primarily rock songs that happen to have the disco label attached to the artist, could inadvertently be included in the dataset. The final list of genres is as follows:

target_genres = ['house', 'techno', 'drum and bass', 'drum n bass',
    "drum'n'bass", 'drumnbass', "drum 'n' bass", 'jungle', 'breakbeat',
    'trance', 'dubstep', 'trap', 'downtempo', 'industrial', 'synthpop',
    'idm', 'idm - intelligent dance music', '8-bit', 'ambient',
    'dance and electronica', 'electronic']
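A minimal sketch of the resulting filter is shown below; it assumes artist_mbtags has been read from the song's HDF5 file (the full extraction script appears in Appendix A.1):

def is_electronic(artist_mbtags):
    # flag a song whose artist carries any tag matching an EM genre
    tags = str(artist_mbtags).lower()
    return any(genre in tags for genre in target_genres)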

2.3.2 Pitch Preprocessing

A study conducted by music researcher Matthias Mauch [8] analyzes pitch in a musically informed manner. The study first takes the raw sound data and converts it into a distribution over each of the 12 pitches, where 0 is no detection of the pitch and 1 the strongest amount. Then it computes the most likely chord by comparing the 4 most common types of chords in popular music (major, minor, dominant 7, and minor 7) to the observed chord. The most common chords are represented as "template chords" containing 0's and 1's, where the 1's represent the notes played in the chord. For example, using the note C as the first index, the C major chord is represented as

CT_CM = (1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0).

For a given chroma frame c observed in the song, Spearman's rho coefficient is computed over every template chord:

\rho_{CT,c} = \sum_{i=1}^{12} \frac{(CT_i - \overline{CT})(c_i - \bar{c})}{\sigma_{CT}\,\sigma_{c}}

where \overline{CT} is the mean of the values in the template chord, σ_CT is the standard deviation of the values in the chord, and the operations on c are analogous. Note that the summation is over each individual pitch in the 12 pitch classes. The chord template with the highest value of ρ is selected as the chord for the time frame.
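A minimal sketch of this template-matching step is given below, mirroring the fuller implementation in Appendix A.4; it assumes the template list and its precomputed means and standard deviations are in scope, and that NumPy is imported as np:

def best_template_chord(pitch_vector, templates, template_means, template_stdevs):
    # return the index of the template chord with the largest correlation
    # score against the observed 12-element chroma frame
    rho_max, best_idx = 0.0, 0
    for idx, (chord, m, s) in enumerate(zip(templates, template_means, template_stdevs)):
        rho = 0.0
        for i in range(12):
            # the +0.01 terms guard against zero standard deviations
            rho += (chord[i] - m) * (pitch_vector[i] - np.mean(pitch_vector)) / \
                ((s + 0.01) * (np.std(pitch_vector) + 0.01))
        if abs(rho) > abs(rho_max):
            rho_max, best_idx = rho, idx
    return best_idx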

After this is performed for each time frame, the values are smoothed, and then the change between adjacent chords is observed. The reasoning behind this step is that by measuring the relative distance between chords, rather than the chords themselves, all songs can be compared in the same manner even though they may have different key signatures. Finally, the study takes the types of chord changes and classifies them under 8 possible categories called "H-topics." These topics are more abstracted versions of the chord changes that make more sense to a human, such as "changes involving dominant 7th chords."

In my preliminary implementation of this method on an electronic dance music corpus, I made a few modifications to Mauch's study. First, I smoothed out time frames before computing the most probable chords, rather than smoothing the most probable chords. I did this to save time and to reduce volatility in the chord measurements. Using Rick Astley's "Never Gonna Give You Up" as a reference, which contains 935 time frames and lasts 212 seconds, 5 time frames is slightly under 1 second and, for preliminary testing, appeared to be a good interval for each time block. Second, as mentioned in the literature section, I did not abstract the chord changes into H-topics. This decision also stemmed from time constraints, since deriving semantic chord meaning from EDM songs would require careful research into the types of harmonies and sounds common in that genre of music. Below is a high-level visualization of the pitch metadata found in a sample song, "Firestarter" by The Prodigy, and how I converted the metadata into a chord change vector that I could then feed into the Dirichlet Process algorithm.

[Figure: pipeline converting raw pitch metadata into a chord-change vector, shown for "Firestarter" by The Prodigy. Panels: (1) start with the raw pitch data, an N x 12 matrix where N is the number of time frames in the song and 12 the number of pitch classes (the first 5 time frames are shown); (2) average the distribution of pitches over every block of 5 time frames; (3) calculate the most likely chord for each block using Spearman's rho (here, F major); (4) for two adjacent chords (e.g., F major to G major, a major-to-major shift), calculate the change between them and increment its count in a table of chord change frequencies (192 possible chord changes). The result is a 192-element vector where chord_changes[i] is the number of times the chord change with code i occurred in the song.]

A final step I took to normalize the chord change data was to divide the counts by the length of the song, so that each song's number of chord changes was measured per second.
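The encoding and normalization can be summarized in a short sketch, assuming each detected chord is a (type, root) pair with type in {1: major, 2: minor, 3: dominant 7, 4: minor 7} and root in 0-11, as in Appendix A.2:

def chord_change_vector(chords, duration):
    # tally the 192 possible chord changes, then normalize to changes per second
    changes = [0] * 192
    for (t1, r1), (t2, r2) in zip(chords, chords[1:]):
        note_shift = (r2 - r1) % 12    # semitone distance between chord roots
        key_shift = 4 * (t1 - 1) + t2  # one of 16 chord-type transitions
        changes[12 * (key_shift - 1) + note_shift] += 1
    return [c / float(duration) for c in changes]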

2.3.3 Timbre Preprocessing

For timbre, I also used Mauch's model [8] to find a meaningful way to compare timbre uniformly across all songs. After collecting all song metadata, I took a random sample of 20 songs from each year, starting at 1970. The reason I forced the sampling to 20 randomly sampled songs from each year, and did not take a random sample of songs from all years at once, was to prevent bias towards any type of sounds. As seen in Figure 2.2, there are significantly more songs from 2000-2011 than from before 2000. The mean year is x̄ = 2001.052, the median year is 2003, and the standard deviation of the years is σ = 7.060. A "random sample" over all songs would almost certainly include a disproportionate amount of more recent songs. In order not to miss out on sounds that may be more prevalent in older songs, I required a set number of songs from each year. Next, from each randomly selected song, I selected 20 random timbre frames in order to prevent any biases in data collection within each song. In total there were 42 x 20 x 20 = 16,800 timbre frames collected. Next, I clustered the timbre frames using a Gaussian Mixture Model (GMM), varying the number of clusters from 10 to 100 and selecting the number of clusters with the lowest Bayesian Information Criterion (BIC), a statistical measure commonly used to select the best-fitting model. The BIC was minimized at 46 timbre clusters. I then re-ran the GMM with 46 clusters and saved the mean values of each of the 12 timbre dimensions for each cluster formed. In the same way that every song had the same 192 chord changes whose frequencies could be compared between songs, each song now had the same 46 timbre clusters but different frequencies in each song.

Figure 2.2: Number of Electronic Music Songs in the Million Song Dataset from Each Year

When reading in the metadata from each song, I calculated the most likely timbre cluster each timbre frame belonged to and kept a frequency count of all of the possible timbre clusters observed in the song. Finally, as with the pitch data, I divided all observed counts by the duration of the song in order to normalize each song's timbre counts.
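The model selection step can be sketched as follows; this assumes frames is the 16,800 x 12 array of sampled timbre frames and uses scikit-learn's current GaussianMixture class (the thesis code used the library's older GMM API):

import numpy as np
from sklearn.mixture import GaussianMixture

def fit_timbre_clusters(frames, k_min=10, k_max=100):
    # fit a GMM for each candidate cluster count and keep the lowest-BIC model
    best_bic, best_gmm = np.inf, None
    for k in range(k_min, k_max + 1):
        gmm = GaussianMixture(n_components=k, covariance_type='full',
                              random_state=0).fit(frames)
        bic = gmm.bic(frames)
        if bic < best_bic:
            best_bic, best_gmm = bic, gmm
    return best_gmm  # best_gmm.means_ holds the cluster centers (46 at the optimum)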


Chapter 3

Results

3.1 Methodology

After the pitch and timbre data was processed, I ran the Dirichlet Process on the data. For each song, I concatenated the 192-element chord change frequency list and the 46-element timbre category frequency list, giving each song a total of 238 features. However, there is a problem with this setup: the pitch data will inherently dominate the clustering process, since it contains almost 3 times as many features as timbre. While there is no built-in function in scikit-learn's DPGMM process to give different weights to each feature, I considered another possibility to remedy this discrepancy: duplicating the timbre vector a certain number of times and concatenating that to the feature set of each song. While this strategy runs the risk of corrupting the feature set and turning it into something that does not accurately represent each song, it is important to keep in mind that even without duplicating the timbre vector, the feature set consists of two separate feature sets concatenated to each other. Therefore, timbre duplication appears to be a reasonable strategy to weigh pitch and timbre more evenly.
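A minimal sketch of the assembly is below; the number of timbre copies, n_dup, is an illustrative choice (four copies bring the 46 timbre features close to the 192 pitch features), not a value fixed by this thesis:

import numpy as np

def build_feature_vector(chord_changes, timbre_counts, n_dup=4):
    # concatenate the 192 chord-change frequencies with n_dup copies of the
    # 46 timbre-category frequencies to balance the two feature groups
    return np.array(list(chord_changes) + list(timbre_counts) * n_dup)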

After this modification, I tweaked a few more parameters before obtaining my final results. Dividing the pitch and timbre frequencies by the duration of the song normalized every song to frequency per second, but it also had the undesired effect of making the data too small. Timbre and pitch frequencies per second were almost always less than 1.0 and many times hovered as low as 0.002 for nonzero values. Because all of the values were very close to each other, using common values of α in the range of 0.1 to 1000-2000 was insufficient to push the songs into different clusters. As a result, every song fell into the same cluster. Increasing the value of α by several orders of magnitude, to well over 10 million, fixed the problem, but this solution presented two problems. First, tuning α to experiment with different ways to cluster the music would be problematic, since I would have to work with an enormous range of possible values for α. Second, pushing α to such high values is not appropriate for the Dirichlet Process. Extremely high values of α indicate a Dirichlet Process that will try to disperse the data into different clusters, but a value of α that high is in principle always assigning each new song to a new cluster. On the other hand, varying α between 0.1 and 1000, for example, presents a much wider range of flexibility when assigning clusters. While clustering may be possible by varying α an extreme amount with the data as it currently is, we would be using the Dirichlet Process in a way it should mathematically not be used. Therefore, multiplying all of the data by a constant value, so that we can work in the appropriate range of α, is the ideal approach. After some experimentation, I found that k = 10 was an appropriate scaling factor.

After initial runs of the Dirichlet Process, I found that there was a slight issue with some of the earlier songs. Since I had only artist genre tags, not song-specific tags, I chose songs based on whether any of the tags associated with the artist fell under any electronic music genre, including the generic term 'electronic'. There were some bands, mostly older ones from the 1960s and 1970s like Electric Light Orchestra, which had some electronic music but mostly featured rock, funk, disco, or another genre. Given that these artists featured mostly non-electronic songs, I decided to exclude them from my study and generate a blacklist of these music artists. While it was infeasible to look through every single song and determine whether it was electronic or not, I was able to look over the earliest songs in each cluster. These songs were the most important to verify as electronic, because early non-electronic songs could end up forming new clusters and inadvertently create clusters with non-electronic sounds that I was not looking for.

The goal of this thesis is to identify the different groups in which EM songs are clustered and to identify the most unique artists and genres. While the second task is very simple, because it requires looking at the earliest songs in each cluster, the first is difficult to gauge the effectiveness of. While I can look at the average chord change and timbre category frequencies in each cluster, as well as other metadata, attaching more semantic interpretations to what the music actually sounds like, and determining whether the music is clustered properly, is a very subjective process. For this reason, I ran the Dirichlet Process on the feature set with values of α = (0.05, 0.1, 0.2) and compared the clustering in each case, examining similarities and differences in the clusters formed in each scenario in the Analysis section. For each value of α, I set the upper limit of components, or clusters, allowed to 50. The values of α I used resulted in 9, 14, and 19 clusters formed.
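Putting the pieces together, the clustering runs can be sketched as follows. The thesis code used scikit-learn's since-removed DPGMM class; BayesianGaussianMixture with a Dirichlet process prior is its modern replacement, and feature_vectors is assumed to be the list of per-song vectors built above:

import numpy as np
from sklearn.mixture import BayesianGaussianMixture

X = 10.0 * np.array(feature_vectors)  # apply the scaling constant k = 10

for alpha in (0.05, 0.1, 0.2):
    dpgmm = BayesianGaussianMixture(
        n_components=50,  # upper limit on the number of clusters
        weight_concentration_prior_type='dirichlet_process',
        weight_concentration_prior=alpha,  # the concentration parameter
        max_iter=500, random_state=0).fit(X)
    labels = dpgmm.predict(X)  # cluster assignment for each song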

3.2 Findings

3.2.1 α = 0.05

When I set α to 0.05, the Dirichlet Process split the songs into 9 clusters. Below are the distributions of years of the songs in each cluster (note that the Dirichlet Process does not number the clusters exactly sequentially, so cluster numbers 5, 7, and 10 are skipped).

Figure 3.1: Song year distributions for α = 0.05

For each value of α, I also calculated the average frequency of each chord change category and timbre category for each cluster and plotted the results. The green lines correspond to timbre and the blue lines to pitch.

Figure 3.2: Timbre and pitch distributions for α = 0.05

A table of each cluster formed, the number of songs in that cluster, and descriptions of the pitch, timbre, and rhythmic qualities characteristic of songs in that cluster is shown below.

Cluster | Song Count | Characteristic Sounds
0 | 6481 | Minimalist, industrial, space sounds, dissonant chords
1 | 5482 | Soft, New Age, ethereal
2 | 2405 | Defined sounds; electronic and non-electronic instruments played in standard rock rhythms
3 | 360 | Very dense and complex synths, slightly darker tone
4 | 4550 | Heavily distorted rock and synthesizer
6 | 2854 | Faster-paced 80s synth rock, acid house
8 | 798 | Aggressive beats, dense house music
9 | 1464 | Ambient house, trancelike, strong beats, mysterious tone
11 | 1597 | Melancholy tones; new wave rock in the 80s, then starting in the 90s downtempo, trip-hop, nu-metal

Table 3.1: Song cluster descriptions for α = 0.05

3.2.2 α = 0.1

A total of 14 clusters were formed. (16 were formed, but 2 clusters contained only one song each; I listened to both of these songs, and since they did not sound unique, I discarded them from the clusters.) Again, the song distributions, timbre and pitch distributions, and cluster descriptions are shown below.

Figure 3.3: Song year distributions for α = 0.1


Figure 3.4: Timbre and pitch distributions for α = 0.1


Cluster | Song Count | Characteristic Sounds
0 | 1339 | Instrumental and disco with 80s synth
1 | 2109 | Simultaneous quarter-note and sixteenth-note rhythms
2 | 4048 | Upbeat, chill; simultaneous quarter-note and eighth-note rhythms
3 | 1353 | Strong repetitive beats, ambient
4 | 2446 | Strong simultaneous beat and synths; synths defined but echo
5 | 2672 | Calm, New Age
6 | 542 | Hi-hat cymbals, dissonant chord progressions
7 | 2725 | Aggressive punk and alternative rock
9 | 1647 | Latin; rhythmic emphasis on first and third beats
11 | 835 | Standard medium-fast rock instruments/chords
16 | 1152 | Orchestral, especially violins
18 | 40 | "Martian alien" sounds, no vocals
20 | 1590 | Alternating strong kick and strong high-pitched clap
28 | 528 | Roland TR-like beats; kick and clap stand out but fuzzy

Table 3.2: Song cluster descriptions for α = 0.1

3.2.3 α = 0.2

With α set to 0.2, there were a total of 22 clusters formed. 3 of the clusters consisted of 1 song each, none of which were particularly unique-sounding, so I discarded them, for a total of 19 significant clusters. Again, the song distributions, timbre and pitch distributions, and cluster descriptions are shown below.

Figure 3.5: Song year distributions for α = 0.2


Figure 3.6: Timbre and pitch distributions for α = 0.2


Cluster | Song Count | Characteristic Sounds
0 | 4075 | Nostalgic and sad-sounding synths and string instruments
1 | 2068 | Intense, sad, cavernous (mix of industrial metal and ambient)
2 | 1546 | Jazz/funk tones
3 | 1691 | Orchestral with heavy 80s synths, atmospheric
4 | 343 | Arpeggios
5 | 304 | Electro, ambient
6 | 2405 | Alien synths, eerie
7 | 1264 | Punchy kicks and claps, 80s/90s tilt
8 | 1561 | Medium tempo, 4/4 time signature, synths with intense guitar
9 | 1796 | Disco rhythms and instruments
10 | 2158 | Standard rock with few (if any) synths added on
12 | 791 | Cavernous, minimalist ambient (non-electronic instruments)
14 | 765 | Downtempo, classic guitar riffs, fewer synths
16 | 865 | Classic acid house sounds and beats
17 | 682 | Heavy Roland TR sounds
22 | 14 | Fast ambient, classic orchestral
23 | 578 | Acid house with funk tones
30 | 31 | Very repetitive rhythms, one or two tones
34 | 88 | Very dense sound (strong vocals and synths)

Table 3.3: Song cluster descriptions for α = 0.2

3.3 Analysis

Each of the different values of α revealed different insights about the Dirichlet Process applied to the Million Song Dataset, mainly which artists and songs were unique and how traditional groupings of EM genres could be thought of differently. Not surprisingly, the distributions of the years of songs in most of the clusters were skewed to the left, because the distribution of all of the EM songs in the Million Song Dataset is left-skewed (see Figure 2.2). However, some of the distributions vary significantly for individual clusters, and these differences provide important insights into which types of music styles were popular at certain points in time and how unique the earliest artists and songs in the clusters were. For example, for α = 0.1, cluster 28's musical style (with sounds characteristic of the Roland TR-808 and TR-909, two programmable drum machines and synthesizers that became extremely popular in 1980s and 90s dance tracks [13]) coincides with when the instruments were first manufactured in 1980. Not surprisingly, this cluster contained mostly songs from the 80s and 90s and declined slightly in the 2000s. However, there were a few songs in that cluster that came out before 1980. While these songs did not clearly use the Roland TR machines, they may have contained similar sounds that predated the machines and were truly novel.

First, looking at α = 0.05, we see that all of the clusters contain a significant number of songs, although clusters 3 and 8 are notably smaller. Cluster 3 contains a heavier left tail, indicating a larger number of songs from the 70s, 80s, and 90s. Inside the cluster, the genres of music varied significantly from a traditional music lens. That is, the cluster contained some songs with nearly all traditional rock instruments, others with purely synths, and others somewhere in between, all of which would normally be classified as different EM genres. Under the Dirichlet Process, however, these songs were lumped together by the common theme of dense, melodic textures (as opposed to minimalistic, repetitive, or dissonant sounds). The most prominent artists from the earlier songs are Ashra and John Hassell, who composed several melodic songs combining traditional instruments with synthesizers for a modern feel. The other small cluster, number 8, contains a more normal year distribution relative to the entire MSD distribution and also consists of denser beats. Another artist, Cabaret Voltaire, leads this cluster. Cluster 9 sticks out significantly because it contains virtually no songs before 1990 but increases rapidly in popularity thereafter. This cluster contains songs with hypnotically repetitive rhythms, strong and ethereal synths, and an equally strong drum-like beat. Given the emergence of trance in the 1990s, and the fact that house music in the 1980s contained more minimalistic synths than house music in the 1990s, this distribution of years makes sense. Looking at the earliest artists in this cluster, one that accurately predates the later music in the cluster is Jean-Michel Jarre. A French composer pioneering in ambient and electronic music [14], one of his songs, "Les Chants Magnétiques IV," contains very sharp and modulated synths along with a repetitive hi-hat cymbal rhythm and more drawn-out, ethereal synths. While the song sounds ambient at its normal speed, playing it at 1.5 times the normal speed produced a thumping, fast-paced 16th-note rhythm that, combined with the ethereal synths and their chord progressions, sounded very similar to trance music. In fact, I found that stylistically, trance music was comparable to house and ambient music increased in speed. Trance music was a term not used extensively until the early 1990s, but ambient and house music were already mainstream by the 1980s, so it would make sense that trance evolved in this manner. However, this insight could serve as an argument that trance is not an innovative genre in and of itself, but is rather a clever combination of two older genres. Lastly, we look at the timbre category and chord change distributions for each cluster. In theory, these clusters should have significantly different peaks of chord changes and timbre categories, reflecting different pitch arrangements and instruments in each cluster. The type 0 chord change corresponds to major → major with no note change; type 60, minor → minor with no note change; type 120, dominant 7th major → dominant 7th major with no note change; and type 180, dominant 7th minor → dominant 7th minor with no note change. It makes sense that type 0, 60, 120, and 180 chord changes are frequently observed, because it implies that chords occurring next to each other in the song are remaining in the same key for the majority of the song.
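These special codes can be checked against the encoding from Appendix A.2 with a small decoding sketch (types: 1 = major, 2 = minor, 3 = dominant 7, 4 = minor 7):

def decode_chord_change(code):
    # invert chord_shift = 12*(key_shift - 1) + note_shift,
    # where key_shift = 4*(from_type - 1) + to_type
    note_shift = code % 12
    key_shift = code // 12 + 1
    return (key_shift - 1) // 4 + 1, (key_shift - 1) % 4 + 1, note_shift

assert decode_chord_change(0) == (1, 1, 0)    # major -> major, same root
assert decode_chord_change(60) == (2, 2, 0)   # minor -> minor, same root
assert decode_chord_change(120) == (3, 3, 0)  # dom7 -> dom7, same root
assert decode_chord_change(180) == (4, 4, 0)  # min7 -> min7, same root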

The timbre categories, on the other hand, are more difficult to intuitively interpret. Mauch's study addresses this issue by sampling the songs and sounds closest to each timbre category, then playing the sounds and attaching user-based interpretations from several listeners [8]. While this strategy worked in Mauch's study, given the time and resources at my disposal it was not practical in mine. I ended up writing subjective summaries of each cluster and comparing them with the charts to see whether certain peaks in the timbre categories corresponded to specific tones. Strangely, for α = 0.05 the timbre and chord change data are very similar for each cluster. This problem does not occur when α = 0.1 or 0.2, where the graphs vary significantly and correspond to some of the observed differences in the music. In summary, below are the most influential artists I found in the clusters formed, and the types of music they created that were novel for their time:

• Jean-Michel Jarre: ambient and house music, complicated synthesizer arrangements
• Cabaret Voltaire: orchestral electronic music
• Paul Horn: new age
• Brian Eno: ambient music
• Manuel Göttsching (Ashra): synth-heavy ambient music
• Killing Joke: industrial metal
• John Foxx: minimalist and dark electronic music
• Fad Gadget: house and industrial music

While these conclusions were formed mainly from the data used in the MSD and the resulting clusters, I checked outside sources and biographies of these artists to see whether they were groundbreaking contributors to electronic music. Some research revealed that these artists were indeed groundbreaking for their time, so my findings are consistent with existing literature. The difference, however, between existing accounts and mine is that, from a quantitatively computed perspective, I found some new connections (like observing that one of Jarre's works, when sped up, sounded very similar to trance music).

For larger values of α, it is worth not only looking at interesting phenomena in the clusters formed for that specific value, but also comparing the clusters formed with other values of α. Since we are increasing the value of α, more clusters will be formed, and the distinctions between each cluster will be more nuanced. With α = 0.1, the Dirichlet Process formed 16 clusters; 2 of these clusters consisted of only one song each, and upon listening, neither of these songs sounded particularly unique, so I threw those two clusters out and analyzed the remaining 14. Comparing these clusters to the ones formed with α = 0.05, I found that some of the clusters mapped over nicely, while others were more difficult to interpret. For example, cluster 3_0.1 (cluster 3 when α = 0.1) contained a similar number of songs, and a similar distribution of release years, to cluster 9_0.05. Both contain virtually no songs before the 1990s and then steadily rise in popularity through the 2000s. Both clusters also contain similar types of music: house beats and ethereal synths reminiscent of ambient or trance music. However, when I looked at the earliest artists in cluster 3_0.1, they were different from the earliest artists in cluster 9_0.05. One particular artist, Bill Nelson, stood out for having a particularly novel song, "Birds of Tin," for the year it was released (1980). This song features a sharp and twangy synth beat that, when sped up, sounded like minimalist acid house music. While the α = 0.05 group differentiated mostly on general moods and classes of instruments (like rock vs. non-electronic vs. electronic), the α = 0.1 group picked up more nuanced instrumentation and mood differences. For example, cluster 16_0.1 contained songs that featured orchestral string instruments, especially violin. The songs themselves varied significantly according to traditional genres, from Brian Eno arrangements with classical orchestra to a remix of a song by Linkin Park, a nu-metal band, which contained violin interludes. This clustering raises an interesting point: music that sounds very different based on traditional genres can be grouped together on certain instruments or sounds. Another cluster, 28_0.1, features 90s sounds characteristic of the Roland TR-909 drum machine (which would explain why the cluster's songs increase drastically starting in the 1990s and steadily decline through the 2000s). Yet another cluster, 6_0.1, contains a particularly heavy left tail, indicating a style more popular in the 1980s, and its characteristic sound, hi-hat cymbals, is also a specialized instrument. This specialization does not match up particularly strongly with the clusters formed when α = 0.05. That is, a single cluster with α = 0.05 does not easily map to one or more clusters in the α = 0.1 run, although many of the clusters appear to share characteristics based on the qualitative descriptions in the tables. The timbre/chord change charts for each cluster appeared to at least somewhat corroborate the general characteristics I attached to each cluster. For example, the last timbre category is significantly pronounced for clusters 5 and 18, and especially so for 18. Cluster 18 was vocal-free, ethereal space-synth sounds, so it would make sense that cluster 5, which was mainly calm New Age, also contained vocal-free, ethereal, and space-y sounds. It was also interesting to note that certain clusters, like 28, contained one timbre category that completely dominated all the others. Many songs in this cluster were marked by strong and repetitive beats reminiscent of the Roland TR drum machines, which matches the graph. Likewise, clusters 3, 7, 9, and 20, which appear to contain the same peak timbre category, were noted for containing strong and repetitive beats. For this run, I added the following artists and their contributions to the general list of novel artists:

• Bill Nelson: minimalist house music
• Vangelis: orchestral compositions with electronic notes
• Rick Wakeman: rock compositions with spacy-sounding synths
• Kraftwerk: synth-based pop music

Finally, we look at α = 0.2. With this parameter value, the Dirichlet Process resulted in 22 clusters formed; 3 of these clusters contained only one song each, and upon listening to each of these songs, I determined they were not particularly unique and discarded them, for a total of 19 remaining clusters. Unlike the previous two values of α, where the clusters were relatively easy to subjectively differentiate, this one was quite difficult. Slightly more than half of the clusters, 10 out of 19, contained under 1000 songs, and 3 contained under 100. Some of the clusters were easily mapped to clusters in the other two α values, like cluster 17_0.2, which contains Roland TR drum machine sounds and is comparable to cluster 28_0.1. However, many of the other classifications seemed more dubious. Not only did the songs within each cluster often vary significantly, but the differences between many clusters appeared nearly indistinguishable. The chord change and timbre charts also reflect the difficulty in distinguishing different clusters. The y-axes of all of the charts are quite small, implying that many of the timbre values averaged out because the songs were quite different in each cluster. Essentially, the observations are quite noisy and do not have features that stand out as saliently as those of cluster 28_0.1, for example. The only exceptions to these numbers were clusters 30 and 34, but there were so few songs in each of these clusters that they represent only a small amount of the dataset. Therefore, I concluded that the Dirichlet Process with α = 0.2 did an insufficient job of adequately clustering the songs. Overall, the clusters formed when α = 0.1 were the most meaningful in terms of picking up nuanced moods and instruments without splitting hairs and producing clusters a minimally trained ear could not differentiate. From this analysis, the most appropriate genre classifications of the electronic music from the MSD are the clusters described in the table where α = 0.1, and the most novel artists, along with their contributions, are summarized in the findings for α = 0.05 and α = 0.1.

Chapter 4

Conclusion

In this chapter, I first address weaknesses in my experiment and strategies to resolve those weaknesses; I then offer potential paths for researchers to build upon my experiment, and close with final remarks regarding this thesis.

4.1 Design Flaws in Experiment

While I made every effort possible to ensure the integrity of this experiment, there were various limiting factors, some beyond my control and others within my control but unrealistic to address given the time and resources I had. The largest issue was the dataset I was working with. While the MSD contained roughly 23,000 electronic music songs according to my classifications, these songs did not come close to all of the electronic music that was available. From looking through the tracks, I did see many important artists, meaning that there was some credibility to the dataset. However, there were several other artists I was surprised to see missing, and the artists included contained only a limited number of popular songs. Some traditionally defined genres, like dubstep, were missing entirely from the dataset, and the most recent songs came from the year 2010, which meant that the past 5 years of rapid expansion in EM were not accounted for. Building a sufficient corpus of EM data is very difficult, arguably more so than for other genres, because songs may be remixed by multiple artists, further blurring the line between original content and modifications. For this reason, I considered my thesis to be a proof of concept. Although the data I used may not be ideal, I was able to show that the Dirichlet Process could be used with some amount of success to cluster songs based on their metadata.

With respect to how I implemented the Dirichlet Process and constructed the features, my methodology could have been more extensive with additional time and resources. Interpreting the sounds in each song and establishing common threads is a difficult task, and unlike Pandora, which used trained music theory experts to analyze each song, I relied on my own ears and anecdotal knowledge of EM. Given the lack of formal literature quantitatively analyzing EM, and the resources I had, this was my best realistic option, but it was also not ideal. The second notable weakness, which was more controllable, was determining what exactly constitutes an EM song. My criteria involved iterating through every song and selecting those whose artist contained a tag that fell inside a list of predetermined EM genres. However, this strategy is not always effective, since some artists have only a small selection of EM songs and have produced much more music involving rock or other non-EM genres. To prevent these songs from appearing in the dataset, I would need to load another dataset from a group called Last.fm, which contains user-generated tags at the song level.
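As a sketch of what that would look like: the MSD group distributes the Last.fm tags as a SQLite database, and song-level tags could be pulled per track ID roughly as below (the table and column names follow that distribution's documented schema and should be verified against the actual file):

import sqlite3

def track_tags(db_path, track_id):
    # look up the user-generated Last.fm tags attached to one MSD track
    conn = sqlite3.connect(db_path)
    rows = conn.execute(
        """SELECT tags.tag FROM tid_tag
           JOIN tids ON tid_tag.tid = tids.ROWID
           JOIN tags ON tid_tag.tag = tags.ROWID
           WHERE tids.tid = ?""", (track_id,)).fetchall()
    conn.close()
    return [r[0] for r in rows]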

Another, more addressable weakness in my experiment was graphically analyzing the timbre categories. While the average chord changes were easy to interpret on the graphs for each cluster and had easy semantic interpretations, the timbre categories were never formally defined. That is, while I knew the Bayesian Information Criterion was lowest when there were 46 categories, I did not associate each timbre category with a sound. Mauch's study addressed this issue by randomly selecting songs with sounds that fell in each timbre category and asking users to listen to the sounds and classify what they heard. Implementing this system would be an additional way of ensuring that the clusters formed for each song were nontrivial: I could not only eyeball the measurements on each graph for timbre, as I did in this thesis, but also use them to confirm the sounds I observed for each cluster. Finally, while my feature selection involved careful preprocessing, based on other studies, that normalized measurements between all songs, there are additional ways I could have improved the feature set. For example, one study looks at more advanced ways to isolate specific timbre segments in a song, identify repeating patterns, and compare songs to each other in terms of the similarity of their timbres [15]. More advanced methods like these would allow me to more quantitatively analyze how successful the Dirichlet Process is at effectively clustering songs into distinct categories.
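A minimal sketch of that listening audit, assuming frames is the matrix of sampled timbre frames and centroid one of the 46 GMM means, would rank the frames by distance to the centroid and trace the closest ones back to their songs for listening tests:

import numpy as np

def frames_nearest_centroid(frames, centroid, k=5):
    # return indices of the k timbre frames closest to a cluster mean,
    # which can then be traced back to songs for listening tests
    d = np.linalg.norm(np.asarray(frames) - np.asarray(centroid), axis=1)
    return np.argsort(d)[:k]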

4.2 Future Work

Future work in this area, quantitatively analyzing EM metadata to determine what constitutes different genres and novel artists, would involve tighter definitions and procedures, evaluations of whether the clustering was effective, and closer musical scrutiny. All of the weaknesses mentioned in the previous section, barring perhaps the songs available in the Million Song Dataset, can be addressed with extensions and modifications to the code base I created. Addressing the greater issue of building an effective corpus of music data for the MSD, and constantly updating it, might be accomplished by soliciting such data from an organization like Spotify, but such an endeavor is very ambitious and beyond the scope of any individual or small-group research project without extensive funding and influence. Once these problems are resolved, and the ways songs are accessed from the dataset and compared to each other are refined, the next steps would be to further analyze the results. How do the most unique artists for their time compare to the most popular artists? Is there considerable overlap? How long does it take for a style to grow in popularity, if it ever does? And lastly, how can these findings be used to compose new genres of music and envision who and what will become popular in the future? All of these questions may require supplementary information sources, with respect to the popularity of songs and artists for example, and many of these additional pieces of information can be found on the website of the MSD.

4.3 Closing Remarks

While this thesis is an ambitious endeavor and can be improved in many respects, it does show that the methods implemented yield nontrivial results and could serve as a foundation for future quantitative analysis of electronic music. As data analytics grows even more, and groups such as Spotify amass greater amounts of information and deeper insights into that information, this relatively new field of study will hopefully grow. EM is a dynamic, energizing, and incredibly expressive type of music, and understanding it from a quantitative perspective pays respect to what has up until now been mostly analyzed from a curious outsider's perspective: qualitatively described but not examined as thoroughly from a mathematical angle.

Appendix A

Code

A.1 Pulling Data from the Million Song Dataset

from __future__ import division
import os
import sys
import re
import time
import glob
import numpy as np
import hdf5_getters  # not on adroit

# prevents output from showing ellipses when printed
np.set_printoptions(threshold=np.nan)

'''This code computes the frequency of chord changes in each electronic song
and runs the dirichlet process on it'''

basedir = '/scratch/network/mssilver/mssilver/msd_data_full/data/' + str(sys.argv[1])
ext = '.h5'

target_genres = ['house', 'techno', 'drum and bass', 'drum n bass',
    "drum'n'bass", 'drumnbass', "drum 'n' bass", 'jungle', 'breakbeat',
    'trance', 'dubstep', 'trap', 'downtempo', 'industrial', 'synthpop',
    'idm', 'idm - intelligent dance music', '8-bit', 'ambient',
    'dance and electronica', 'electronic']

# relevant metadata for all EM songs found in the MSD
all_song_data = {}
pitch_segs_data = []
count = 0
start_time = time.time()

for root, dirs, files in os.walk(basedir):
    files = glob.glob(os.path.join(root, '*' + ext))
    for f in files:
        h5 = hdf5_getters.open_h5_file_read(f)
        # if year unknown, throw out sample
        if hdf5_getters.get_year(h5) == 0:
            h5.close()
            continue
        if any(tag in str(hdf5_getters.get_artist_mbtags(h5)) for tag in target_genres):
            print 'found electronic music song at {0} seconds'.format(time.time() - start_time)
            count += 1
            print('song count: {0}'.format(count + 1))
            h5_subdict = dict()
            h5_subdict['title'] = hdf5_getters.get_title(h5).item()
            h5_subdict['artist_name'] = hdf5_getters.get_artist_name(h5).item()
            h5_subdict['year'] = hdf5_getters.get_year(h5).item()
            h5_subdict['duration'] = hdf5_getters.get_duration(h5).item()
            h5_subdict['timbre'] = hdf5_getters.get_segments_timbre(h5).tolist()
            h5_subdict['pitches'] = hdf5_getters.get_segments_pitches(h5).tolist()
            track_id = hdf5_getters.get_track_id(h5).item()
            all_song_data[track_id] = h5_subdict
            print('Song {0} finished processing. Total time elapsed: {1} seconds'.format(count, str(time.time() - start_time)))
        h5.close()

all_song_data_sorted = dict(sorted(all_song_data.items(), key=lambda k: k[1]['year']))
sortedpitchdata = '/scratch/network/mssilver/mssilver/msd_data/raw_' + re.sub('/', '', sys.argv[1]) + '.txt'
with open(sortedpitchdata, 'w') as text_file:
    text_file.write(str(all_song_data_sorted))

A.2 Calculating Most Likely Chords and Timbre Categories

from __future__ import division
import os
import sys
import re
import time
import json
import glob
import hdf5_getters  # not on adroit
import sklearn.mixture
import msd_utils  # not on adroit
import math
import numpy as np
import collections
import ast

# prevents output from showing ellipses when printed
np.set_printoptions(threshold=np.nan)

# column-wise mean of list of lists
def mean(a):
    return sum(a) / len(a)

'''This code computes the frequency of chord changes in each electronic song
and runs the dirichlet process on it'''

basedir = '/scratch/network/mssilver/mssilver/'
input_file = basedir + 'msd_data/raw_' + str(sys.argv[1]) + '.txt'
output_file = basedir + 'msd_data/preprocessed_' + str(sys.argv[1]) + '.txt'

json_contents = open(input_file, 'r').read()

all_song_data = []
time_start = time.time()
count = 0
for json_object_str in re.finditer(r"\{'title'.*?\}", json_contents):
    json_object_str = str(json_object_str.group(0))
    json_object = ast.literal_eval(json_object_str)
    json_object_new = {}

    json_object_new['title'] = json_object['title']
    json_object_new['artist_name'] = json_object['artist_name']
    json_object_new['year'] = json_object['year']
    json_object_new['duration'] = json_object['duration']

    segments_pitches_old = json_object['pitches']
    segments_timbre_old = json_object['timbre']
    segments_pitches_old_smoothed = []
    segments_timbre_old_smoothed = []
    chord_changes = [0 for i in range(0, 192)]
    smoothing_factor = 5
    for i in range(0, int(math.floor(len(segments_pitches_old) / smoothing_factor))):
        segments = segments_pitches_old[(smoothing_factor * i):(smoothing_factor * i + smoothing_factor)]
        # calculate mean frequency of each note over a block of 5 time segments
        segments_mean = map(mean, zip(*segments))
        segments_pitches_old_smoothed.append(segments_mean)
    most_likely_chords = [msd_utils.find_most_likely_chord(seg) for seg in segments_pitches_old_smoothed]
    print 'found most likely chords at {0} seconds'.format(time.time() - time_start)
    # calculate chord changes
    for i in range(0, len(most_likely_chords) - 1):
        c1 = most_likely_chords[i]
        c2 = most_likely_chords[i + 1]
        if (c1[1] == c2[1]):
            note_shift = 0
        elif (c1[1] < c2[1]):
            note_shift = c2[1] - c1[1]
        else:
            note_shift = 12 - c1[1] + c2[1]
        key_shift = 4 * (c1[0] - 1) + c2[0]
        # convert note_shift (0 through 11) and key_shift (1 to 16)
        # to one of 192 categories for a chord shift
        chord_shift = 12 * (key_shift - 1) + note_shift
        chord_changes[chord_shift] += 1
    json_object_new['chord_changes'] = [c / json_object['duration'] for c in chord_changes]
    print 'calculated chord changes at {0} seconds'.format(time.time() - time_start)

    for i in range(0, int(math.floor(len(segments_timbre_old) / smoothing_factor))):
        segments = segments_timbre_old[(smoothing_factor * i):(smoothing_factor * i + smoothing_factor)]
        # calculate mean value of each timbre dimension over a block of 5 time segments
        segments_mean = map(mean, zip(*segments))
        segments_timbre_old_smoothed.append(segments_mean)
    print 'found most likely timbre categories at {0} seconds'.format(time.time() - time_start)
    timbre_cats = [msd_utils.find_most_likely_timbre_category(seg) for seg in segments_timbre_old_smoothed]
    timbre_cat_counts = [timbre_cats.count(i) for i in xrange(0, 30)]
    json_object_new['timbre_cat_counts'] = [t / json_object['duration'] for t in timbre_cat_counts]
    all_song_data.append(json_object_new)
    count += 1

print 'preprocessing finished, writing results to file at time {0}'.format(time.time() - time_start)
with open(output_file, 'w') as f:
    f.write(str(all_song_data))

print 'file merging complete at time {0}'.format(time.time() - time_start)

A.3 Code to Compute Timbre Categories

from __future__ import division
import os
import sys
import re
import time
import json
import glob
import hdf5_getters  # not on adroit
import sklearn.mixture
import msd_utils  # not on adroit
import math
import numpy as np
import collections
from string import ascii_uppercase
import ast
import matplotlib.pyplot as plt
import operator
from collections import defaultdict
import random

timbre_all = []
N = 20  # number of samples to get from each year
year_counts = dict({1956: 2, 1965: 4, 1968: 3, 1969: 5, 1970: 23, 1971: 25,
    1972: 26, 1973: 37, 1974: 35, 1975: 29, 1976: 28, 1977: 64, 1978: 77,
    1979: 111, 1980: 131, 1981: 171, 1982: 199, 1983: 272, 1984: 190, 1985: 189,
    1986: 200, 1987: 224, 1988: 205, 1989: 272, 1990: 358, 1991: 348,
    1992: 538, 1993: 610, 1994: 658, 1995: 764, 1996: 809, 1997: 930, 1998: 872,
    1999: 983, 2000: 1031, 2001: 1230, 2002: 1323, 2003: 1563, 2004: 1508,
    2005: 1995, 2006: 1892, 2007: 2175, 2008: 1950, 2009: 1782, 2010: 742})

time_start = time.time()
year_count = defaultdict(int)
orig_dir = '/scratch/network/mssilver/mssilver/'
orig_dir = ''
json_pattern = re.compile(r"\{'title'.*?\}", re.DOTALL)
N = 20  # number of songs to sample from each year
k = 20  # number of frames to select from each song
for l1 in ascii_uppercase:
    for l2 in ascii_uppercase:
        edm_textfile = orig_dir + 'msd_data/raw_' + l1 + l2 + '.txt'
        json_contents = open(edm_textfile, 'r').read()
        for json_object_str in re.findall(json_pattern, json_contents):
            json_object = ast.literal_eval(json_object_str)
            year = int(json_object['year'])
            prob = 1.0 if 1.0 * N / year_counts[year] > 1.0 else 1.0 * N / year_counts[year]
            if random.random() < prob:
                print 'getting timbre frames for song in directory {0}, {1} seconds after start of program'.format(edm_textfile, time.time() - time_start)
                duration = float(json_object['duration'])
                timbre = [[t / duration for t in l] for l in json_object['timbre']]
                try:
                    indices = random.sample(xrange(0, len(timbre)), k)
                except:
                    indices = xrange(0, len(timbre))
                timbre_frames = [timbre[i] for i in indices]
                appended_timbre = [timbre_all.append(l) for l in timbre_frames]
        print 'finished file {0}, {1} seconds after start of program'.format(edm_textfile, time.time() - time_start)

with(open('timbre_frames_all.txt', 'w')) as f:
    f.write(str(timbre_all))

A.4 Helper Methods for Calculations

import os
import re
import json
import glob
import hdf5_getters
import time
import numpy as np

''' some static data used in conjunction with the helper methods '''

# each 12-element vector corresponds to the 12 pitches, starting with
# C natural and going up to B natural

CHORD_TEMPLATE_MAJOR = [[1,0,0,0,1,0,0,1,0,0,0,0],[0,1,0,0,0,1,0,0,1,0,0,0],
                        [0,0,1,0,0,0,1,0,0,1,0,0],[0,0,0,1,0,0,0,1,0,0,1,0],
                        [0,0,0,0,1,0,0,0,1,0,0,1],[1,0,0,0,0,1,0,0,0,1,0,0],
                        [0,1,0,0,0,0,1,0,0,0,1,0],[0,0,1,0,0,0,0,1,0,0,0,1],
                        [1,0,0,1,0,0,0,0,1,0,0,0],[0,1,0,0,1,0,0,0,0,1,0,0],
                        [0,0,1,0,0,1,0,0,0,0,1,0],[0,0,0,1,0,0,1,0,0,0,0,1]]
CHORD_TEMPLATE_MINOR = [[1,0,0,1,0,0,0,1,0,0,0,0],[0,1,0,0,1,0,0,0,1,0,0,0],
                        [0,0,1,0,0,1,0,0,0,1,0,0],[0,0,0,1,0,0,1,0,0,0,1,0],
                        [0,0,0,0,1,0,0,1,0,0,0,1],[1,0,0,0,0,1,0,0,1,0,0,0],
                        [0,1,0,0,0,0,1,0,0,1,0,0],[0,0,1,0,0,0,0,1,0,0,1,0],
                        [0,0,0,1,0,0,0,0,1,0,0,1],[1,0,0,0,1,0,0,0,0,1,0,0],
                        [0,1,0,0,0,1,0,0,0,0,1,0],[0,0,1,0,0,0,1,0,0,0,0,1]]
CHORD_TEMPLATE_DOM7 = [[1,0,0,0,1,0,0,1,0,0,1,0],[0,1,0,0,0,1,0,0,1,0,0,1],
                       [1,0,1,0,0,0,1,0,0,1,0,0],[0,1,0,1,0,0,0,1,0,0,1,0],
                       [0,0,1,0,1,0,0,0,1,0,0,1],[1,0,0,1,0,1,0,0,0,1,0,0],
                       [0,1,0,0,1,0,1,0,0,0,1,0],[0,0,1,0,0,1,0,1,0,0,0,1],
                       [1,0,0,1,0,0,1,0,1,0,0,0],[0,1,0,0,1,0,0,1,0,1,0,0],
                       [0,0,1,0,0,1,0,0,1,0,1,0],[0,0,0,1,0,0,1,0,0,1,0,1]]
CHORD_TEMPLATE_MIN7 = [[1,0,0,1,0,0,0,1,0,0,1,0],[0,1,0,0,1,0,0,0,1,0,0,1],
                       [1,0,1,0,0,1,0,0,0,1,0,0],[0,1,0,1,0,0,1,0,0,0,1,0],
                       [0,0,1,0,1,0,0,1,0,0,0,1],[1,0,0,1,0,1,0,0,1,0,0,0],
                       [0,1,0,0,1,0,1,0,0,1,0,0],[0,0,1,0,0,1,0,1,0,0,1,0],
                       [0,0,0,1,0,0,1,0,1,0,0,1],[1,0,0,0,1,0,0,1,0,1,0,0],
                       [0,1,0,0,0,1,0,0,1,0,1,0],[0,0,1,0,0,0,1,0,0,1,0,1]]

CHORD_TEMPLATE_MAJOR_means = [np.mean(chord) for chord in CHORD_TEMPLATE_MAJOR]
CHORD_TEMPLATE_MINOR_means = [np.mean(chord) for chord in CHORD_TEMPLATE_MINOR]
CHORD_TEMPLATE_DOM7_means = [np.mean(chord) for chord in CHORD_TEMPLATE_DOM7]
CHORD_TEMPLATE_MIN7_means = [np.mean(chord) for chord in CHORD_TEMPLATE_MIN7]

CHORD_TEMPLATE_MAJOR_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_MAJOR]
CHORD_TEMPLATE_MINOR_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_MINOR]
CHORD_TEMPLATE_DOM7_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_DOM7]
CHORD_TEMPLATE_MIN7_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_MIN7]

# mean timbre vectors of the 46 clusters found by the GMM (see Section 2.3.3)
TIMBRE_CLUSTERS = [
    [1.38679881e-01, 3.95702571e-02, 2.65410235e-02, 7.38301998e-03, -1.75014636e-02, -5.51147732e-02,
     8.71851698e-03, -1.17595855e-02, 1.07227900e-02, 8.75951680e-03, 5.40391877e-03, 6.17638908e-03],
    [3.14344510e+00, 1.17405599e-01, 4.08053561e+00, -1.77934450e+00, 2.93367968e+00, -1.35597928e+00,
     -1.55129489e+00, 7.75743158e-01, 6.42796685e-01, 1.40794256e-01, 3.37716831e-01, -3.27103815e-01],
    [3.56548165e-01, 2.73288705e+00, 1.94355982e+00, 1.06892477e+00, 9.89739475e-01, -8.97330631e-02,
     8.73234495e-01, -2.00747009e-03, 3.44488367e-01, 9.93117800e-02, -2.43471766e-01, -1.90521726e-01],
    [4.22442037e-01, 4.14115783e-01, 1.43926557e-01, -1.16143322e-01, -5.95186216e-02, -2.36927188e-01,
     -6.83151409e-02, 9.86816882e-02, 2.43219098e-02, 6.93558977e-02, 6.80121418e-03, 3.97485360e-02],
    [1.94727799e-01, -1.39027782e+00, -2.39875671e-01, -2.84583677e-01, 1.92334219e-01, -2.83421048e-01,
     2.15787541e-01, 1.14840341e-01, -2.15631833e-01, -4.09496877e-02, -6.90838017e-03, -7.24394810e-03],
    [1.96565167e-01, 4.98702717e-02, -3.43697282e-01, 2.54170701e-01, 1.12441266e-02, 1.54740401e-01,
     -4.70447408e-02, 8.10868802e-02, 3.03736697e-03, 1.43974944e-03, -2.75044913e-02, 1.48634678e-02],
    [2.21364497e-01, -2.96205105e-01, 1.57754028e-01, -5.57641279e-02, -9.25625566e-02, -6.15316168e-02,
     -1.38139882e-01, -5.54936599e-02, 1.66886836e-01, 6.46238260e-02, 1.24093863e-02, -2.09274345e-02],
    [2.12823455e-01, -9.32652720e-02, -4.39611467e-01, -2.02814479e-01, 4.98638770e-02, -1.26572488e-01,
     -1.11181799e-01, 3.25075635e-02, 2.01416694e-02, -5.69216463e-02, 2.61922912e-02, 8.30817468e-02],
    [1.62304042e-01, -7.34813956e-03, -2.02552550e-01, 1.80106705e-01, -5.72110826e-02, -9.17148244e-02,
     -6.20429191e-03, -6.08892354e-02, 1.02883628e-02, 3.84878478e-02, -8.72920419e-03, 2.37291230e-02],
    [1.69023095e-01, 6.81311168e-02, -3.71039856e-02, -2.13139780e-02, -4.18752028e-03, 1.36407740e-01,
     2.58515825e-02, -4.10328777e-04, 2.93149920e-02, -1.97874734e-02, 2.01177066e-02, 4.29260690e-03],
    [4.16829358e-01, -1.28384095e+00, 8.86081556e-01, 9.13717416e-02, -3.19420208e-01, -1.82003637e-01,
     -3.19865507e-02, -1.71517045e-02, 3.47472066e-02, -3.53047665e-02, 5.58354602e-02, -5.06222122e-02],
    [3.83948137e-01, 1.06020034e-01, 4.01191058e-01, 1.49470482e-01, -9.58422411e-02, -4.94473336e-02,
     2.27589858e-02, -5.67352733e-02, 3.84666644e-02, -2.15828055e-02, -1.67817151e-02, 1.15426241e-01],
    [9.07946444e-01, 3.26120397e+00, 2.98472002e+00, -1.42615404e-01, 1.29886103e+00, -4.53380431e-01,
     1.54008478e-01, -3.55297093e-02, -2.95809181e-01, 1.57037690e-01, -7.29692046e-02, 1.15180285e-01],
    [1.60870896e+00, -2.32038235e+00, -7.96211044e-01, 1.55058968e+00, -2.19377663e+00, 5.01030526e-01,
     -1.71767279e+00, -1.36642470e+00, -2.42837527e-01, -4.14275615e-01, -7.33148530e-01, -4.56676578e-01],
    [6.42870687e-01, 1.34486839e+00, 2.16026845e-01, -2.13180345e-01, 3.10866747e-01, -3.97754955e-01,
     -3.54439151e-01, -5.95938041e-04, 4.95054274e-03, 4.67013422e-02, -1.80823854e-02, 1.25808320e-01],
    [1.16780496e+00, 2.28141229e+00, -3.29418720e+00, -1.54239912e+00, 2.12372153e-01, 2.51116768e+00,
     1.84273560e+00, -4.06183916e-01, 1.19175125e+00, -9.24407446e-01, 6.85444429e-01, -6.38729005e-01],
    [2.39097414e-01, -1.13382447e-02, 3.06327342e-01, 4.68182987e-03, -1.03107607e-01, -3.17661969e-02,
     3.46533705e-02, 1.46440386e-02, 6.88291154e-02, 1.72580481e-02, -6.23970238e-03, -6.52822380e-03],
    [1.74850329e-01, -1.86077411e-01, 2.69285838e-01, 5.22452803e-02, -3.71708289e-02, -6.42874319e-02,
     -5.01920042e-03, -1.14565540e-02, -2.61300268e-03, -6.94872458e-03, 1.20157063e-02, 2.01341977e-02],
    [1.93220674e-01, 1.62738332e-01, 1.72794061e-02, 7.89933755e-02, 1.58494767e-01, 9.04541006e-04,
     -3.33177052e-02, -1.42411500e-01, -1.90471155e-02, -2.41622739e-02, -2.57382438e-02, 2.84895062e-02],
    [3.31179197e+00, -1.56765268e-01, 4.42446188e+00, 2.05496297e+00, 5.07031622e+00, -3.52663849e-02,
     -5.68337901e+00, -1.17825301e+00, 5.41756637e-01, -3.15541339e-02, -1.58404846e+00, 7.37887234e-01],
    [2.36033237e-01, -5.01380019e-01, -7.01568834e-02, -2.14474169e-01, 5.58739133e-01, -3.45340886e-01,
     2.36469930e-01, -2.51770230e-02, -4.41670143e-01, -1.73364633e-01, 9.92353986e-03, 1.01775476e-01],
    [3.13672832e+00, 1.55128891e+00, 4.60139512e+00, 9.82477544e-01, -3.87108002e-01, -1.34239667e+00,
     -3.00065797e+00, -4.41556909e-01, -7.77546208e-01, -6.59017029e-01, -1.42596356e-01, -9.78935498e-01],
    [8.50714148e-01, 2.28658856e-01, -3.65260753e+00, 2.70626948e+00, -1.90441544e-01, 5.66625676e+00,
     1.77531510e+00, 2.39978921e+00, 1.10965660e+00, 1.58484130e+00, -1.51579214e-02, 8.64324026e-01],
    [1.14302559e+00, 1.18602811e+00, -3.88130412e+00, 8.69833825e-01, -8.23003310e-01, -4.23867795e-01,
     8.56022598e-01, -1.08015106e+00, 1.74840192e-01, -1.35493558e-02, -1.17012561e+00, 1.68572940e-01],
    [3.54117814e+00, 6.12714769e-01, 7.67585243e+00, 2.50391333e+00, 1.81374399e+00, -1.46363231e+00,
     -1.74027236e+00, -5.72924078e-01, -1.20787368e+00, -4.13954661e-01, -4.62561948e-01, 6.78297871e-01],
    [8.31843044e-01, 4.41635485e-01, 7.00724425e-02, -4.72159900e-02, 3.08326493e-01, -4.47009822e-01,
     3.27806057e-01, 6.52370380e-01, 3.28490360e-01, 1.28628172e-01, -7.78065861e-02, 6.91343399e-02],
    [4.90082031e-01, -9.53180204e-01, 1.76970476e-01, 1.57256960e-01, -5.26196238e-02, -3.19264458e-01,
     3.91808304e-01, 2.19368239e-01, -2.06483291e-01, -6.25044005e-02, -1.05547224e-01, 3.18934196e-01],
    [1.49899454e+00, -4.30708817e-01, 2.43770498e+00, 7.03149621e-01, -2.28827845e+00, 2.70195855e+00,
     -4.71484280e+00, -1.18700075e+00, -1.77431396e+00, -2.23190236e+00, 8.20855264e-01, -2.35859902e-01],
    [1.20322544e-01, -3.66300816e-01, -1.25699953e-01, -1.21914056e-01, 6.93277338e-02, -1.31034684e-01,
     -1.54955924e-03, 2.48094288e-02, -3.09576314e-02, -1.66369415e-03, 1.48904987e-04, -1.42151992e-02],
    [6.52394765e-01, -6.81024464e-01, 6.36868117e-01, 3.04950208e-01, 2.62178992e-01, -3.20457080e-01,
     -1.98576098e-01, -3.02173163e-01, 2.04399765e-01, 4.44513847e-02, -9.50111498e-03, -1.14198739e-02],
    [2.06762180e-01, -2.08101829e-01, 2.61977630e-01, -1.71672300e-01, 5.61794250e-02, 2.13660185e-01,
     3.90259585e-02, 4.78176392e-02, 1.72812607e-02, 3.44052067e-02, 6.26899067e-03, 2.48544728e-02],
    [7.39717363e-01, 4.37786285e+00, 2.54995502e+00, 1.13151212e+00, -3.58509503e-01, 2.20806129e-01,
     -2.20500355e-01, -7.22409824e-02, -2.70534083e-01, 1.07942098e-03, 2.70174668e-01, 1.87279353e-01],
    [1.25593809e+00, 6.71054880e-02, 8.70352571e-01, -4.32607959e+00, 2.30652217e+00, 5.47476105e+00,
     -6.11052479e-01, 1.07955720e+00, -2.16225471e+00, -7.95770149e-01, -7.31804973e-01, 9.68935954e-01],
    [1.17233757e-01, -1.23897829e-01, -4.88625265e-01, 1.42036530e-01, -7.23286756e-02, -6.99808763e-02,
     -1.17525019e-02, 5.70221674e-02, -7.67796123e-03, 4.17505873e-02, -2.33375716e-02, 1.94121001e-02],
    [1.67511025e+00, -2.75436700e+00, 1.45345593e+00, 1.32408871e+00, -1.66172505e+00, 1.00560074e+00,
     -8.82308160e-01, -5.95708043e-01, -7.27283590e-01, -1.03975499e+00, -1.86653334e-02, 1.39449745e+00],
    [3.20587677e+00, -2.84451104e+00, 8.54849957e+00, -4.44001235e-01, 1.04202144e+00, 7.35333682e-01,
     -2.48763292e+00, 7.38931361e-01, -1.74185596e+00, -1.07581842e+00, 2.05759299e-01, -8.20483513e-01],
    [3.31279737e+00, -5.08655734e-01, 6.61530870e+00, 1.16518280e+00, 4.74499155e+00, -2.31536191e+00,
     -1.34016130e+00, -7.15381712e-01, 2.78890594e+00, 2.04189275e+00, -3.80003033e-01, 1.16034914e+00],
    [1.79522019e+00, -8.13534697e-02, 4.37167420e-01, 2.26517020e+00, 8.85377295e-01, 1.07481514e+00,
     -7.25322296e-01, -2.19309506e+00, -7.59468916e-01, -1.37191387e+00, 2.60097913e-01, 9.34596450e-01],
    [3.50400906e-01, 8.17891485e-01, -8.63487084e-01, -7.31760701e-01, 9.70320805e-02, -3.60023996e-01,
     -2.91753495e-01, -8.03073817e-02, 6.65930095e-02, 1.60093340e-01, -1.29158086e-01, -5.18806100e-02],
    [2.25922929e-01, 2.78461593e-01, 5.39661393e-02, -2.37662670e-02, -2.70343295e-02, -1.23485570e-01,
     2.31027499e-03, 5.87465112e-05, 1.86127188e-02, 2.83074747e-02, -1.87198676e-04, 1.24761782e-02],
    [4.53615634e-01, 3.18976020e+00, -8.35029351e-01, 7.84124578e+00, -4.43906795e-01, -1.78945492e+00,
     -1.14521031e+00, 1.00044304e+00, -4.04084981e-01, -4.86030348e-01, 1.05412721e-01, 5.63666445e-02],
    [3.93714086e-01, -3.07226477e-01, -4.87366619e-01, -4.57481697e-01, -2.91133171e-04, -2.39881719e-01,
     -2.15591352e-01, -1.21332941e-01, 1.42245002e-01, 5.02984582e-02, -8.05878851e-03, 1.95534173e-01],
    [1.86913010e-01, -1.61000977e-01, 5.95612425e-01, 1.87804293e-01, 2.22064227e-01, -1.09008289e-01,
     7.83845058e-02, 5.15228647e-02, -8.18113578e-02, -2.37860551e-02, 3.41013800e-03, 3.64680417e-02],
    [3.32919314e+00, -2.14341251e+00, 7.20913997e+00, 1.76143734e+00, 1.64091808e+00, -2.66887649e+00,
     -9.26748006e-01, -2.78599285e-01, -7.39434005e-01, -3.87363085e-01, 8.00557250e-01, 1.15628886e+00],
    [4.76496444e-01, -1.19334793e-01, 3.09037235e-01, -3.45545294e-01, 1.30114716e-01, 5.06895559e-01,
     2.12176840e-01, -4.14296750e-03, 4.52439064e-02, -1.62163990e-02, 6.93683152e-02, -5.77607592e-03],
    [3.00019324e-01, 5.43432074e-02, -7.72732930e-01, 1.47263806e+00, -2.79012581e-02, -2.47864869e-01,
     -2.10011388e-01, 2.78202425e-01, 6.16957205e-02, -1.66924986e-01, -1.80102286e-01, -3.78872162e-03]]

TIMBRE_MEANS = [np.mean(t) for t in TIMBRE_CLUSTERS]
TIMBRE_STDEVS = [np.std(t) for t in TIMBRE_CLUSTERS]

''' helper methods to process raw msd data '''

def normalize_pitches(h5):
    key = int(hdf5_getters.get_key(h5))
    segments_pitches = hdf5_getters.get_segments_pitches(h5)
    segments_pitches_new = [transpose_by_key(pitch_seg, key) for pitch_seg in segments_pitches]
    return segments_pitches_new

def transpose_by_key(pitch_seg, key):
    pitch_seg_new = []
    for i in range(0, 12):
        idx = (i + key) % 12
        pitch_seg_new.append(pitch_seg[idx])
    return pitch_seg_new

''' given a time segment with distributions of the 12 pitches, find the most
likely chord played '''
def find_most_likely_chord(pitch_vector):
    rho_max = 0.0
    # chords are indexed as (type, root): type 1 = major, 2 = minor,
    # 3 = dominant 7th, 4 = minor 7th
    most_likely_chord = (1, 1)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_MAJOR,
            CHORD_TEMPLATE_MAJOR_means, CHORD_TEMPLATE_MAJOR_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean)*(pitch_vector[i] - np.mean(pitch_vector))/((stdev + 0.01)*(np.std(pitch_vector) + 0.01))
        if (abs(rho) > abs(rho_max)):
            rho_max = rho
            most_likely_chord = (1, idx)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_MINOR,
            CHORD_TEMPLATE_MINOR_means, CHORD_TEMPLATE_MINOR_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean)*(pitch_vector[i] - np.mean(pitch_vector))/((stdev + 0.01)*(np.std(pitch_vector) + 0.01))
        if (abs(rho) > abs(rho_max)):
            rho_max = rho
            most_likely_chord = (2, idx)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_DOM7,
            CHORD_TEMPLATE_DOM7_means, CHORD_TEMPLATE_DOM7_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean)*(pitch_vector[i] - np.mean(pitch_vector))/((stdev + 0.01)*(np.std(pitch_vector) + 0.01))
        if (abs(rho) > abs(rho_max)):
            rho_max = rho
            most_likely_chord = (3, idx)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_MIN7,
            CHORD_TEMPLATE_MIN7_means, CHORD_TEMPLATE_MIN7_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean)*(pitch_vector[i] - np.mean(pitch_vector))/((stdev + 0.01)*(np.std(pitch_vector) + 0.01))
        if (abs(rho) > abs(rho_max)):
            rho_max = rho
            most_likely_chord = (4, idx)
    return most_likely_chord

def find_most_likely_timbre_category(timbre_vector):
    most_likely_timbre_cat = 0
    rho_max = 0.0
    for idx, (seg, mean, stdev) in enumerate(zip(TIMBRE_CLUSTERS, TIMBRE_MEANS,
            TIMBRE_STDEVS)):
        rho = 0.0
        for i in range(0, 12):
            rho += (seg[i] - mean)*(timbre_vector[i] - np.mean(timbre_vector))/((stdev + 0.01)*(np.std(timbre_vector) + 0.01))
        if (abs(rho) > abs(rho_max)):
            rho_max = rho
            most_likely_timbre_cat = idx
    return most_likely_timbre_cat

Bibliography

[1] Marck Bailey. Mark J. Butler publishes scholarly work on dance music. http://www.music.northwestern.edu/about/news/2012/mark-j-butler-publishes-scholarly-work-on-dance-music.html, March 2012.

[2] Kenneth Taylor. Ishkur's guide to EDM. http://techno.org/electronic-music-guide/.

[3] Deal further strengthens Spotify's music discovery expertise. http://the.echonest.com/pressreleases/spotify-acquires-echo-nest/, March 2014.

[4] Josh Constine. Inside the Spotify - Echo Nest skunkworks. http://techcrunch.com/2014/10/19/the-sonic-mad-scientists/, October 2014.

[5] The future of music genres is here. http://blog.echonest.com/post/73516217273/the-future-of-music-genres-is-here, January 2014.

[6] About the Music Genome Project. http://www.pandora.com/about/mgp.

[7] Joan Serrà, Álvaro Corral, Marián Boguñá, Martín Haro, and Josep Ll. Arcos. Measuring the evolution of contemporary western popular music. Sci. Rep., 2, July 2012.

[8] Matthias Mauch, Robert M. MacCallum, Mark Levy, and Armand M. Leroi. The evolution of popular music: USA 1960-2010. Royal Society Open Science, 2(5), 2015.

[9] Thierry Bertin-Mahieux, Daniel P. W. Ellis, Brian Whitman, and Paul Lamere. The Million Song Dataset. In Proceedings of the 12th International Conference on Music Information Retrieval (ISMIR 2011), 2011.

[10] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay. Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12:2825-2830, 2011.

[11] Lars Buitinck, Gilles Louppe, Mathieu Blondel, Fabian Pedregosa, Andreas Mueller, Olivier Grisel, Vlad Niculae, Peter Prettenhofer, Alexandre Gramfort, Jaques Grobler, Robert Layton, Jake VanderPlas, Arnaud Joly, Brian Holt, and Gaël Varoquaux. API design for machine learning software: experiences from the scikit-learn project. In ECML PKDD Workshop: Languages for Data Mining and Machine Learning, pages 108-122, 2013.

[12] Edwin Chen. Infinite mixture models with nonparametric Bayes and the Dirichlet process. http://blog.echen.me/2012/03/20/infinite-mixture-models-with-nonparametric-bayes-and-the-dirichlet-process/, March 2012.

[13] Graham Massey. Roland TR-808: The drum machine that changed music. http://www.bbc.com/news/entertainment-arts-26682781, March 2014.

[14] Jean-Michel Jarre and Agencja Artystyczna MTJ. Jean Michel Jarre. Disques Dreyfus, 1999.

[15] Francois Pachet, Jean-Julien Aucouturier, and Mark Sandler. The way it sounds: Timbre models for analysis and retrieval of music signals. IEEE Transactions on Multimedia, 7(6):1028-1035, December 2005.


audio_md5: bf53f8113508a466cd2d3fda18b06368 (hash code of the audio used for the analysis by The Echo Nest)
bars_confidence: shape = (99,) (confidence value (between 0 and 1) associated with each bar by The Echo Nest)
bars_start: shape = (99,) (start time of each bar according to The Echo Nest; this song has 99 bars)
beats_confidence: shape = (397,) (confidence value (between 0 and 1) associated with each beat by The Echo Nest)
beats_start: shape = (397,) (start time of each beat according to The Echo Nest; this song has 397 beats)
danceability: 0.0 (danceability measure of this song according to The Echo Nest (between 0 and 1; 0 => not analyzed))
duration: 211.69587 (duration of the track in seconds)
end_of_fade_in: 0.139 (time of the end of the fade-in at the beginning of the song, according to The Echo Nest)
energy: 0.0 (energy measure (not in the signal processing sense) according to The Echo Nest (between 0 and 1; 0 => not analyzed))
key: 1 (estimation of the key the song is in by The Echo Nest)
key_confidence: 0.324 (confidence of the key estimation)
loudness: -7.75 (general loudness of the track)
mode: 1 (estimation of the mode the song is in by The Echo Nest)
mode_confidence: 0.434 (confidence of the mode estimation)
release: Big Tunes - Back 2 The 80s (album name from which the track was taken; some songs/tracks can come from many albums, we give only one)
release_7digitalid: 786795 (the ID of the release (album) on the service 7digital.com)
sections_confidence: shape = (10,) (confidence value (between 0 and 1) associated with each section by The Echo Nest)
sections_start: shape = (10,) (start time of each section according to The Echo Nest; this song has 10 sections)
segments_confidence: shape = (935,) (confidence value (between 0 and 1) associated with each segment by The Echo Nest)
segments_loudness_max: shape = (935,) (max loudness during each segment)
segments_loudness_max_time: shape = (935,) (time of the max loudness during each segment)
segments_loudness_start: shape = (935,) (loudness at the beginning of each segment)
segments_pitches: shape = (935, 12) (chroma features for each segment (normalized so max is 1))
segments_start: shape = (935,) (start time of each segment (~ musical event or onset) according to The Echo Nest; this song has 935 segments)
segments_timbre: shape = (935, 12) (MFCC-like features for each segment)
similar_artists: shape = (100,) (a list of 100 artists (their Echo Nest IDs) similar to Rick Astley according to The Echo Nest)
song_hotttnesss: 0.864248830588 (according to The Echo Nest, when downloaded (in December 2010) this song had a 'hotttnesss' of 0.8 (on a scale of 0 to 1))
song_id: SOCWJDB12A58A776AF (The Echo Nest song ID; note that a song can be associated with many tracks (with very slight audio differences))
start_of_fade_out: 198.536 (start time of the fade-out, in seconds, at the end of the song, according to The Echo Nest)
tatums_confidence: shape = (794,) (confidence value (between 0 and 1) associated with each tatum by The Echo Nest)
tatums_start: shape = (794,) (start time of each tatum according to The Echo Nest; this song has 794 tatums)
tempo: 113.359 (tempo in BPM according to The Echo Nest)
time_signature: 4 (time signature of the song according to The Echo Nest, i.e. usual number of beats per bar)
time_signature_confidence: 0.634 (confidence of the time signature estimation)
title: Never Gonna Give You Up (song title)
track_7digitalid: 8707738 (the ID of this song on the service 7digital.com)
track_id: TRAXLZU12903D05F94 (The Echo Nest ID of this particular track on which the analysis was done)
year: 1987 (year when this song was released, according to musicbrainz.org)
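All of these fields are read programmatically with the hdf5_getters module distributed with the MSD. As a minimal sketch (the filename below is simply the track ID of this example song, and the shapes match its field listing):

import hdf5_getters

h5 = hdf5_getters.open_h5_file_read('TRAXLZU12903D05F94.h5')
print hdf5_getters.get_title(h5)                  # Never Gonna Give You Up
print hdf5_getters.get_tempo(h5)                  # 113.359
pitches = hdf5_getters.get_segments_pitches(h5)   # array of shape (935, 12)
timbre = hdf5_getters.get_segments_timbre(h5)     # array of shape (935, 12)
h5.close()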

When choosing features, my main goal was to use features that would most likely yield meaningful results yet also be simple and make sense to the average person. The definition of "meaningful" results is arbitrary, as every music listener will have his or her own opinions as to what constitutes different types of music, but some common features most people tend to differentiate songs by are pitch, rhythm, and the types of instruments used. The following specific fields provided in each song object fall under these three terms:

Pitch

• segments_pitches: a matrix of values indicating the strength of each pitch (or note) at each discernible time interval

Rhythm

• beats_start: a vector of values indicating the start time of each beat
• time_signature: the time signature of the song
• tempo: the speed of the song in Beats Per Minute (BPM)

Instruments

• segments_timbre: a matrix of values indicating the distribution of MFCC-like features (different types of tones) for each segment

The segments_pitches feature is a clear candidate for a differentiating factor for songs, since it reveals the patterns of notes that occur. Additionally, other research papers that quantitatively examine songs, like Mauch's, look at pitch and employ a procedure that allows all songs to be compared with the same metric. Likewise, timbre is intuitively a reliable differentiating feature, since it reveals the amounts of different tones, sounds that sound different despite having the same pitch. Therefore segments_timbre is another feature that is considered in each song.

Finally, we look at the candidate features for rhythm. At first glance, all of these features appear to be useful, as they indicate the rhythm of a song in one way or another. However, none of these features are as useful as the pitch and timbre features. While tempo is one factor in differentiating genres of EDM and music in general, tempo alone is not a driving force of musical innovation. Certain genres of EDM, like drum 'n' bass and happycore, stand out for having very fast tempos, but the tempo is supplemented with a sound unique to the genre. Conceiving new arrangements of pitches, combining instruments in new ways, and inventing new types of sounds are novel, but speeding up or slowing down existing sounds is not. Including tempo as a feature could actually add noise to the model, since many genres overlap in their tempos. And finally, tempo is measured indirectly when the pitch and timbre features are normalized for each song: everything is measured in units of "per second," so faster songs will have higher quantities of pitch and timbre features each second. Time signature can be dismissed from the candidate features for the same reason as tempo: many genres share the same time signature, and including it in the feature set would only add more noise. beats_start looks like a more promising feature since, like segments_pitches and segments_timbre, it consists of a vector of values. However, difficulties arise when we begin to think how exactly we can utilize this information. Since each song varies in length, we need a way to compare songs of different durations on the same level. One approach could be to perform basic statistics on the distance between each beat, for example calculating the mean and standard deviation of this distance. However, the normalized pitch and timbre information already captures this data. Another possibility is detecting certain patterns of beats, which could differentiate the syncopated dubstep or glitch music beats from the steady pulse of electro-house. But once again, every beat is accompanied by a sound with a specific timbre and pitch, so this feature would not add any significantly new information.

2.3 Collecting Data and Preprocessing Selected Features

2.3.1 Collecting the Data

Upon deciding the features I wanted to use in my research, I first needed to collect all of the electronic songs in the Million Song Dataset. The easiest reliable way to achieve this was to iterate through each song in the database and save the information for the songs where any of the artist genre tags in artist_mbtags matched an electronic music genre. While this measure was not fully accurate, because it looks at the genre of the artist, not the song, specific genre information for each song was not as easily accessible, so this indicator was nearly as good a substitute. To generate a list of the genres that electronic songs would fall under, I manually searched through a subset of the MSD to find all genres that seemed to be related to electronic music. In the case of genres that were sometimes, but not always, electronic in nature, such as disco or pop, I erred on the side of caution and did not include them in the list of electronic genres. In these cases, false positives, such as primarily rock songs that happen to have the disco label attached to the artist, could inadvertently be included in the dataset. The final list of genres is as follows:

target_genres = ['house', 'techno', 'drum and bass', 'drum n bass',
                 "drum'n'bass", 'drumnbass', "drum 'n' bass", 'jungle', 'breakbeat',
                 'trance', 'dubstep', 'trap', 'downtempo', 'industrial', 'synthpop',
                 'idm', 'idm - intelligent dance music', '8-bit', 'ambient',
                 'dance and electronica', 'electronic']
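A sketch of this filtering pass (hdf5_getters is the accessor module distributed with the MSD; the directory layout in the glob pattern is illustrative, not the exact layout I used):

import glob
import hdf5_getters

def is_em_song(h5, target_genres):
    # keep a song if any of its artist's MusicBrainz tags is an EM genre
    tags = [str(t).lower() for t in hdf5_getters.get_artist_mbtags(h5)]
    return any(tag in target_genres for tag in tags)

for path in glob.glob('msd_data/*/*/*/*.h5'):
    h5 = hdf5_getters.open_h5_file_read(path)
    if is_em_song(h5, target_genres):
        pass  # save this song's metadata to the EM corpus
    h5.close()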

2.3.2 Pitch Preprocessing

A study conducted by music researcher Matthias Mauch [8] analyzes pitch in a musically informed manner. The study first takes the raw sound data and converts it into a distribution of each pitch, where 0 is no detection of the pitch and 1 the strongest amount. Then it computes the most likely chord by comparing the 4 most common types of chords in popular music (major, minor, dominant 7th, and minor 7th) to the observed chord. The most common chords are represented as "template chords" and contain 0's and 1's, where the 1's represent the notes played in the chord. For example, using the note C as the first index, the C major chord is represented as

\[ CT_{CM} = (1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0) \]

For a given chroma frame c observed in the song, the Spearman's Rho coefficient is computed over every template chord:

\[ \rho_{CT,c} = \sum_{i=1}^{12} \frac{(CT_i - \overline{CT})(c_i - \bar{c})}{\sigma_{CT}\,\sigma_c} \]


where \(\overline{CT}\) is the mean of the values in the template chord, \(\sigma_{CT}\) is the standard deviation of the values in the chord, and the operations on c are analogous. Note that the summation is over each individual pitch in the 12 pitch classes. The chord template with the highest value of ρ is selected as the chord for the time frame.
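This template-matching step can be written compactly in NumPy; the sketch below is equivalent to the looped implementation in Appendix A.4, which also adds a small constant (0.01) to each standard deviation to guard against division by zero:

import numpy as np

def rho(template, chroma):
    # correlation between a 0/1 chord template and an observed
    # 12-element chroma frame, each centered by its own mean
    t = np.asarray(template, dtype=float)
    c = np.asarray(chroma, dtype=float)
    return np.sum((t - t.mean()) * (c - c.mean())) / ((t.std() + 0.01) * (c.std() + 0.01))

def best_template(chroma, templates):
    # templates: the 48 template chords (12 roots x 4 chord types)
    scores = [abs(rho(t, chroma)) for t in templates]
    return int(np.argmax(scores))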

After this is performed for each time frame, the values are smoothed, and then the change between adjacent chords is observed. The reasoning behind this step is that by measuring the relative distance between chords, rather than the chords themselves, all songs can be compared in the same manner even though they may have different key signatures. Finally, the study takes the types of chord changes and classifies them under 8 possible categories called "H-topics." These topics are more abstracted versions of the chord changes that make more sense to a human, such as "changes involving dominant 7th chords."

In my preliminary implementation of this method on an electronic dance music corpus, I made a few modifications to Mauch's study. First, I smoothed out time frames before computing the most probable chords, rather than smoothing the most probable chords. I did this to save time and to reduce volatility in the chord measurements. Using Rick Astley's "Never Gonna Give You Up" as a reference, which contains 935 time frames and lasts 212 seconds, 5 time frames is slightly under 1 second and, for preliminary testing, appeared to be a good interval for each time block. Second, as mentioned in the literature section, I did not abstract the chord changes into H-topics. This decision also stemmed from time constraints, since deriving semantic chord meaning from EDM songs would require careful research into the types of harmonies and sounds common in that genre of music. Below I include a high-level visualization of the pitch metadata found in a sample song, "Firestarter" by The Prodigy, and how I converted the metadata into a chord change vector that I could then feed into the Dirichlet Process algorithm.

[Figure: the pitch preprocessing pipeline for "Firestarter" by The Prodigy. Starting with the raw pitch data, an N x 12 matrix where N is the number of time frames in the song and 12 the number of pitch classes, the average distribution of pitches is computed over each block of 5 time frames; the most likely chord for each block is found using Spearman's rho (e.g., F major followed by G major); each pair of adjacent chords is converted to one of 192 possible chord-shift codes, and the corresponding count is incremented; the result is a 192-element vector in which chord_changes[i] is the number of times the chord change with code i occurred in the song.]

A final step I took to normalize the chord change data was to divide the numbers by the length of the song, so that each song's number of chord changes was measured per second.

2.3.3 Timbre Preprocessing

For timbre, I also used Mauch's model to find a meaningful way to compare timbre uniformly across all songs [8]. After collecting all song metadata, I took a random sample of 20 songs from each year starting at 1970. The reason I forced the sampling to 20 randomly sampled songs from each year, and did not take a random sample of songs from all years at once, was to prevent bias towards any type of sounds. As seen in Figure 2.2, there are significantly more songs from 2000-2011 than before 2000. The mean year is x̄ = 2001.052, the median year is 2003, and the standard deviation of the years is σ = 7.060. A "random sample" over all songs would almost definitely include a disproportionate amount of more recent songs. In order to not miss out on sounds that may be more prevalent in older songs, I required a set number of songs from each year. Next, from each randomly selected song, I selected 20 random timbre frames in order to prevent any biases in data collection within each song. In total, there were 42 years x 20 songs x 20 frames = 16,800 timbre frames collected. Next, I clustered the timbre frames using a Gaussian Mixture Model (GMM), varying the number of clusters from 10 to 100 and selecting the number of clusters with the lowest Bayes Information Criterion (BIC), a statistical measure commonly used to select the best-fitting model. The BIC was minimized at 46 timbre clusters. I then re-ran the GMM with 46 clusters and saved the mean values of each of the 12 timbre segments for each cluster formed. In the same way that every song had the same 192 chord changes whose frequencies could be compared between songs, each song now had the same 46 timbre clusters but different frequencies in each song. When reading in the metadata from each song, I calculated the most likely timbre cluster each timbre frame belonged to and kept a frequency count of all of the possible timbre clusters observed in the song. Finally, as with the pitch data, I divided all observed counts by the duration of the song in order to normalize each song's timbre counts.

Figure 2.2: Number of Electronic Music Songs in Million Song Dataset from Each Year
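The model-selection step just described can be sketched with the scikit-learn mixture API [10]; this is not the exact script I ran, and current releases name the class GaussianMixture, with timbre_frames standing in for the 16,800 x 12 array of sampled frames:

import numpy as np
from sklearn.mixture import GaussianMixture

def fit_best_gmm(timbre_frames, k_min=10, k_max=100):
    # fit a GMM for each candidate cluster count and keep the lowest BIC
    best_model, best_bic = None, np.inf
    for k in range(k_min, k_max + 1):
        gmm = GaussianMixture(n_components=k).fit(timbre_frames)
        bic = gmm.bic(timbre_frames)
        if bic < best_bic:
            best_model, best_bic = gmm, bic
    return best_model  # cluster means are in best_model.means_ (46 x 12 here)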

Chapter 3

Results

3.1 Methodology

After the pitch and timbre data was processed, I ran the Dirichlet Process on the data. For each song, I concatenated the 192-element chord change frequency list and the 46-element timbre category frequency list, giving each song a total of 238 features. However, there is a problem with this setup: the pitch data will inherently dominate the clustering process, since it contains almost 3 times as many features as timbre. While there is no built-in function in scikit-learn's DPGMM process to give different weights to each feature, I considered another possibility to remedy this discrepancy: duplicating the timbre vector a certain number of times and concatenating that to the feature set of each song. While this strategy runs the risk of corrupting the feature set and turning it into something that does not accurately represent each song, it is important to keep in mind that even without duplicating the timbre vector, the feature set consists of two separate feature sets concatenated to each other. Therefore, timbre duplication appears to be a reasonable strategy to weigh pitch and timbre more evenly, as sketched below.

After this modification, I tweaked a few more parameters before obtaining my final results. Dividing the pitch and timbre frequencies by the duration of the song normalized every song to frequency per second, but it also had the undesired effect of making the data too small. Timbre and pitch frequencies per second were almost always less than 10 and many times hovered as low as 0.002 for nonzero values. Because all of the values were very close to each other, using common values of α in the range of 0.1 to 1000-2000 was insufficient to push the songs into different clusters. As a result, every song fell into the same cluster. Increasing the value of α by several orders of magnitude, to well over 10 million, fixed the problem, but this solution presented two problems. First, tuning α to experiment with different ways to cluster the music would be problematic, since I would have to work with an enormous range of possible values for α. Second, pushing α to such high values is not appropriate for the Dirichlet Process. Extremely high values of α indicate a Dirichlet Process that will try to disperse the data into different clusters, but a value of α that high is in principle always assigning each new song to a new cluster. On the other hand, varying α between 0.1 and 1000, for example, presents a much wider range of flexibility when assigning clusters. While clustering may be possible by varying the values of α an extreme amount with the data as it currently is, we would be using the Dirichlet Process in a way it should mathematically not be used. Therefore, multiplying all of the data by a constant value, so that we can work in the appropriate range of α, is the ideal approach. After some experimentation, I found that k = 10 was an appropriate scaling factor.

After initial runs of the Dirichlet Process, I found that there was a slight issue with some of the earlier songs. Since I had only artist genre tags, not specific song tags, for each song, I chose songs based on whether any of the tags associated with the artist fell under any electronic music genre, including the generic term 'electronic'. There were some bands, mostly older ones from the 1960s and 1970s like Electric Light Orchestra, which had some electronic music but mostly featured rock, funk, disco, or another genre. Given that these artists featured mostly non-electronic songs, I decided to exclude them from my study and generated a blacklist indicating these music artists. While it was infeasible to look through every single song and determine whether it was electronic or not, I was able to look over the earliest songs in each cluster. These songs were the most important to verify as electronic, because early non-electronic songs could end up forming new clusters and inadvertently create clusters with non-electronic sounds that I was not looking for.

The goal of this thesis is to identify different groups in which EM songs are clustered and identify the most unique artists and genres. While the second task is very simple, because it requires looking at the earliest songs in each cluster, the first is difficult to gauge the effectiveness of. While I can look at the average chord change and timbre category frequencies in each category, as well as other metadata, putting more semantic interpretations to what the music actually sounds like, and determining whether the music is clustered properly, is a very subjective process. For this reason, I ran the Dirichlet Process on the feature set with values of α = (0.05, 0.1, 0.2) and compared the clustering for each value, examining similarities and differences in the clusters formed in each scenario in the Discussion section. For each value of α, I set the upper limit of components, or clusters, allowed to 50. The ranges of α I used resulted in 9, 14, and 19 clusters formed.
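As a sketch of how this clustering step can be run (this reflects the scikit-learn 0.17-era API, which exposed the Dirichlet Process mixture as DPGMM; the features array and scaling follow the description above):

from sklearn.mixture import DPGMM  # scikit-learn 0.17-era class

# features: one NumPy array row per song (chord-change and timbre frequencies)
features_scaled = 10 * features    # scale by k = 10 so common alpha ranges separate clusters
dpgmm = DPGMM(n_components=50, alpha=0.1, n_iter=100)
dpgmm.fit(features_scaled)
labels = dpgmm.predict(features_scaled)  # one cluster label per song

In newer scikit-learn releases, the same model is available as BayesianGaussianMixture, whose weight_concentration_prior parameter plays the role of α here.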

3.2 Findings

3.2.1 α = 0.05

When I set α to 0.05, the Dirichlet Process split the songs into 9 clusters. Below are the distributions of years of the songs in each cluster (note that the Dirichlet Process does not number the clusters exactly sequentially, so cluster numbers 5, 7, and 10 are skipped).

Figure 3.1: Song year distributions for α = 0.05

For each value of α, I also calculated the average frequency of each chord change category and timbre category for each cluster and plotted the results. The green lines correspond to timbre and the blue lines to pitch.

Figure 3.2: Timbre and pitch distributions for α = 0.05

A table of each cluster formed, the number of songs in that cluster, and descriptions of pitch, timbre, and rhythmic qualities characteristic of songs in that cluster is shown below.

Cluster   Song Count   Characteristic Sounds
0         6481         Minimalist, industrial, space sounds, dissonant chords
1         5482         Soft, New Age, ethereal
2         2405         Defined sounds; electronic and non-electronic instruments played in standard rock rhythms
3         360          Very dense and complex synths, slightly darker tone
4         4550         Heavily distorted rock and synthesizer
6         2854         Faster-paced 80s synth rock, acid house
8         798          Aggressive beats, dense house music
9         1464         Ambient house, trancelike, strong beats, mysterious tone
11        1597         Melancholy tones; New Wave rock in the 80s, then starting in the 90s downtempo, trip-hop, nu-metal

Table 3.1: Song cluster descriptions for α = 0.05

3.2.2 α = 0.1

A total of 14 clusters were formed. (16 were formed, but 2 clusters contained only one song each; I listened to both of these songs, and they did not sound unique, so I discarded them from the clusters.) Again, the song distributions, timbre and pitch distributions, and cluster descriptions are shown below.

Figure 3.3: Song year distributions for α = 0.1


Figure 3.4: Timbre and pitch distributions for α = 0.1


Cluster   Song Count   Characteristic Sounds
0         1339         Instrumental and disco with 80s synth
1         2109         Simultaneous quarter-note and sixteenth-note rhythms
2         4048         Upbeat, chill; simultaneous quarter-note and eighth-note rhythms
3         1353         Strong repetitive beats, ambient
4         2446         Strong simultaneous beat and synths; synths defined but with echo
5         2672         Calm, New Age
6         542          Hi-hat cymbals, dissonant chord progressions
7         2725         Aggressive punk and alternative rock
9         1647         Latin; rhythmic emphasis on the first and third beats
11        835          Standard medium-fast rock instruments/chords
16        1152         Orchestral, especially violins
18        40           "Martian alien" sounds, no vocals
20        1590         Alternating strong kick and strong high-pitched clap
28        528          Roland TR-like beats; kick and clap stand out but fuzzy

Table 3.2: Song cluster descriptions for α = 0.1

3.2.3 α = 0.2

With α set to 0.2, there were a total of 22 clusters formed. 3 of the clusters consisted of 1 song each, none of which were particularly unique-sounding, so I discarded them, for a total of 19 significant clusters. Again, the song distributions, timbre and pitch distributions, and cluster descriptions are shown below.


Figure 3.5: Song year distributions for α = 0.2


Figure 3.6: Timbre and pitch distributions for α = 0.2


Cluster   Song Count   Characteristic Sounds
0         4075         Nostalgic and sad-sounding synths and string instruments
1         2068         Intense, sad, cavernous (mix of industrial metal and ambient)
2         1546         Jazz/funk tones
3         1691         Orchestral with heavy 80s synths, atmospheric
4         343          Arpeggios
5         304          Electro, ambient
6         2405         Alien synths, eerie
7         1264         Punchy kicks and claps, 80s/90s tilt
8         1561         Medium tempo, 4/4 time signature, synths with intense guitar
9         1796         Disco rhythms and instruments
10        2158         Standard rock with few (if any) synths added on
12        791          Cavernous, minimalist, ambient (non-electronic instruments)
14        765          Downtempo, classic guitar riffs, fewer synths
16        865          Classic acid house sounds and beats
17        682          Heavy Roland TR sounds
22        14           Fast, ambient, classic orchestral
23        578          Acid house with funk tones
30        31           Very repetitive rhythms, one or two tones
34        88           Very dense sound (strong vocals and synths)

Table 3.3: Song cluster descriptions for α = 0.2


3.3 Analysis

Each of the different values of α revealed different insights about the Dirichlet Process applied to the Million Song Dataset, mainly which artists and songs were unique and how traditional groupings of EM genres could be thought of differently. Not surprisingly, the distributions of the years of songs in most of the clusters were skewed to the left, because the distribution of all of the EM songs in the Million Song Dataset is left-skewed (see Figure 2.2). However, some of the distributions vary significantly for individual clusters, and these differences provide important insights into which types of music styles were popular at certain points in time and how unique the earliest artists and songs in the clusters were. For example, for α = 0.1, cluster 28's musical style (with sounds characteristic of the Roland TR-808 and TR-909, two programmable drum machines and synthesizers that became extremely popular in 1980s and 90s dance tracks [13]) coincides with when the instruments were first manufactured in 1980. Not surprisingly, this cluster contained mostly songs from the 80s and 90s and declined slightly in the 2000s. However, there were a few songs in that cluster that came out before 1980. While these songs did not clearly use the Roland TR machines, they may have contained similar sounds that predated the machines and were truly novel.

First, looking at α = 0.05, we see that all of the clusters contain a significant number of songs, although clusters 3 and 8 are notably smaller. Cluster 3 contains a heavier left tail, indicating a larger number of songs from the 70s, 80s, and 90s. Inside the cluster, the genres of music varied significantly from a traditional music lens. That is, the cluster contained some songs with nearly all traditional rock instruments, others with purely synths, and others somewhere in between, all of which would normally be classified as different EM genres. However, under the Dirichlet Process these songs were lumped together with the common theme of dense, melodic sounds (as opposed to minimalistic, repetitive, or dissonant sounds). The most prominent artists from the earlier songs are Ashra and Jon Hassell, who composed several melodic songs combining traditional instruments with synthesizers for a modern feel. The other small cluster, number 8, contains a more normal year distribution relative to the entire MSD distribution and also consists of denser beats. Another artist, Cabaret Voltaire, leads this cluster. Cluster 9 sticks out significantly because it contains virtually no songs before 1990 but then increases rapidly in popularity. This cluster contains songs with hypnotically repetitive rhythms, strong and ethereal synths, and an equally strong drum-like beat. Given the emergence of trance in the 1990s, and the fact that house music in the 1980s contained more minimalistic synths than house music in the 1990s, this distribution of years makes sense. Looking at the earliest artists in this cluster, one that accurately predates the later music in the cluster is Jean-Michel Jarre. A French composer pioneering in ambient and electronic music [14], one of his songs, Les Chants Magnétiques IV, contains very sharp and modulated synths along with a repetitive hi-hat cymbal rhythm and more drawn-out, ethereal synths. While the song sounds ambient at its normal speed, playing the song at 1.5 times its normal speed resulted in a thumping, fast-paced 16th-note rhythm that, combined with the ethereal synths containing certain chord progressions, sounded very similar to trance music. In fact, I found that, stylistically, trance music was comparable to house and ambient music increased in speed. Trance music was a term not used extensively until the early 1990s, but ambient and house music were already mainstream by the 1980s, so it would make sense that trance evolved in this manner. However, this insight could serve as an argument that trance is not an innovative genre in and of itself but is rather a clever combination of two older genres. Lastly, we look at the timbre category and chord change distributions for each cluster. In theory, these clusters should have significantly different peaks of chord changes and timbre categories, reflecting different pitch arrangements and instruments in each cluster. The type 0 chord change corresponds to major → major with no note change, type 60 to minor → minor with no note change, type 120 to dominant 7th major → dominant 7th major with no note change, and type 180 to dominant 7th minor → dominant 7th minor with no note change. It makes sense that type 0, 60, 120, and 180 chord changes are frequently observed, because it implies that chords occurring next to each other in a song are remaining in the same key for the majority of the song. The timbre categories, on the other hand, are more difficult to intuitively interpret. Mauch's study addresses this issue by sampling songs and sounds that are the closest to each timbre category and then playing the sounds and attaching user-based interpretations based on several listeners [8]. While this strategy worked in Mauch's study, given the time and resources at my disposal, it was not practical in mine. I ended up writing subjective summaries of each cluster and comparing the charts to see whether certain peaks in the timbre categories corresponded to specific tones. Strangely, for α = 0.05 the timbre and chord change data is very similar for each cluster. This problem does not occur when α = 0.1 or 0.2, where the graphs vary significantly and correspond to some of the observed differences in the music. In summary, below are the most influential artists I found in the clusters formed and the types of music they created that were novel for their time:

• Jean-Michel Jarre: ambient and house music, complicated synthesizer arrangements
• Cabaret Voltaire: orchestral electronic music
• Paul Horn: new age
• Brian Eno: ambient music
• Manuel Göttsching (Ashra): synth-heavy ambient music
• Killing Joke: industrial metal
• John Foxx: minimalist and dark electronic music
• Fad Gadget: house and industrial music

While these conclusions were formed mainly from the data used in the MSD and the resulting clusters, I checked outside sources and biographies of these artists to see whether they were groundbreaking contributors to electronic music. Some research revealed that these artists were indeed groundbreaking for their time, so my findings are consistent with existing literature. The difference, however, between existing accounts and mine is that, from a quantitatively computed perspective, I found some new connections (like observing that one of Jarre's works, when sped up, sounded very similar to trance music).

For larger values of α, it is worth not only looking at interesting phenomena in the clusters formed for that specific value but also comparing the clusters formed to other values of α. Since we are increasing the value of α, more clusters will be formed, and the distinctions between each cluster will be more nuanced. With α = 0.1, the Dirichlet Process formed 16 clusters; 2 of these clusters consisted of only one song each, and upon listening, neither of these songs sounded particularly unique, so I threw those two clusters out and analyzed the remaining 14. Comparing these clusters to the ones formed with α = 0.05, I found that some of the clusters mapped over nicely, while others were more difficult to interpret. For example, cluster 3_{0.1} (cluster 3 when α = 0.1) contained a similar number of songs and a similar distribution of release years to cluster 9_{0.05}. Both contain virtually no songs before the 1990s and then steadily rise in popularity through the 2000s. Both clusters also contain similar types of music: house beats and ethereal synths reminiscent of ambient or trance music. However, when I looked at the earliest artists in cluster 3_{0.1}, they were different from the earliest artists in cluster 9_{0.05}. One particular artist, Bill Nelson, stood out for having a particularly novel song, "Birds of Tin," for the year it was released (1980). This song features a sharp and twangy synth beat that, when sped up, sounded like minimalist acid house music. While the α = 0.05 grouping differentiated mostly on general moods and classes of instruments (like rock vs. non-electronic vs. electronic), the α = 0.1 grouping picked up more nuanced instrumentation and mood differences. For example, cluster 16_{0.1} contained songs that featured orchestral string instruments, especially violin. The songs themselves varied significantly according to traditional genres, from Brian Eno arrangements with classical orchestra to a remix of a song by Linkin Park, a nu-metal band, which contained violin interludes. This clustering raises an interesting point: music that sounds very different based on traditional genres could be grouped together on certain instruments or sounds. Another cluster, 28_{0.1}, features 90s sounds characteristic of the Roland TR-909 drum machine (which would explain why the cluster's songs increase drastically starting in the 1990s and steadily decline through the 2000s). Yet another cluster, 6_{0.1}, contains a particularly heavy left tail, indicating a style more popular in the 1980s, and its characteristic sound, hi-hat cymbals, is also a specialized instrument. This specialization does not match up particularly strongly with the clusters formed when α = 0.05. That is, a single cluster with α = 0.05 does not easily map to one or more clusters in the α = 0.1 run, although many of the clusters appear to share characteristics based on the qualitative descriptions in the tables. The timbre/chord change charts for each cluster appeared to at least somewhat corroborate the general characteristics I attached to each cluster. For example, the last timbre category is significantly pronounced for clusters 5 and 18, and especially so for 18. Cluster 18 was vocal-free, ethereal space-synth sounds, so it would make sense that cluster 5, which was mainly calm New Age, also contained vocal-free, ethereal, and space-y sounds. It was also interesting to note that certain clusters, like 28, contained one timbre category that completely dominated all the others. Many songs in this cluster were marked by strong and repetitive beats reminiscent of the Roland TR drum machines, which matches the graph. Likewise, clusters 3, 7, 9, and 20, which appear to contain the same peak timbre category, were noted for containing strong and repetitive beats. From this run, I added the following artists and their contributions to the general list of novel artists:

• Bill Nelson: minimalist house music
• Vangelis: orchestral compositions with electronic notes
• Rick Wakeman: rock compositions with spacy-sounding synths
• Kraftwerk: synth-based pop music

Finally, we look at α = 0.2. With this parameter value, the Dirichlet Process resulted in 22 clusters; 3 of these clusters contained only one song each, and upon listening to each of these songs, I determined they were not particularly unique and discarded them, for a total of 19 remaining clusters. Unlike the previous two values of α, where the clusters were relatively easy to subjectively differentiate, this one was quite difficult. Slightly more than half of the clusters, 10 out of 19, contained under 1000 songs, and 3 contained under 100. Some of the clusters were easily mapped to clusters in the other two α values, like cluster 17_{0.2}, which contains Roland TR drum machine sounds and is comparable to cluster 28_{0.1}. However, many of the other classifications seemed more dubious. Not only did the songs within each cluster often vary significantly, but the differences between many clusters appeared nearly indistinguishable. The chord change and timbre charts also support the difficulty in distinguishing different clusters. The y-axes for all of the charts are quite small, implying that many of the timbre values averaged out because the songs were quite different in each cluster. Essentially, the observations are quite noisy and do not have features that stand out as saliently as those of cluster 28_{0.1}, for example. The only exceptions to these numbers were clusters 30 and 34, but there were so few songs in each of these clusters that they represent only a small amount of the dataset. Therefore, I concluded that the Dirichlet Process with α = 0.2 performed an insufficient job of adequately clustering the songs. Overall, the clusters formed when α = 0.1 were the most meaningful in terms of picking up nuanced moods and instruments without splitting hairs and producing clusters a minimally trained ear could not differentiate. From this analysis, the most appropriate genre classifications of the electronic music from the MSD are the clusters described in the table where α = 0.1, and the most novel artists, along with their contributions, are summarized in the findings where α = 0.05 and α = 0.1.

52

Chapter 4

Conclusion

In this chapter I first address weaknesses in my experiment and strategies to address those weaknesses; then I offer potential paths for researchers to build upon my experiment, and I close with some final words regarding this thesis.

4.1 Design Flaws in Experiment

While I made every effort possible to ensure the integrity of this experiment, there were various limiting factors, some beyond my control and others within my control but unrealistic to address given the time and resources I had. The largest issue was the dataset I was working with. While the MSD contained roughly 23,000 electronic music songs according to my classifications, these songs did not come close to all of the electronic music that was available. From looking through the tracks, I did see many important artists, meaning that there was some credibility to the dataset. However, there were several other artists I was surprised to see missing, and the artists included contained only a limited number of popular songs. Some traditionally defined genres, like dubstep, were missing entirely from the dataset, and the most recent songs came from the year 2010, which meant that the past 5 years of rapid expansion in EM were not accounted for. Building a sufficient corpus of EM data is very difficult, arguably more so than for other genres, because songs may be remixed by multiple artists, further blurring the line between original content and modifications. For this reason, I considered my thesis to be a proof of concept. Although the data I used may not be ideal, I was able to show that the Dirichlet Process could be used with some amount of success to cluster songs based on their metadata.

With respect to how I implemented the Dirichlet Process and constructed the features, my methodology could have been more extensive with additional time and resources. Interpreting the sounds in each song and establishing common threads is a difficult task, and unlike Pandora, which used trained music theory experts to analyze each song, I relied on my own ears and anecdotal knowledge of EM. Given the lack of formal literature quantitatively analyzing EM and the resources I had, this was my best realistic option, but it was also not ideal. The second notable weakness, which was more controllable, was determining what exactly constitutes an EM song. My criteria involved iterating through every song and selecting those whose artist contained a tag that fell inside a list of predetermined EM genres. However, this strategy is not always effective, since some artists contain only a small selection of EM songs and have produced much more music involving rock or other non-EM genres. To prevent these songs from appearing in the dataset, I would need to load another dataset, from a group called Last.fm, which contains user-generated tags at the song level. Another, more addressable weakness in my experiment was graphically analyzing the timbre categories. While the average chord changes were easy to interpret on the graphs for each cluster and had easy semantic interpretations, the timbre categories were never formally defined. That is, while I knew the Bayes Information Criterion was lowest when there were 46 categories, I did not associate each timbre category with a sound. Mauch's study addressed this issue by randomly selecting songs with sounds that fell in each timbre category and asking users to listen to the sounds and classify what they heard. Implementing this system would be an additional way of ensuring that the clusters formed for each song were nontrivial: I could not only eyeball the measurements on each graph for timbre, like I did in this thesis, but also use them to confirm the sounds I observed for each cluster. Finally, while my feature selection contained careful preprocessing, based on other studies, that normalized measurements between all songs, there are additional ways I could have improved the feature set. For example, one study looks at more advanced ways to isolate specific timbre segments in a song, identify repeating patterns, and compare songs to each other in terms of the similarity of their timbres [15]. More advanced methods like these would allow me to more quantitatively analyze how successful the Dirichlet Process is at effectively clustering songs into distinct categories.
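To make such an evaluation concrete, the following is a minimal sketch of one quantitative check, scoring a clustering with the silhouette coefficient from scikit-learn. This is an illustration of the kind of metric that could be used, not a method employed in this thesis; features and labels stand for the per-song feature matrix and the cluster assignments produced by the Dirichlet Process.

import numpy as np
from sklearn.metrics import silhouette_score

def clustering_quality(features, labels):
    """Mean silhouette coefficient (-1 to 1) of a clustering; values near 1
    mean songs sit well inside their clusters, while values near 0 or below
    suggest overlapping, poorly separated clusters."""
    features = np.asarray(features)
    labels = np.asarray(labels)
    # the silhouette is undefined for a single cluster
    if len(np.unique(labels)) < 2:
        raise ValueError('need at least two clusters to score')
    return silhouette_score(features, labels, metric='euclidean')

# hypothetical usage with the 238-dimensional song features and the
# cluster labels from a Dirichlet Process run:
# print(clustering_quality(song_features, dp_labels))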

4.2 Future Work

Future work in this area, quantitatively analyzing EM metadata to determine what constitutes different genres and novel artists, would involve tighter definitions, procedures, evaluations of whether the clustering was effective, and closer musical scrutiny. All of the weaknesses mentioned in the previous section, barring perhaps the songs available in the Million Song Dataset, can be addressed with extensions and modifications to the code base I created. The greater issue of building an effective corpus of music data for the MSD and constantly updating it might be addressed by soliciting such data from an organization like Spotify, but such an endeavor is very ambitious and beyond the scope of any individual or small-group research project without extensive funding and influence. Once these problems are resolved, and the songs accessed from the dataset and the methods for comparing songs to each other are improved, the next steps would be to further analyze the results. How do the most unique artists for their time compare to the most popular artists? Is there considerable overlap? How long does it take for a style to grow in popularity, if it even does? And lastly, how can these findings be used to compose new genres of music and envision who and what will become popular in the future? All of these questions may require supplementary information sources, with respect to the popularity of songs and artists for example, and many of these additional pieces of information can be found on the website of the MSD.

4.3 Closing Remarks

While this thesis is an ambitious endeavor and can be improved in many respects, it does show that the methods implemented yield nontrivial results and could serve as a foundation for future quantitative analysis of electronic music. As data analytics grows even more, and groups such as Spotify amass greater amounts of information and deeper insights on that information, this relatively new field of study will hopefully grow. EM is a dynamic, energizing, and incredibly expressive type of music, and understanding it from a quantitative perspective pays respect to what has, up until now, been mostly analyzed from a curious outsider's perspective: qualitatively described, but not examined as thoroughly from a mathematical angle.

Appendix A

Code

A.1 Pulling Data from the Million Song Dataset

from __future__ import division
import os
import re
import sys
import time
import glob
import numpy as np
import hdf5_getters  # not on adroit

# prevents output from showing ellipses when printed
np.set_printoptions(threshold=np.nan)

'''This code pulls the relevant metadata of every electronic music song
in the MSD and writes it, sorted by year, to a text file.'''

basedir = '/scratch/network/mssilver/mssilver/msd_data_full/data/' + str(sys.argv[1])
ext = '.h5'

target_genres = ['house', 'techno', 'drum and bass', 'drum n bass',
    "drum'n'bass", 'drumnbass', "drum 'n' bass", 'jungle', 'breakbeat',
    'trance', 'dubstep', 'trap', 'downtempo', 'industrial', 'synthpop',
    'idm', 'idm - intelligent dance music', '8-bit', 'ambient',
    'dance and electronica', 'electronic']

# relevant metadata for all EM songs found in the MSD
all_song_data = {}
pitch_segs_data = []
count = 0
start_time = time.time()

for root, dirs, files in os.walk(basedir):
    files = glob.glob(os.path.join(root, '*' + ext))
    for f in files:
        h5 = hdf5_getters.open_h5_file_read(f)
        # if year unknown, throw out sample
        if hdf5_getters.get_year(h5) == 0:
            h5.close()
            continue
        if any(tag in str(hdf5_getters.get_artist_mbtags(h5)) for tag in target_genres):
            print 'found electronic music song at {0} seconds'.format(time.time() - start_time)
            count += 1
            print 'song count: {0}'.format(count)
            h5_subdict = dict()
            h5_subdict['title'] = hdf5_getters.get_title(h5).item()
            h5_subdict['artist_name'] = hdf5_getters.get_artist_name(h5).item()
            h5_subdict['year'] = hdf5_getters.get_year(h5).item()
            h5_subdict['duration'] = hdf5_getters.get_duration(h5).item()
            h5_subdict['timbre'] = hdf5_getters.get_segments_timbre(h5).tolist()
            h5_subdict['pitches'] = hdf5_getters.get_segments_pitches(h5).tolist()
            track_id = hdf5_getters.get_track_id(h5).item()
            all_song_data[track_id] = h5_subdict
            print 'Song {0} finished processing. Total time elapsed: {1} seconds'.format(count, str(time.time() - start_time))
        h5.close()

all_song_data_sorted = dict(sorted(all_song_data.items(), key=lambda k: k[1]['year']))
sortedpitchdata = '/scratch/network/mssilver/mssilver/msd_data/raw_' + re.sub('/', '', sys.argv[1]) + '.txt'
with open(sortedpitchdata, 'w') as text_file:
    text_file.write(str(all_song_data_sorted))

A.2 Calculating Most Likely Chords and Timbre Categories

from __future__ import division
import os
import sys
import re
import time
import json
import glob
import hdf5_getters  # not on adroit
import sklearn.mixture
import msd_utils  # not on adroit
import math
import numpy as np
import collections
import ast

# prevents output from showing ellipses when printed
np.set_printoptions(threshold=np.nan)

# column-wise mean of a list of lists
def mean(a):
    return sum(a) / len(a)

'''This code computes the frequency of chord changes in each electronic song
and runs the dirichlet process on it.'''

basedir = '/scratch/network/mssilver/mssilver/'
input_file = basedir + 'msd_data/raw_' + str(sys.argv[1]) + '.txt'
output_file = basedir + 'msd_data/preprocessed_' + str(sys.argv[1]) + '.txt'

json_contents = open(input_file, 'r').read()

all_song_data = []
time_start = time.time()
count = 0
for json_object_str in re.finditer(r"{'title'.*?}", json_contents):
    json_object_str = str(json_object_str.group(0))
    json_object = ast.literal_eval(json_object_str)
    json_object_new = {}

    json_object_new['title'] = json_object['title']
    json_object_new['artist_name'] = json_object['artist_name']
    json_object_new['year'] = json_object['year']
    json_object_new['duration'] = json_object['duration']

    segments_pitches_old = json_object['pitches']
    segments_timbre_old = json_object['timbre']
    segments_pitches_old_smoothed = []
    segments_timbre_old_smoothed = []
    chord_changes = [0 for i in range(0, 192)]
    smoothing_factor = 5
    for i in range(0, int(math.floor(len(segments_pitches_old) / smoothing_factor))):
        segments = segments_pitches_old[(smoothing_factor*i):(smoothing_factor*i + smoothing_factor)]
        # calculate mean frequency of each note over a block of 5 time segments
        segments_mean = map(mean, zip(*segments))
        segments_pitches_old_smoothed.append(segments_mean)
    most_likely_chords = [msd_utils.find_most_likely_chord(seg) for seg in segments_pitches_old_smoothed]
    print 'found most likely chords at {0} seconds'.format(time.time() - time_start)
    # calculate chord changes
    for i in range(0, len(most_likely_chords) - 1):
        c1 = most_likely_chords[i]
        c2 = most_likely_chords[i+1]
        if (c1[1] == c2[1]):
            note_shift = 0
        elif (c1[1] < c2[1]):
            note_shift = c2[1] - c1[1]
        else:
            note_shift = 12 - c1[1] + c2[1]
        key_shift = 4*(c1[0] - 1) + c2[0]
        # convert note_shift (0 through 11) and key_shift (1 to 16)
        # to one of 192 categories for a chord shift
        chord_shift = 12*(key_shift - 1) + note_shift
        chord_changes[chord_shift] += 1
    json_object_new['chord_changes'] = [c / json_object['duration'] for c in chord_changes]
    print 'calculated chord changes at {0} seconds'.format(time.time() - time_start)

    for i in range(0, int(math.floor(len(segments_timbre_old) / smoothing_factor))):
        segments = segments_timbre_old[(smoothing_factor*i):(smoothing_factor*i + smoothing_factor)]
        # calculate mean timbre values over a block of 5 time segments
        segments_mean = map(mean, zip(*segments))
        segments_timbre_old_smoothed.append(segments_mean)
    print 'found most likely timbre categories at {0} seconds'.format(time.time() - time_start)
    timbre_cats = [msd_utils.find_most_likely_timbre_category(seg) for seg in segments_timbre_old_smoothed]
    timbre_cat_counts = [timbre_cats.count(i) for i in xrange(0, 46)]
    json_object_new['timbre_cat_counts'] = [t / json_object['duration'] for t in timbre_cat_counts]
    all_song_data.append(json_object_new)
    count += 1

print 'preprocessing finished, writing results to file at time {0}'.format(time.time() - time_start)
with open(output_file, 'w') as f:
    f.write(str(all_song_data))

print 'file merging complete at time {0}'.format(time.time() - time_start)

A.3 Code to Compute Timbre Categories

from __future__ import division
import os
import sys
import re
import time
import json
import glob
import hdf5_getters  # not on adroit
import sklearn.mixture
import msd_utils  # not on adroit
import math
import numpy as np
import collections
from string import ascii_uppercase
import ast
import matplotlib.pyplot as plt
import operator
from collections import defaultdict
import random

timbre_all = []
N = 20  # number of samples to get from each year
year_counts = {1956: 2, 1965: 4, 1968: 3, 1969: 5, 1970: 23, 1971: 25,
    1972: 26, 1973: 37, 1974: 35, 1975: 29, 1976: 28, 1977: 64, 1978: 77,
    1979: 111, 1980: 131, 1981: 171, 1982: 199, 1983: 272, 1984: 190,
    1985: 189, 1986: 200, 1987: 224, 1988: 205, 1989: 272, 1990: 358,
    1991: 348, 1992: 538, 1993: 610, 1994: 658, 1995: 764, 1996: 809,
    1997: 930, 1998: 872, 1999: 983, 2000: 1031, 2001: 1230, 2002: 1323,
    2003: 1563, 2004: 1508, 2005: 1995, 2006: 1892, 2007: 2175, 2008: 1950,
    2009: 1782, 2010: 742}

time_start = time.time()
year_count = defaultdict(int)
orig_dir = '/scratch/network/mssilver/mssilver/'
orig_dir = ''
json_pattern = re.compile(r"{'title'.*?}", re.DOTALL)
N = 20  # number of songs to sample from each year
k = 20  # number of frames to select from each song
for l1 in ascii_uppercase:
    for l2 in ascii_uppercase:
        edm_textfile = orig_dir + 'msd_data/raw_' + l1 + l2 + '.txt'
        json_contents = open(edm_textfile, 'r').read()
        for json_object_str in re.findall(json_pattern, json_contents):
            json_object = ast.literal_eval(json_object_str)
            year = int(json_object['year'])
            prob = 1.0 if 1.0*N/year_counts[year] > 1.0 else 1.0*N/year_counts[year]
            if random.random() < prob:
                print 'getting timbre frames for song in directory {0}, {1} seconds after start of program'.format(edm_textfile, time.time() - time_start)
                duration = float(json_object['duration'])
                timbre = [[t/duration for t in l] for l in json_object['timbre']]
                try:
                    indices = random.sample(xrange(0, len(timbre)), k)
                except:
                    indices = xrange(0, len(timbre))
                timbre_frames = [timbre[i] for i in indices]
                appended_timbre = [timbre_all.append(l) for l in timbre_frames]
        print 'finished file {0}, {1} seconds after start of program'.format(edm_textfile, time.time() - time_start)

with(open('timbre_frames_all.txt', 'w')) as f:
    f.write(str(timbre_all))

A.4 Helper Methods for Calculations

import os
import re
import json
import glob
import hdf5_getters
import time
import numpy as np

''' some static data used in conjunction with the helper methods '''

# each 12-element vector corresponds to the 12 pitches, starting with C
# natural and going up to B natural

CHORD_TEMPLATE_MAJOR = [[1,0,0,0,1,0,0,1,0,0,0,0],[0,1,0,0,0,1,0,0,1,0,0,0],
                        [0,0,1,0,0,0,1,0,0,1,0,0],[0,0,0,1,0,0,0,1,0,0,1,0],
                        [0,0,0,0,1,0,0,0,1,0,0,1],[1,0,0,0,0,1,0,0,0,1,0,0],
                        [0,1,0,0,0,0,1,0,0,0,1,0],[0,0,1,0,0,0,0,1,0,0,0,1],
                        [1,0,0,1,0,0,0,0,1,0,0,0],[0,1,0,0,1,0,0,0,0,1,0,0],
                        [0,0,1,0,0,1,0,0,0,0,1,0],[0,0,0,1,0,0,1,0,0,0,0,1]]
CHORD_TEMPLATE_MINOR = [[1,0,0,1,0,0,0,1,0,0,0,0],[0,1,0,0,1,0,0,0,1,0,0,0],
                        [0,0,1,0,0,1,0,0,0,1,0,0],[0,0,0,1,0,0,1,0,0,0,1,0],
                        [0,0,0,0,1,0,0,1,0,0,0,1],[1,0,0,0,0,1,0,0,1,0,0,0],
                        [0,1,0,0,0,0,1,0,0,1,0,0],[0,0,1,0,0,0,0,1,0,0,1,0],
                        [0,0,0,1,0,0,0,0,1,0,0,1],[1,0,0,0,1,0,0,0,0,1,0,0],
                        [0,1,0,0,0,1,0,0,0,0,1,0],[0,0,1,0,0,0,1,0,0,0,0,1]]
CHORD_TEMPLATE_DOM7 = [[1,0,0,0,1,0,0,1,0,0,1,0],[0,1,0,0,0,1,0,0,1,0,0,1],
                       [1,0,1,0,0,0,1,0,0,1,0,0],[0,1,0,1,0,0,0,1,0,0,1,0],
                       [0,0,1,0,1,0,0,0,1,0,0,1],[1,0,0,1,0,1,0,0,0,1,0,0],
                       [0,1,0,0,1,0,1,0,0,0,1,0],[0,0,1,0,0,1,0,1,0,0,0,1],
                       [1,0,0,1,0,0,1,0,1,0,0,0],[0,1,0,0,1,0,0,1,0,1,0,0],
                       [0,0,1,0,0,1,0,0,1,0,1,0],[0,0,0,1,0,0,1,0,0,1,0,1]]
CHORD_TEMPLATE_MIN7 = [[1,0,0,1,0,0,0,1,0,0,1,0],[0,1,0,0,1,0,0,0,1,0,0,1],
                       [1,0,1,0,0,1,0,0,0,1,0,0],[0,1,0,1,0,0,1,0,0,0,1,0],
                       [0,0,1,0,1,0,0,1,0,0,0,1],[1,0,0,1,0,1,0,0,1,0,0,0],
                       [0,1,0,0,1,0,1,0,0,1,0,0],[0,0,1,0,0,1,0,1,0,0,1,0],
                       [0,0,0,1,0,0,1,0,1,0,0,1],[1,0,0,0,1,0,0,1,0,1,0,0],
                       [0,1,0,0,0,1,0,0,1,0,1,0],[0,0,1,0,0,0,1,0,0,1,0,1]]

CHORD_TEMPLATE_MAJOR_means = [np.mean(chord) for chord in CHORD_TEMPLATE_MAJOR]
CHORD_TEMPLATE_MINOR_means = [np.mean(chord) for chord in CHORD_TEMPLATE_MINOR]
CHORD_TEMPLATE_DOM7_means = [np.mean(chord) for chord in CHORD_TEMPLATE_DOM7]
CHORD_TEMPLATE_MIN7_means = [np.mean(chord) for chord in CHORD_TEMPLATE_MIN7]

CHORD_TEMPLATE_MAJOR_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_MAJOR]
CHORD_TEMPLATE_MINOR_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_MINOR]
CHORD_TEMPLATE_DOM7_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_DOM7]
CHORD_TEMPLATE_MIN7_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_MIN7]

TIMBRE_CLUSTERS = [
    [ 1.38679881e-01,  3.95702571e-02,  2.65410235e-02,  7.38301998e-03, -1.75014636e-02, -5.51147732e-02,
      8.71851698e-03, -1.17595855e-02,  1.07227900e-02,  8.75951680e-03,  5.40391877e-03,  6.17638908e-03],
    [ 3.14344510e+00,  1.17405599e-01,  4.08053561e+00, -1.77934450e+00,  2.93367968e+00, -1.35597928e+00,
     -1.55129489e+00,  7.75743158e-01,  6.42796685e-01,  1.40794256e-01,  3.37716831e-01, -3.27103815e-01],
    [ 3.56548165e-01,  2.73288705e+00,  1.94355982e+00,  1.06892477e+00,  9.89739475e-01, -8.97330631e-02,
      8.73234495e-01, -2.00747009e-03,  3.44488367e-01,  9.93117800e-02, -2.43471766e-01, -1.90521726e-01],
    [ 4.22442037e-01,  4.14115783e-01,  1.43926557e-01, -1.16143322e-01, -5.95186216e-02, -2.36927188e-01,
     -6.83151409e-02,  9.86816882e-02,  2.43219098e-02,  6.93558977e-02,  6.80121418e-03,  3.97485360e-02],
    [ 1.94727799e-01, -1.39027782e+00, -2.39875671e-01, -2.84583677e-01,  1.92334219e-01, -2.83421048e-01,
      2.15787541e-01,  1.14840341e-01, -2.15631833e-01, -4.09496877e-02, -6.90838017e-03, -7.24394810e-03],
    [ 1.96565167e-01,  4.98702717e-02, -3.43697282e-01,  2.54170701e-01,  1.12441266e-02,  1.54740401e-01,
     -4.70447408e-02,  8.10868802e-02,  3.03736697e-03,  1.43974944e-03, -2.75044913e-02,  1.48634678e-02],
    [ 2.21364497e-01, -2.96205105e-01,  1.57754028e-01, -5.57641279e-02, -9.25625566e-02, -6.15316168e-02,
     -1.38139882e-01, -5.54936599e-02,  1.66886836e-01,  6.46238260e-02,  1.24093863e-02, -2.09274345e-02],
    [ 2.12823455e-01, -9.32652720e-02, -4.39611467e-01, -2.02814479e-01,  4.98638770e-02, -1.26572488e-01,
     -1.11181799e-01,  3.25075635e-02,  2.01416694e-02, -5.69216463e-02,  2.61922912e-02,  8.30817468e-02],
    [ 1.62304042e-01, -7.34813956e-03, -2.02552550e-01,  1.80106705e-01, -5.72110826e-02, -9.17148244e-02,
     -6.20429191e-03, -6.08892354e-02,  1.02883628e-02,  3.84878478e-02, -8.72920419e-03,  2.37291230e-02],
    [ 1.69023095e-01,  6.81311168e-02, -3.71039856e-02, -2.13139780e-02, -4.18752028e-03,  1.36407740e-01,
      2.58515825e-02, -4.10328777e-04,  2.93149920e-02, -1.97874734e-02,  2.01177066e-02,  4.29260690e-03],
    [ 4.16829358e-01, -1.28384095e+00,  8.86081556e-01,  9.13717416e-02, -3.19420208e-01, -1.82003637e-01,
     -3.19865507e-02, -1.71517045e-02,  3.47472066e-02, -3.53047665e-02,  5.58354602e-02, -5.06222122e-02],
    [ 3.83948137e-01,  1.06020034e-01,  4.01191058e-01,  1.49470482e-01, -9.58422411e-02, -4.94473336e-02,
      2.27589858e-02, -5.67352733e-02,  3.84666644e-02, -2.15828055e-02, -1.67817151e-02,  1.15426241e-01],
    [ 9.07946444e-01,  3.26120397e+00,  2.98472002e+00, -1.42615404e-01,  1.29886103e+00, -4.53380431e-01,
      1.54008478e-01, -3.55297093e-02, -2.95809181e-01,  1.57037690e-01, -7.29692046e-02,  1.15180285e-01],
    [ 1.60870896e+00, -2.32038235e+00, -7.96211044e-01,  1.55058968e+00, -2.19377663e+00,  5.01030526e-01,
     -1.71767279e+00, -1.36642470e+00, -2.42837527e-01, -4.14275615e-01, -7.33148530e-01, -4.56676578e-01],
    [ 6.42870687e-01,  1.34486839e+00,  2.16026845e-01, -2.13180345e-01,  3.10866747e-01, -3.97754955e-01,
     -3.54439151e-01, -5.95938041e-04,  4.95054274e-03,  4.67013422e-02, -1.80823854e-02,  1.25808320e-01],
    [ 1.16780496e+00,  2.28141229e+00, -3.29418720e+00, -1.54239912e+00,  2.12372153e-01,  2.51116768e+00,
      1.84273560e+00, -4.06183916e-01,  1.19175125e+00, -9.24407446e-01,  6.85444429e-01, -6.38729005e-01],
    [ 2.39097414e-01, -1.13382447e-02,  3.06327342e-01,  4.68182987e-03, -1.03107607e-01, -3.17661969e-02,
      3.46533705e-02,  1.46440386e-02,  6.88291154e-02,  1.72580481e-02, -6.23970238e-03, -6.52822380e-03],
    [ 1.74850329e-01, -1.86077411e-01,  2.69285838e-01,  5.22452803e-02, -3.71708289e-02, -6.42874319e-02,
     -5.01920042e-03, -1.14565540e-02, -2.61300268e-03, -6.94872458e-03,  1.20157063e-02,  2.01341977e-02],
    [ 1.93220674e-01,  1.62738332e-01,  1.72794061e-02,  7.89933755e-02,  1.58494767e-01,  9.04541006e-04,
     -3.33177052e-02, -1.42411500e-01, -1.90471155e-02, -2.41622739e-02, -2.57382438e-02,  2.84895062e-02],
    [ 3.31179197e+00, -1.56765268e-01,  4.42446188e+00,  2.05496297e+00,  5.07031622e+00, -3.52663849e-02,
     -5.68337901e+00, -1.17825301e+00,  5.41756637e-01, -3.15541339e-02, -1.58404846e+00,  7.37887234e-01],
    [ 2.36033237e-01, -5.01380019e-01, -7.01568834e-02, -2.14474169e-01,  5.58739133e-01, -3.45340886e-01,
      2.36469930e-01, -2.51770230e-02, -4.41670143e-01, -1.73364633e-01,  9.92353986e-03,  1.01775476e-01],
    [ 3.13672832e+00,  1.55128891e+00,  4.60139512e+00,  9.82477544e-01, -3.87108002e-01, -1.34239667e+00,
     -3.00065797e+00, -4.41556909e-01, -7.77546208e-01, -6.59017029e-01, -1.42596356e-01, -9.78935498e-01],
    [ 8.50714148e-01,  2.28658856e-01, -3.65260753e+00,  2.70626948e+00, -1.90441544e-01,  5.66625676e+00,
      1.77531510e+00,  2.39978921e+00,  1.10965660e+00,  1.58484130e+00, -1.51579214e-02,  8.64324026e-01],
    [ 1.14302559e+00,  1.18602811e+00, -3.88130412e+00,  8.69833825e-01, -8.23003310e-01, -4.23867795e-01,
      8.56022598e-01, -1.08015106e+00,  1.74840192e-01, -1.35493558e-02, -1.17012561e+00,  1.68572940e-01],
    [ 3.54117814e+00,  6.12714769e-01,  7.67585243e+00,  2.50391333e+00,  1.81374399e+00, -1.46363231e+00,
     -1.74027236e+00, -5.72924078e-01, -1.20787368e+00, -4.13954661e-01, -4.62561948e-01,  6.78297871e-01],
    [ 8.31843044e-01,  4.41635485e-01,  7.00724425e-02, -4.72159900e-02,  3.08326493e-01, -4.47009822e-01,
      3.27806057e-01,  6.52370380e-01,  3.28490360e-01,  1.28628172e-01, -7.78065861e-02,  6.91343399e-02],
    [ 4.90082031e-01, -9.53180204e-01,  1.76970476e-01,  1.57256960e-01, -5.26196238e-02, -3.19264458e-01,
      3.91808304e-01,  2.19368239e-01, -2.06483291e-01, -6.25044005e-02, -1.05547224e-01,  3.18934196e-01],
    [ 1.49899454e+00, -4.30708817e-01,  2.43770498e+00,  7.03149621e-01, -2.28827845e+00,  2.70195855e+00,
     -4.71484280e+00, -1.18700075e+00, -1.77431396e+00, -2.23190236e+00,  8.20855264e-01, -2.35859902e-01],
    [ 1.20322544e-01, -3.66300816e-01, -1.25699953e-01, -1.21914056e-01,  6.93277338e-02, -1.31034684e-01,
     -1.54955924e-03,  2.48094288e-02, -3.09576314e-02, -1.66369415e-03,  1.48904987e-04, -1.42151992e-02],
    [ 6.52394765e-01, -6.81024464e-01,  6.36868117e-01,  3.04950208e-01,  2.62178992e-01, -3.20457080e-01,
     -1.98576098e-01, -3.02173163e-01,  2.04399765e-01,  4.44513847e-02, -9.50111498e-03, -1.14198739e-02],
    [ 2.06762180e-01, -2.08101829e-01,  2.61977630e-01, -1.71672300e-01,  5.61794250e-02,  2.13660185e-01,
      3.90259585e-02,  4.78176392e-02,  1.72812607e-02,  3.44052067e-02,  6.26899067e-03,  2.48544728e-02],
    [ 7.39717363e-01,  4.37786285e+00,  2.54995502e+00,  1.13151212e+00, -3.58509503e-01,  2.20806129e-01,
     -2.20500355e-01, -7.22409824e-03, -2.70534083e-01,  1.07942098e-03,  2.70174668e-01,  1.87279353e-01],
    [ 1.25593809e+00,  6.71054880e-02,  8.70352571e-01, -4.32607959e+00,  2.30652217e+00,  5.47476105e+00,
     -6.11052479e-01,  1.07955720e+00, -2.16225471e+00, -7.95770149e-01, -7.31804973e-01,  9.68935954e-01],
    [ 1.17233757e-01, -1.23897829e-01, -4.88625265e-01,  1.42036530e-01, -7.23286756e-02, -6.99808763e-02,
     -1.17525019e-02,  5.70221674e-02, -7.67796123e-03,  4.17505873e-02, -2.33375716e-02,  1.94121001e-02],
    [ 1.67511025e+00, -2.75436700e+00,  1.45345593e+00,  1.32408871e+00, -1.66172505e+00,  1.00560074e+00,
     -8.82308160e-01, -5.95708043e-01, -7.27283590e-01, -1.03975499e+00, -1.86653334e-02,  1.39449745e+00],
    [ 3.20587677e+00, -2.84451104e+00,  8.54849957e+00, -4.44001235e-01,  1.04202144e+00,  7.35333682e-01,
     -2.48763292e+00,  7.38931361e-01, -1.74185596e+00, -1.07581842e+00,  2.05759299e-01, -8.20483513e-01],
    [ 3.31279737e+00, -5.08655734e-01,  6.61530870e+00,  1.16518280e+00,  4.74499155e+00, -2.31536191e+00,
     -1.34016130e+00, -7.15381712e-01,  2.78890594e+00,  2.04189275e+00, -3.80003033e-01,  1.16034914e+00],
    [ 1.79522019e+00, -8.13534697e-02,  4.37167420e-01,  2.26517020e+00,  8.85377295e-01,  1.07481514e+00,
     -7.25322296e-01, -2.19309506e+00, -7.59468916e-01, -1.37191387e+00,  2.60097913e-01,  9.34596450e-01],
    [ 3.50400906e-01,  8.17891485e-01, -8.63487084e-01, -7.31760701e-01,  9.70320805e-02, -3.60023996e-01,
     -2.91753495e-01, -8.03073817e-02,  6.65930095e-02,  1.60093340e-01, -1.29158086e-01, -5.18806100e-02],
    [ 2.25922929e-01,  2.78461593e-01,  5.39661393e-02, -2.37662670e-02, -2.70343295e-02, -1.23485570e-01,
      2.31027499e-03,  5.87465112e-05,  1.86127188e-02,  2.83074747e-02, -1.87198676e-04,  1.24761782e-02],
    [ 4.53615634e-01,  3.18976020e+00, -8.35029351e-01,  7.84124578e+00, -4.43906795e-01, -1.78945492e+00,
     -1.14521031e+00,  1.00044304e+00, -4.04084981e-01, -4.86030348e-01,  1.05412721e-01,  5.63666445e-02],
    [ 3.93714086e-01, -3.07226477e-01, -4.87366619e-01, -4.57481697e-01, -2.91133171e-04, -2.39881719e-01,
     -2.15591352e-01, -1.21332941e-01,  1.42245002e-01,  5.02984582e-02, -8.05878851e-03,  1.95534173e-01],
    [ 1.86913010e-01, -1.61000977e-01,  5.95612425e-01,  1.87804293e-01,  2.22064227e-01, -1.09008289e-01,
      7.83845058e-02,  5.15228647e-02, -8.18113578e-03, -2.37860551e-02,  3.41013800e-03,  3.64680417e-02],
    [ 3.32919314e+00, -2.14341251e+00,  7.20913997e+00,  1.76143734e+00,  1.64091808e+00, -2.66887649e+00,
     -9.26748006e-01, -2.78599285e-01, -7.39434005e-01, -3.87363085e-01,  8.00557250e-01,  1.15628886e+00],
    [ 4.76496444e-01, -1.19334793e-01,  3.09037235e-01, -3.45545294e-01,  1.30114716e-01,  5.06895559e-01,
      2.12176840e-01, -4.14296750e-03,  4.52439064e-02, -1.62163990e-02,  6.93683152e-03, -5.77607592e-03],
    [ 3.00019324e-01,  5.43432074e-03, -7.72732930e-01,  1.47263806e+00, -2.79012581e-02, -2.47864869e-01,
     -2.10011388e-01,  2.78202425e-01,  6.16957205e-02, -1.66924986e-01, -1.80102286e-01, -3.78872162e-03]]

TIMBRE_MEANS = [np.mean(t) for t in TIMBRE_CLUSTERS]
TIMBRE_STDEVS = [np.std(t) for t in TIMBRE_CLUSTERS]

'''helper methods to process raw msd data'''

def normalize_pitches(h5):
    key = int(hdf5_getters.get_key(h5))
    segments_pitches = hdf5_getters.get_segments_pitches(h5)
    segments_pitches_new = [transpose_by_key(pitch_seg, key) for pitch_seg in segments_pitches]
    return segments_pitches_new

def transpose_by_key(pitch_seg, key):
    pitch_seg_new = []
    for i in range(0, 12):
        idx = (i + key) % 12
        pitch_seg_new.append(pitch_seg[idx])
    return pitch_seg_new

''' given a time segment with distributions of the 12 pitches, find the most
likely chord played '''
def find_most_likely_chord(pitch_vector):
    rho_max = 0.0
    # index each chord
    most_likely_chord = (1, 1)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_MAJOR,
            CHORD_TEMPLATE_MAJOR_means, CHORD_TEMPLATE_MAJOR_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean)*(pitch_vector[i] - np.mean(pitch_vector))/((stdev+0.01)*(np.std(pitch_vector)+0.01))
        if (abs(rho) > abs(rho_max)):
            rho_max = rho
            most_likely_chord = (1, idx)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_MINOR,
            CHORD_TEMPLATE_MINOR_means, CHORD_TEMPLATE_MINOR_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean)*(pitch_vector[i] - np.mean(pitch_vector))/((stdev+0.01)*(np.std(pitch_vector)+0.01))
        if (abs(rho) > abs(rho_max)):
            rho_max = rho
            most_likely_chord = (2, idx)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_DOM7,
            CHORD_TEMPLATE_DOM7_means, CHORD_TEMPLATE_DOM7_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean)*(pitch_vector[i] - np.mean(pitch_vector))/((stdev+0.01)*(np.std(pitch_vector)+0.01))
        if (abs(rho) > abs(rho_max)):
            rho_max = rho
            most_likely_chord = (3, idx)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_MIN7,
            CHORD_TEMPLATE_MIN7_means, CHORD_TEMPLATE_MIN7_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean)*(pitch_vector[i] - np.mean(pitch_vector))/((stdev+0.01)*(np.std(pitch_vector)+0.01))
        if (abs(rho) > abs(rho_max)):
            rho_max = rho
            most_likely_chord = (4, idx)
    return most_likely_chord

def find_most_likely_timbre_category(timbre_vector):
    most_likely_timbre_cat = 0
    rho_max = 0.0
    for idx, (seg, mean, stdev) in enumerate(zip(TIMBRE_CLUSTERS, TIMBRE_MEANS,
            TIMBRE_STDEVS)):
        rho = 0.0
        for i in range(0, 12):
            rho += (seg[i] - mean)*(timbre_vector[i] - np.mean(seg))/((stdev+0.01)*(np.std(timbre_vector)+0.01))
        if (abs(rho) > abs(rho_max)):
            rho_max = rho
            most_likely_timbre_cat = idx
    return most_likely_timbre_cat

Bibliography

[1] Marck Bailey. Mark J. Butler publishes scholarly work on dance music. http://www.music.northwestern.edu/about/news/2012/mark-j-butler-publishes-scholarly-work-on-dance-music.html, March 2012.

[2] Kenneth Taylor. Ishkur's guide to EDM. http://techno.org/electronic-music-guide/.

[3] Deal further strengthens Spotify's music discovery expertise. http://the.echonest.com/pressreleases/spotify-acquires-echo-nest/, March 2014.

[4] Josh Constine. Inside the Spotify - Echo Nest skunkworks. http://techcrunch.com/2014/10/19/the-sonic-mad-scientists/, October 2014.

[5] The future of music genres is here. http://blog.echonest.com/post/73516217273/the-future-of-music-genres-is-here, January 2014.

[6] About the Music Genome Project. http://www.pandora.com/about/mgp.

[7] Joan Serrà, Álvaro Corral, Marián Boguñá, Martín Haro, and Josep Ll. Arcos. Measuring the evolution of contemporary western popular music. Scientific Reports, 2, July 2012.

[8] Matthias Mauch, Robert M. MacCallum, Mark Levy, and Armand M. Leroi. The evolution of popular music: USA 1960–2010. Royal Society Open Science, 2(5), 2015.

[9] Thierry Bertin-Mahieux, Daniel P.W. Ellis, Brian Whitman, and Paul Lamere. The Million Song Dataset. In Proceedings of the 12th International Conference on Music Information Retrieval (ISMIR 2011), 2011.

[10] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay. Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12:2825–2830, 2011.

[11] Lars Buitinck, Gilles Louppe, Mathieu Blondel, Fabian Pedregosa, Andreas Mueller, Olivier Grisel, Vlad Niculae, Peter Prettenhofer, Alexandre Gramfort, Jaques Grobler, Robert Layton, Jake VanderPlas, Arnaud Joly, Brian Holt, and Gaël Varoquaux. API design for machine learning software: experiences from the scikit-learn project. In ECML PKDD Workshop: Languages for Data Mining and Machine Learning, pages 108–122, 2013.

[12] Edwin Chen. Infinite mixture models with nonparametric Bayes and the Dirichlet Process. http://blog.echen.me/2012/03/20/infinite-mixture-models-with-nonparametric-bayes-and-the-dirichlet-process/, March 2012.

[13] Graham Massey. Roland TR-808: The drum machine that changed music. http://www.bbc.com/news/entertainment-arts-26682781, March 2014.

[14] Jean-Michel Jarre and Agencja Artystyczna MTJ. Jean Michel Jarre. Disques Dreyfus, 1999.

[15] Francois Pachet, Jean-Julien Aucouturier, and Mark Sandler. "The way it sounds": Timbre models for analysis and retrieval of music signals. IEEE Transactions on Multimedia, 7(6):1028–1035, December 2005.

  • Abstract
  • Acknowledgements
  • Contents
  • List of Tables
  • List of Figures
  • 1 Introduction
    • 1.1 Background Information
    • 1.2 Literature Review
    • 1.3 The Dataset
  • 2 Mathematical Modeling
    • 2.1 Determining Novelty of Songs
    • 2.2 Feature Selection
    • 2.3 Collecting Data and Preprocessing Selected Features
      • 2.3.1 Collecting the Data
      • 2.3.2 Pitch Preprocessing
      • 2.3.3 Timbre Preprocessing
  • 3 Results
    • 3.1 Methodology
    • 3.2 Findings
      • 3.2.1 α = 0.05
      • 3.2.2 α = 0.1
      • 3.2.3 α = 0.2
    • 3.3 Analysis
  • 4 Conclusion
    • 4.1 Design Flaws in Experiment
    • 4.2 Future Work
    • 4.3 Closing Remarks
  • A Code
    • A.1 Pulling Data from the Million Song Dataset
    • A.2 Calculating Most Likely Chords and Timbre Categories
    • A.3 Code to Compute Timbre Categories
    • A.4 Helper Methods for Calculations
  • Bibliography

with each section by The Echo Nest)

sections_start: shape = (10,) (start time of each section; according to The Echo Nest, this song has 10 sections)

segments_confidence: shape = (935,) (confidence value (between 0 and 1) associated with each segment by The Echo Nest)

segments_loudness_max: shape = (935,) (max loudness during each segment)

segments_loudness_max_time: shape = (935,) (time of the max loudness during each segment)

segments_loudness_start: shape = (935,) (loudness at the beginning of each segment)

segments_pitches: shape = (935, 12) (chroma features for each segment (normalized so max is 1))

segments_start: shape = (935,) (start time of each segment (musical event or onset); according to The Echo Nest, this song has 935 segments)

segments_timbre: shape = (935, 12) (MFCC-like features for each segment)

similar_artists: shape = (100,) (a list of 100 artists (their Echo Nest IDs) similar to Rick Astley, according to The Echo Nest)

song_hotttnesss: 0.864248830588 (according to The Echo Nest, when downloaded (in December 2010) this song had a 'hotttnesss' of 0.8 (on a scale of 0 to 1))

song_id: SOCWJDB12A58A776AF (The Echo Nest song ID; note that a song can be associated with many tracks (with very slight audio differences))

start_of_fade_out: 198.536 (start time of the fade out, in seconds, at the end of the song, according to The Echo Nest)

tatums_confidence: shape = (794,) (confidence value (between 0 and 1) associated with each tatum by The Echo Nest)

tatums_start: shape = (794,) (start time of each tatum; according to The Echo Nest, this song has 794 tatums)

tempo: 113.359 (tempo in BPM according to The Echo Nest)

time_signature: 4 (time signature of the song according to The Echo Nest, i.e., usual number of beats per bar)

time_signature_confidence: 0.634 (confidence of the time signature estimation)

title: Never Gonna Give You Up (song title)

track_7digitalid: 8707738 (the ID of this song on the service 7digital.com)

track_id: TRAXLZU12903D05F94 (The Echo Nest ID of this particular track on which the analysis was done)

year: 1987 (year when this song was released, according to musicbrainz.org)
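These fields are read with the MSD's hdf5_getters module used throughout this thesis; the short sketch below shows the access pattern (the getter names exist in the module, while the file path is hypothetical).

import hdf5_getters

h5 = hdf5_getters.open_h5_file_read('TRAXLZU12903D05F94.h5')  # hypothetical path
try:
    title = hdf5_getters.get_title(h5)
    tempo = hdf5_getters.get_tempo(h5)               # BPM estimate
    pitches = hdf5_getters.get_segments_pitches(h5)  # (n_segments, 12) chroma
    timbre = hdf5_getters.get_segments_timbre(h5)    # (n_segments, 12) MFCC-like
    print(title, tempo, pitches.shape, timbre.shape)
finally:
    h5.close()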

When choosing features, my main goal was to use features that would most likely yield meaningful results, yet also be simple and make sense to the average person. The definition of "meaningful" results is arbitrary, as every music listener will have his or her own opinions as to what constitutes different types of music, but some common features most people tend to differentiate songs by are pitch, rhythm, and the types of instruments used. The following specific fields provided in each song object fall under these three terms:

Pitch

• segments_pitches: a matrix of values indicating the strength of each pitch (or note) at each discernible time interval

Rhythm

• beats_start: a vector of values indicating the start time of each beat

• time_signature: the time signature of the song

• tempo: the speed of the song in Beats Per Minute (BPM)

Instruments

• segments_timbre: a matrix of values indicating the distribution of MFCC-like features (different types of tones) for each segment

The segments_pitches feature is a clear candidate for a differentiating factor for songs, since it reveals patterns of notes that occur. Additionally, other research papers that quantitatively examine songs, like Mauch's, look at pitch and employ a procedure that allows all songs to be compared with the same metric. Likewise, timbre is intuitively a reliable differentiating feature, since it reveals the amounts of different tones, that is, sounds that sound different despite having the same pitch. Therefore, segments_timbre is another feature that is considered in each song.

Finally, we look at the candidate features for rhythm. At first glance, all of these features appear to be useful, as they indicate the rhythm of a song in one way or another. However, none of these features are as useful as the pitch and timbre features. While tempo is one factor in differentiating genres of EDM, and music in general, tempo alone is not a driving force of musical innovation. Certain genres of EDM, like drum 'n' bass and happycore, stand out for having very fast tempos, but the tempo is supplemented with a sound unique to the genre. Conceiving new arrangements of pitches, combining instruments in new ways, and inventing new types of sounds are novel, but speeding up or slowing down existing sounds is not. Including tempo as a feature could actually add noise to the model, since many genres overlap in their tempos. And finally, tempo is measured indirectly when the pitch and timbre features are normalized for each song: everything is measured in units of "per second," so faster songs will have higher quantities of pitch and timbre features each second. Time signature can be dismissed from the candidate features for the same reason as tempo: many genres contain the same time signature, and including it in the feature set would only add more noise. beats_start looks like a more promising feature since, like segments_pitches and segments_timbre, it consists of a vector of values. However, difficulties arise when we begin to think how exactly we can utilize this information. Since each song varies in length, we need a way to compare songs of different durations on the same level. One approach could be to perform basic statistics on the distance between each beat, for example calculating the mean and standard deviation of this distance. However, the normalized pitch and timbre information already capture this data. Another possibility is detecting certain patterns of beats, which could differentiate the syncopated beats of dubstep or glitch music from the steady pulse of electro-house. But once again, every beat is accompanied by a sound with a specific timbre and pitch, so this feature would not add any significantly new information.
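For illustration, the dismissed inter-beat statistics would amount to something as simple as the following sketch (illustrative only, not code used in this thesis; beats_start is the vector described above).

import numpy as np

def beat_statistics(beats_start):
    """Mean and standard deviation of the gaps between consecutive beats."""
    gaps = np.diff(np.asarray(beats_start))  # seconds between adjacent beats
    return gaps.mean(), gaps.std()

# a perfectly steady 120 BPM pulse has 0.5 s gaps and zero deviation
mean_gap, std_gap = beat_statistics([0.0, 0.5, 1.0, 1.5, 2.0])
print(mean_gap, std_gap)  # 0.5 0.0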

2.3 Collecting Data and Preprocessing Selected Features

2.3.1 Collecting the Data

Upon deciding the features I wanted to use in my research, I first needed to collect all of the electronic songs in the Million Song Dataset. The easiest reliable way to achieve this was to iterate through each song in the database and save the information for the songs where any of the artist genre tags in artist_mbtags matched an electronic music genre. While this measure was not fully accurate, because it looks at the genre of the artist, not the song, specific genre information for each song was not as easily accessible, so this indicator was nearly as good a substitute. To generate a list of the genres that electronic songs would fall under, I manually searched through a subset of the MSD to find all genres that seemed to be related to electronic music. In the case of genres that were sometimes, but not always, electronic in nature, such as disco or pop, I erred on the side of caution and did not include them in the list of electronic genres. In these cases, false positives, such as primarily rock songs that happen to have the disco label attached to the artist, could inadvertently be included in the dataset. The final list of genres is as follows:

target_genres = ['house', 'techno', 'drum and bass', 'drum n bass',
    "drum'n'bass", 'drumnbass', "drum 'n' bass", 'jungle', 'breakbeat',
    'trance', 'dubstep', 'trap', 'downtempo', 'industrial', 'synthpop',
    'idm', 'idm - intelligent dance music', '8-bit', 'ambient',
    'dance and electronica', 'electronic']

2.3.2 Pitch Preprocessing

A study conducted by music researcher Matthias Mauch [8] analyzes pitch in a musically informed manner. The study first takes the raw sound data and converts it into a distribution of each pitch, where 0 is no detection of the pitch and 1 the strongest amount. Then it computes the most likely chord by comparing the 4 most common types of chords in popular music (major, minor, dominant 7, and minor 7) to the observed chord. The most common chords are represented as "template chords" and contain 0's and 1's, where the 1's represent the notes played in the chord. For example, using the note C as the first index, the C major chord is represented as

\[ CT_{CM} = (1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0) \]

For a given chroma frame c observed in the song, the Spearman's rho coefficient is computed over every template chord:

\[ \rho_{CT,c} = \sum_{i=1}^{12} \frac{(CT_i - \overline{CT})(c_i - \bar{c})}{\sigma_{CT}\,\sigma_c} \]

where \(\overline{CT}\) is the mean of the values in the template chord, \(\sigma_{CT}\) is the standard deviation of the values in the chord, and the operations on c are analogous. Note that the summation is over each individual pitch in the 12 pitch classes. The chord template with the highest value of ρ is selected as the chord for the time frame.
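A compact, vectorized sketch of this template-matching step for the major templates is shown below; the loop-based version actually used in this thesis appears in Appendix A.4, and the 0.01 terms mirror the regularization used there.

import numpy as np

# the C major template rotated to all 12 roots (one row per root)
MAJOR = np.array([np.roll([1,0,0,0,1,0,0,1,0,0,0,0], r) for r in range(12)])

def best_major_chord(chroma):
    """Return (rho, root) of the best-matching major template for a single
    12-element chroma frame."""
    c = np.asarray(chroma, dtype=float)
    num = (MAJOR - MAJOR.mean(axis=1, keepdims=True)).dot(c - c.mean())
    rho = num / ((MAJOR.std(axis=1) + 0.01) * (c.std() + 0.01))
    root = int(np.abs(rho).argmax())
    return rho[root], root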

After this is performed for each time frame, the values are smoothed, and then the change between adjacent chords is observed. The reasoning behind this step is that, by measuring the relative distance between chords rather than the chords themselves, all songs can be compared in the same manner even though they may have different key signatures. Finally, the study takes the types of chord changes and classifies them under 8 possible categories called "H-topics." These topics are more abstracted versions of the chord changes that make more sense to a human, such as "changes involving dominant 7th chords."

In my preliminary implementation of this method on an electronic dance music corpus, I made a few modifications to Mauch's study. First, I smoothed out time frames before computing the most probable chords, rather than smoothing the most probable chords. I did this to save time and to reduce volatility in the chord measurements. Using Rick Astley's "Never Gonna Give You Up" as a reference, which contains 935 time frames and lasts 212 seconds, 5 time frames is slightly under 1 second and, for preliminary testing, appeared to be a good interval for each time block. Second, as mentioned in the literature section, I did not abstract the chord changes into H-topics. This decision also stemmed from time constraints, since deriving semantic chord meaning from EDM songs would require careful research into the types of harmonies and sounds common in that genre of music. Below I included a high-level visualization of the pitch metadata found in a sample song, "Firestarter" by The Prodigy, and how I converted the metadata into a chord change vector that I could then feed into the Dirichlet Process algorithm.

[Figure: high-level visualization of the chord change pipeline for "Firestarter" by The Prodigy. The raw pitch data is an N x 12 matrix, where N is the number of time frames in the song and 12 the number of pitch classes; the distribution of the 12 pitches is averaged over every block of 5 time frames; the most likely chord for each block is calculated using Spearman's rho (here F# major); each pair of adjacent chords is then mapped to one of 192 possible chord shift codes (here a major-to-major change with chord shift code 6, so chord_changes[6] += 1), producing a final 192-element vector in which chord_changes[i] is the number of times the chord change with code i occurred in the song.]

A final step I took to normalize the chord change data was to divide the numbers by the length of the song, so that each song's number of chord changes was measured per second.
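The mapping of a pair of adjacent chords to one of the 192 chord change categories, following the logic in Appendix A.2, can be sketched as follows. Chords are (quality, root) pairs, with quality 1-4 denoting major, minor, dominant 7, and minor 7, and root 0-11 denoting the 12 pitch classes.

def chord_change_code(c1, c2):
    """Map two adjacent (quality, root) chords to a category in 0..191."""
    q1, n1 = c1
    q2, n2 = c2
    note_shift = (n2 - n1) % 12          # semitone shift between roots, 0..11
    key_shift = 4 * (q1 - 1) + q2        # one of 16 quality transitions
    return 12 * (key_shift - 1) + note_shift

# C major (1, 0) followed by G major (1, 7):
print(chord_change_code((1, 0), (1, 7)))  # -> 7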

2.3.3 Timbre Preprocessing

Figure 2.2: Number of Electronic Music Songs in the Million Song Dataset from Each Year

For timbre, I also used Mauch's model to find a meaningful way to compare timbre uniformly across all songs [8]. After collecting all song metadata, I took a random sample of 20 songs from each year, starting at 1970. The reason I forced the sampling to 20 randomly sampled songs from each year, and did not take a random sample of songs from all years at once, was to prevent bias towards any type of sounds. As seen in Figure 2.2, there are significantly more songs from 2000-2011 than before 2000. The mean year is x̄ = 2001.052, the median year is 2003, and the standard deviation of the years is σ = 7.060. A "random sample" over all songs would almost definitely include a disproportionate amount of more recent songs. In order to not miss out on sounds that may be more prevalent in older songs, I required a set number of songs from each year. Next, from each randomly selected song, I selected 20 random timbre frames in order to prevent any biases in data collection within each song. In total, there were 42 × 20 × 20 = 16,800 timbre frames collected. Next, I clustered the timbre frames using a Gaussian Mixture Model (GMM), varying the number of clusters from 10 to 100 and selecting the number of clusters with the lowest Bayes Information Criterion (BIC), a statistical measure commonly used to select the best-fitting model. The BIC was minimized at 46 timbre clusters. I then re-ran the GMM with 46 clusters and saved the mean values of each of the 12 timbre segments for each cluster formed. In the same way that every song had the same 192 chord changes whose frequencies could be compared between songs, each song now had the same 46 timbre clusters, but different frequencies in each song. When reading in the metadata from each song, I calculated the most likely timbre cluster each timbre frame belonged to and kept a frequency count of all of the possible timbre clusters observed in a song. Finally, as with the pitch data, I divided all observed counts by the duration of the song in order to normalize each song's timbre counts.
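A sketch of the cluster-count selection is given below. It uses the current scikit-learn GaussianMixture API rather than the older GMM class available when this thesis was written, and timbre_frames stands for the 16,800 sampled 12-dimensional frames.

import numpy as np
from sklearn.mixture import GaussianMixture

def pick_n_clusters(frames, candidates=range(10, 101), seed=0):
    """Fit a GMM for each candidate cluster count and return the count and
    model minimizing the Bayes Information Criterion."""
    frames = np.asarray(frames)
    best_n, best_bic, best_model = None, np.inf, None
    for n in candidates:
        gmm = GaussianMixture(n_components=n, random_state=seed).fit(frames)
        bic = gmm.bic(frames)
        if bic < best_bic:
            best_n, best_bic, best_model = n, bic, gmm
    return best_n, best_model

# n, model = pick_n_clusters(timbre_frames)
# cluster_means = model.means_   # one 12-element mean vector per cluster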

Chapter 3

Results

3.1 Methodology

After the pitch and timbre data was processed, I ran the Dirichlet Process on the data. For each song, I concatenated the 192-element chord change frequency list and the 46-element timbre category frequency list, giving each song a total of 238 features. However, there is a problem with this setup: the pitch data will inherently dominate the clustering process, since it contains almost 3 times as many features as timbre. While there is no built-in function in scikit-learn's DPGMM process to give different weights to each feature, I considered another possibility to remedy this discrepancy: duplicating the timbre vector a certain number of times and concatenating that to the feature set of each song. While this strategy runs the risk of corrupting the feature set and turning it into something that does not accurately represent each song, it is important to keep in mind that, even without duplicating the timbre vector, the feature set consists of two separate feature sets concatenated to each other. Therefore, timbre duplication appears to be a reasonable strategy to weigh pitch and timbre more evenly.
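A sketch of this feature construction is shown below; the names are illustrative, chord_changes and timbre_counts are the per-second frequency lists computed in Appendix A.2, and the duplication count n_dup is an assumed example value rather than the exact number used.

import numpy as np

def build_features(chord_changes, timbre_counts, n_dup=3, scale=10.0):
    """Concatenate the 192 chord change and 46 timbre frequencies,
    duplicating the timbre block n_dup times so the two feature families
    carry more comparable weight, and scaling all values (the k = 10
    factor discussed below) into a range where usual settings of alpha
    behave well."""
    features = list(chord_changes) + list(timbre_counts) * n_dup
    return scale * np.asarray(features)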

After this modification, I tweaked a few more parameters before obtaining my final results. Dividing the pitch and timbre frequencies by the duration of the song normalized every song to frequency per second, but it also had the undesired effect of making the data too small. Timbre and pitch frequencies per second were almost always less than 1.0, and many times hovered as low as 0.002 for nonzero values. Because all of the values were very close to each other, using common values of α in the range of 0.1 to 1000–2000 was insufficient to push the songs into different clusters; as a result, every song fell into the same cluster. Increasing the value of α by several orders of magnitude, to well over 10 million, fixed the problem, but this solution presented two problems. First, tuning α to experiment with different ways to cluster the music would be problematic, since I would have to work with an enormous range of possible values for α. Second, pushing α to such high values is not appropriate for the Dirichlet Process. Extremely high values of α indicate a Dirichlet Process that will try to disperse the data into different clusters, but a value of α that high is in principle always assigning each new song to a new cluster. On the other hand, varying α between 0.1 and 1000, for example, presents a much wider range of flexibility when assigning clusters. While clustering may be possible by varying the values of α an extreme amount with the data as it currently is, we would be using the Dirichlet Process in a way it should mathematically not be used. Therefore, multiplying all of the data by a constant value, so that we can work in the appropriate range of α, is the ideal approach. After some experimentation, I found that k = 10 was an appropriate scaling factor.

After initial runs of the Dirichlet Process, I found out that there was a slight issue with some of the earlier songs. Since I had only artist genre tags, not specific song tags, for each song, I chose songs based on whether any of the tags associated with the artist fell under any electronic music genre, including the generic term 'electronic'. There were some bands, mostly older ones from the 1960s and 1970s, like Electric Light Orchestra, which had some electronic music but mostly featured rock, funk, disco, or another genre. Given that these artists featured mostly non-electronic songs, I decided to exclude them from my study and generate a blacklist of these music artists. While it was infeasible to look through every single song and determine whether it was electronic or not, I was able to look over the earliest songs in each cluster. These songs were the most important to verify as electronic, because early non-electronic songs could end up forming new clusters and inadvertently create clusters with non-electronic sounds that I was not looking for.

The goal of this thesis is to identify different groups in which EM songs are clustered and to identify the most unique artists and genres. While the second task is very simple, because it requires looking at the earliest songs in each cluster, the first is difficult to gauge the effectiveness of. While I can look at the average chord change and timbre category frequencies in each category, as well as other metadata, putting more semantic interpretations to what the music actually sounds like, and determining whether the music is clustered properly, is a very subjective process. For this reason, I ran the Dirichlet Process on the feature set with values of α = (0.05, 0.1, 0.2) and compared the clustering in each case, examining similarities and differences in the clusters formed in each scenario in the Discussion section. For each value of α, I set the upper limit of components, or clusters allowed, to 50. The values of α I used resulted in 9, 14, and 19 clusters formed.
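These runs can be sketched with the scikit-learn DPGMM class named above, as it existed around scikit-learn 0.17 (it has since been replaced by BayesianGaussianMixture); X stands for the scaled feature matrix, loaded here from a hypothetical file.

import numpy as np
from sklearn import mixture

X = np.loadtxt('features.txt')  # hypothetical file of scaled song features

for alpha in (0.05, 0.1, 0.2):
    # Dirichlet Process Gaussian Mixture with at most 50 components
    dpgmm = mixture.DPGMM(n_components=50, alpha=alpha, n_iter=100)
    dpgmm.fit(X)
    labels = dpgmm.predict(X)
    n_used = len(np.unique(labels))  # clusters actually populated
    print(alpha, n_used)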

3.2 Findings

3.2.1 α = 0.05

When I set α to 0.05, the Dirichlet Process split the songs into 9 clusters. Below are the distributions of years of the songs in each cluster (note that the Dirichlet Process does not number the clusters exactly sequentially, so cluster numbers 5, 7, and 10 are skipped).

Figure 3.1: Song year distributions for α = 0.05

For each value of α, I also calculated the average frequency of each chord change category and timbre category for each cluster and plotted the results. The green lines correspond to timbre and the blue lines to pitch.

Figure 3.2: Timbre and pitch distributions for α = 0.05

A table of each cluster formed, the number of songs in that cluster, and descriptions of pitch, timbre, and rhythmic qualities characteristic of songs in that cluster is shown below.

Cluster   Song Count   Characteristic Sounds
0         6481         Minimalist, industrial space sounds, dissonant chords
1         5482         Soft, New Age, ethereal
2         2405         Defined sounds; electronic and non-electronic instruments played in standard rock rhythms
3         360          Very dense and complex synths, slightly darker tone
4         4550         Heavily distorted rock and synthesizer
6         2854         Faster-paced 80s synth rock, acid house
8         798          Aggressive beats, dense house music
9         1464         Ambient house, trancelike, strong beats, mysterious tone
11        1597         Melancholy tones; new wave rock in the 80s, then starting in the 90s downtempo, trip-hop, nu-metal

Table 3.1: Song cluster descriptions for α = 0.05

3.2.2 α = 0.1

A total of 14 clusters were formed (16 were formed, but 2 clusters contained only one song each; I listened to both of these songs, and they did not sound unique, so I discarded them from the clusters). Again, the song distributions, timbre and pitch distributions, and cluster descriptions are shown below.

Figure 3.3: Song year distributions for α = 0.1

Figure 3.4: Timbre and pitch distributions for α = 0.1

Cluster   Song Count   Characteristic Sounds
0         1339         Instrumental and disco with 80s synth
1         2109         Simultaneous quarter-note and sixteenth-note rhythms
2         4048         Upbeat, chill; simultaneous quarter-note and eighth-note rhythms
3         1353         Strong repetitive beats, ambient
4         2446         Strong simultaneous beat and synths; synths defined but echo
5         2672         Calm, New Age
6         542          Hi-hat cymbals, dissonant chord progressions
7         2725         Aggressive punk and alternative rock
9         1647         Latin, rhythmic emphasis on first and third beats
11        835          Standard medium-fast rock instruments/chords
16        1152         Orchestral, especially violins
18        40           "Martian alien" sounds, no vocals
20        1590         Alternating strong kick and strong high-pitched clap
28        528          Roland TR-like beats; kick and clap stand out but fuzzy

Table 3.2: Song cluster descriptions for α = 0.1

3.2.3 α = 0.2

With α set to 0.2, a total of 22 clusters were formed. 3 of the clusters consisted of 1 song each, none of which were particularly unique-sounding, so I discarded them, for a total of 19 significant clusters. Again, the song year distributions, timbre and pitch distributions, and cluster descriptions are shown below.

Figure 3.5: Song year distributions for α = 0.2

Figure 3.6: Timbre and pitch distributions for α = 0.2

Cluster | Song Count | Characteristic Sounds
0       | 4075       | Nostalgic and sad-sounding synths and string instruments
1       | 2068       | Intense, sad, cavernous (mix of industrial metal and ambient)
2       | 1546       | Jazz/funk tones
3       | 1691       | Orchestral with heavy 80s synths, atmospheric
4       | 343        | Arpeggios
5       | 304        | Electro, ambient
6       | 2405       | Alien synths, eerie
7       | 1264       | Punchy kicks and claps, 80s/90s tilt
8       | 1561       | Medium tempo, 4/4 time signature, synths with intense guitar
9       | 1796       | Disco rhythms and instruments
10      | 2158       | Standard rock with few (if any) synths added on
12      | 791        | Cavernous, minimalist, ambient (non-electronic instruments)
14      | 765        | Downtempo, classic guitar riffs, fewer synths
16      | 865        | Classic acid house sounds and beats
17      | 682        | Heavy Roland TR sounds
22      | 14         | Fast, ambient, classic orchestral
23      | 578        | Acid house with funk tones
30      | 31         | Very repetitive rhythms, one or two tones
34      | 88         | Very dense sound (strong vocals and synths)

Table 3.3: Song cluster descriptions for α = 0.2

3.3 Analysis

Each of the different values of α revealed different insights about the Dirichlet Process applied to the Million Song Dataset, mainly which artists and songs were unique and how traditional groupings of EM genres could be thought of differently. Not surprisingly, the distributions of the years of songs in most of the clusters were skewed to the left, because the distribution of all of the EM songs in the Million Song Dataset is left-skewed (see Figure 2.2). However, some of the distributions vary significantly for individual clusters, and these differences provide important insights into which types of music styles were popular at certain points in time and how unique the earliest artists and songs in the clusters were. For example, for α = 0.1, cluster 28's musical style (with sounds characteristic of the Roland TR-808 and TR-909, two programmable drum machines and synthesizers that became extremely popular in 1980s and 90s dance tracks [13]) coincides with when the instruments were first manufactured in 1980. Not surprisingly, this cluster contained mostly songs from the 80s and 90s and declined slightly in the 2000s. However, there were a few songs in that cluster that came out before 1980. While these songs did not clearly use the Roland TR machines, they may have contained similar sounds that predated the machines and were truly novel.

First, looking at α = 0.05, we see that all of the clusters contain a significant number of songs, although clusters 3 and 8 are notably smaller. Cluster 3 contains a heavier left tail, indicating a larger number of songs from the 70s, 80s, and 90s. Inside the cluster, the genres of music varied significantly from a traditional music lens. That is, the cluster contained some songs with nearly all traditional rock instruments, others with purely synths, and others somewhere in between, all of which would normally be classified as different EM genres. However, under the Dirichlet Process these songs were lumped together with the common theme of dense, melodic sounds (as opposed to minimalistic, repetitive, or dissonant ones). The most prominent artists from the earlier songs are Ashra and Jon Hassell, who composed several melodic songs combining traditional instruments with synthesizers for a modern feel. The other small cluster, number 8, contains a more normal year distribution relative to the entire MSD distribution and also consists of denser beats. Another artist, Cabaret Voltaire, leads this cluster. Cluster 9 sticks out significantly because it contains virtually no songs before 1990 but increases rapidly in popularity. This cluster contains songs with hypnotically repetitive rhythm, strong and ethereal synths, and an equally strong drum-like beat. Given the emergence of trance in the 1990s, and the fact that house music in the 1980s contained more minimalistic synths than house music in the 1990s, this distribution of years makes sense. Looking at the earliest artists in this cluster, one that accurately predates the later music in the cluster is Jean-Michel Jarre. A French composer pioneering in ambient and electronic music [14], one of his songs, "Les Chants Magnétiques IV," contains very sharp and modulated synths along with a repetitive hi-hat cymbal rhythm and more drawn-out and ethereal synths. While the song sounds ambient at its normal speed, playing the song at 1.5 times its normal speed resulted in a thumping, fast-paced 16th-note rhythm that, combined with the ethereal synths and their chord progressions, sounded very similar to trance music. In fact, I found that stylistically, trance music was comparable to house and ambient music increased in speed. Trance music was a term not used extensively until the early 1990s, but ambient and house music were already mainstream by the 1980s, so it would make sense that trance evolved in this manner. However, this insight could serve as an argument that trance is not an innovative genre in and of itself, but is rather a clever combination of two older genres. Lastly, we look at the timbre category and chord change distributions for each cluster. In theory, these clusters should have significantly different peaks of chord changes and timbre categories, reflecting different pitch arrangements and

instruments in each cluster. The type 0 chord change corresponds to major → major with no note change; type 60, minor → minor with no note change; type 120, dominant 7th major → dominant 7th major with no note change; and type 180, dominant 7th minor → dominant 7th minor with no note change. It makes sense that type 0, 60, 120, and 180 chord changes are frequently observed, because it implies that chords occurring next to each other in the song are remaining in the same key for the majority of the song.
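These type codes follow directly from the encoding used in Appendix A.2 (chord_shift = 12(key_shift − 1) + note_shift, with key_shift = 4(c1 − 1) + c2 indexing the pair of template families). A small sketch, assuming that encoding, decodes any of the 192 codes back into a readable chord change:

# Decode a chord-change code (0-191) produced by the encoding in
# Appendix A.2: chord_shift = 12*(key_shift - 1) + note_shift, where
# key_shift = 4*(c1 - 1) + c2 pairs the (from, to) chord families.
FAMILIES = ['major', 'minor', 'dominant 7th major', 'dominant 7th minor']

def decode_chord_change(code):
    key_shift = code // 12 + 1   # 1..16: which pair of chord families
    note_shift = code % 12       # 0..11: root movement in semitones
    c1 = (key_shift - 1) // 4    # family index of the first chord
    c2 = (key_shift - 1) % 4     # family index of the second chord
    return FAMILIES[c1], FAMILIES[c2], note_shift

for code in (0, 60, 120, 180):
    print(code, decode_chord_change(code))
# 0   -> ('major', 'major', 0)
# 60  -> ('minor', 'minor', 0)
# 120 -> ('dominant 7th major', 'dominant 7th major', 0)
# 180 -> ('dominant 7th minor', 'dominant 7th minor', 0)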

The timbre categories, on the other hand, are more difficult to interpret intuitively. Mauch's study addresses this issue by sampling the songs and sounds that are closest to each timbre category, then playing the sounds and attaching user-based interpretations based on several listeners [8]. While this strategy worked in Mauch's study, given the time and resources at my disposal it was not practical in mine. I ended up comparing my subjective summaries of each cluster against the charts to see whether certain peaks in the timbre categories corresponded to specific tones. Strangely, for α = 0.05 the timbre and chord change data are very similar for each cluster. This problem does not occur when α = 0.1 or 0.2, where the graphs vary significantly and correspond to some of the observed differences in the music. In summary, below are the most influential artists I found in the clusters formed and the types of music they created that were novel for their time:

• Jean-Michel Jarre: ambient and house music, complicated synthesizer arrangements
• Cabaret Voltaire: orchestral electronic music
• Paul Horn: new age
• Brian Eno: ambient music
• Manuel Göttsching (Ashra): synth-heavy ambient music
• Killing Joke: industrial metal
• John Foxx: minimalist and dark electronic music
• Fad Gadget: house and industrial music

While these conclusions were formed mainly from the data used in the MSD and the resulting clusters, I checked outside sources and biographies of these artists to see whether they were groundbreaking contributors to electronic music. Some research revealed that these artists were indeed groundbreaking for their time, so my findings are consistent with existing literature. The difference, however, between existing accounts and mine is that, from a quantitatively computed perspective, I found some new connections (like observing that one of Jarre's works, when sped up, sounded very similar to trance music).

For larger values of α, it is worth not only looking at interesting phenomena in the clusters formed for that specific value, but also comparing the clusters formed to those for other values of α. Since we are increasing the value of α, more clusters will be formed and the distinctions between each cluster will be more nuanced.
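This is the expected behavior for a Dirichlet Process. Under the standard Chinese restaurant process view (a textbook property of the DP, not something derived in this thesis), the expected number of occupied clusters among n songs is

E[K_n] = \sum_{i=1}^{n} \frac{\alpha}{\alpha + i - 1} \approx \alpha \log\left(1 + \frac{n}{\alpha}\right),

which grows with α for fixed n. The truncated variational implementation used here will not match this count exactly, but the same monotone dependence on α holds.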

With α = 0.1, the Dirichlet Process formed 16 clusters. 2 of these clusters consisted of only one song each, and upon listening, neither of these songs sounded particularly unique, so I threw those two clusters out and analyzed the remaining 14. Comparing these clusters to the ones formed with α = 0.05, I found that some of the clusters mapped over nicely, while others were more difficult to interpret. For example, cluster 3_0.1 (cluster 3 when α = 0.1) contained a similar number of songs, and a similar distribution of release years, to cluster 9_0.05. Both contain virtually no songs before the 1990s and then steadily rise in popularity through the 2000s. Both clusters also contain similar types of music: house beats and ethereal synths reminiscent of ambient or trance music. However, when I looked at the earliest artists in cluster 3_0.1, they were different from the earliest artists in cluster 9_0.05. One particular artist, Bill Nelson, stood out for having a particularly novel song, "Birds of Tin," for the year it was released (1980). This song features a sharp and twangy synth beat that, when sped up, sounded like minimalist acid house music. While the α = 0.05 grouping differentiated mostly on general moods and classes of instruments (like rock vs. non-electronic vs. electronic), the α = 0.1 grouping picked up more nuanced instrumentation and mood differences. For example, cluster 16_0.1 contained songs that featured orchestral string instruments, especially violin. The songs themselves varied significantly according to traditional genres, from Brian Eno arrangements with classical orchestra to a remix of a song by Linkin Park, a nu-metal band, which contained violin interludes. This clustering raises an interesting point: music that sounds very different based on traditional genres can be grouped together on certain instruments or sounds. Another cluster, 28_0.1, features 90s sounds characteristic of the Roland TR-909 drum machine (which would explain why the cluster's songs increase drastically starting in the 1990s and steadily decline through the 2000s). Yet another cluster, 6_0.1, contains a particularly heavy left tail, indicating a style more popular in the 1980s, and its characteristic sound, hi-hat cymbals, is also a specialized instrument. This specialization does not match up particularly strongly with the clusters when α = 0.05. That is, a single cluster with α = 0.05 does not easily map to one or more clusters in the α = 0.1 run, although many of the clusters appear to share characteristics based on the qualitative descriptions in the tables. The timbre/chord change charts for each cluster appeared to at least somewhat corroborate the general characteristics I attached to each cluster. For example, the last timbre category is significantly pronounced for clusters 5 and 18, and especially so for 18. Cluster 18 was vocal-free, ethereal space-synth sounds, so it would make sense that cluster 5, which was mainly calm New Age, also contained vocal-free, ethereal, space-y sounds. It was also interesting to note

that certain clusters, like 28, contained one timbre category that completely dominated all the others. Many songs in this cluster were marked by strong and repetitive beats reminiscent of the Roland TR drum machines, which matches the graph. Likewise, clusters 3, 7, 9, and 20, which appear to contain the same peak timbre category, were noted for containing strong and repetitive beats. For this value of α, I added the following artists and their contributions to the general list of novel artists:

• Bill Nelson: minimalist house music
• Vangelis: orchestral compositions with electronic notes
• Rick Wakeman: rock compositions with spacy-sounding synths
• Kraftwerk: synth-based pop music

Finally, we look at α = 0.2. With this parameter value, the Dirichlet Process resulted in 22 clusters. 3 of these clusters contained only one song each, and upon listening to each of these songs, I determined they were not particularly unique and discarded them, for a total of 19 remaining clusters. Unlike the previous two values of α, where the clusters were relatively easy to subjectively differentiate, this one was quite difficult. Slightly more than half of the clusters, 10 out of 19, contained under 1,000 songs, and 3 contained under 100. Some of the clusters were easily mapped to clusters under the other two α values, like cluster 17_0.2, which contains Roland TR drum machine sounds and is comparable to cluster 28_0.1. However, many of the other classifications seemed more dubious. Not only did the songs within each cluster often vary significantly, but the differences between many clusters appeared nearly indistinguishable. The chord change and timbre charts also reflect the difficulty of distinguishing different clusters. The y-axes for all of the charts are quite small, implying that many of the timbre values averaged out because the songs were quite different in each cluster. Essentially, the observations are quite noisy and do not have features that stand out as saliently as those of cluster 28_0.1, for example. The only exceptions to these numbers were clusters 30 and 34, but there were so few songs in each of these clusters that they represent only a small amount of the dataset. Therefore, I concluded that the Dirichlet Process with α = 0.2 did an insufficient job of clustering the songs. Overall, the clusters formed when α = 0.1 were the most meaningful in terms of picking up nuanced moods and instruments without splitting hairs and producing clusters a minimally trained ear could not differentiate. From this analysis, the most appropriate genre classifications of the electronic music from the MSD are the clusters described in the table for α = 0.1, and the most novel artists, along with their contributions, are summarized in the findings for α = 0.05 and α = 0.1.

Chapter 4

Conclusion

In this chapter, I first address weaknesses in my experiment and strategies to address those weaknesses; then I offer potential paths for researchers to build upon my experiment, and close with final words regarding this thesis.

4.1 Design Flaws in Experiment

While I made every effort possible to ensure the integrity of this experiment, there were various limiting factors, some beyond my control and others within my control but unrealistic to address given the time and resources I had. The largest issue was the dataset I was working with. While the MSD contained roughly 23,000 electronic music songs according to my classifications, these songs did not come close to covering all of the electronic music that was available. From looking through the tracks, I did see many important artists, meaning that there was some credibility to the dataset. However, there were several other artists I was surprised to see missing, and the artists included contained only a limited number of popular songs. Some traditionally defined genres, like dubstep, were missing entirely from the dataset, and the most recent songs came from the year 2010, which meant that the past 5 years of rapid expansion in EM were not accounted for. Building a sufficient corpus of EM data is very difficult, arguably more so than for other genres, because songs may be remixed by multiple artists, further blurring the line between original content and modifications. For this reason, I considered my thesis to be a proof of concept. Although the data I used may not be ideal, I was able to show that the Dirichlet Process could be used with some amount of success to cluster songs based on their metadata.

With respect to how I implemented the Dirichlet Process and constructed the features, my methodology could have been more extensive with additional time and resources. Interpreting the sounds in each song and establishing common threads is a difficult task, and unlike Pandora, which used trained music theory experts to analyze each song, I relied on my own ears and anecdotal knowledge of EM. Given the lack of formal literature quantitatively analyzing EM and the resources I had, this was my best realistic option, but it was also not ideal. The second notable weakness, which was more controllable, was determining what exactly constitutes an EM song. My criteria involved iterating through every song and selecting those whose artist contained a tag that fell inside a list of predetermined EM genres. However, this strategy is not always effective, since some artists have only a small selection of EM songs and have produced much more music involving rock or other non-EM genres. To prevent these songs from appearing in the dataset, I would need to load another dataset, from a group called Last.fm, which contains user-generated tags at the song level.
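As a sketch of that fix: the MSD project distributes a companion Last.fm tag dataset; assuming its SQLite form (the file name lastfm_tags.db and its tables tags, tids, and tid_tag are taken from that distribution and should be treated as assumptions to verify), song-level filtering could look like this:

import sqlite3

# Hypothetical sketch: filter tracks by song-level Last.fm tags instead of
# artist-level musicbrainz tags. Assumes the MSD's companion Last.fm tag
# database (lastfm_tags.db) with tables tags, tids, and tid_tag.
conn = sqlite3.connect('lastfm_tags.db')

def song_level_tags(track_id):
    rows = conn.execute(
        """SELECT tags.tag FROM tid_tag, tids, tags
           WHERE tids.ROWID = tid_tag.tid AND tid_tag.tag = tags.ROWID
           AND tids.tid = ?""", (track_id,)).fetchall()
    return [r[0].lower() for r in rows]

def is_em_song(track_id, target_genres):
    # keep a song only if one of its OWN tags is an EM genre
    return any(t in target_genres for t in song_level_tags(track_id))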

Another, more addressable, weakness in my experiment was graphically analyzing the timbre categories. While the average chord changes were easy to interpret on the graphs for each cluster and had easy semantic interpretations, the timbre categories were never formally defined. That is, while I knew the Bayes Information Criterion was lowest when there were 46 categories, I did not associate each timbre category with a sound. Mauch's study addressed this issue by randomly selecting songs with sounds that fell in each timbre category and asking users to listen to the sounds and classify what they heard. Implementing this system would be an additional way of ensuring that the clusters formed for each song were nontrivial: I could not only eyeball the measurements on each graph for timbre, like I did in this thesis, but also use them to confirm the sounds I observed for each cluster.
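A minimal sketch of that listening test, assuming timbre_frames is the matrix of sampled frames, gmm is the fitted 46-component Gaussian Mixture Model, and frame_to_track maps each sampled frame back to its source track (bookkeeping my pipeline did not keep and that would need to be added):

import numpy as np

def representative_tracks(timbre_frames, gmm, frame_to_track, k=5):
    """For each timbre category, return the k tracks whose sampled frames
    lie closest to that category's mean vector, for human listening."""
    reps = {}
    for cat, mean in enumerate(gmm.means_):
        dists = np.linalg.norm(timbre_frames - mean, axis=1)
        nearest = np.argsort(dists)[:k]
        reps[cat] = [frame_to_track[i] for i in nearest]
    return reps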

Finally, while my feature selection involved careful preprocessing, based on other studies, that normalized measurements between all songs, there are additional ways I could have improved the feature set. For example, one study looks at more advanced ways to isolate specific timbre segments in a song, identify repeating patterns, and compare songs to each other in terms of the similarity of their timbres [15]. More advanced methods like these would allow me to analyze more quantitatively how successful the Dirichlet Process is at clustering songs into distinct categories.

4.2 Future Work

Future work in this area, quantitatively analyzing EM metadata to determine what constitutes different genres and novel artists, would involve tighter definitions, procedures, evaluations of whether clustering was effective, and closer musical scrutiny. All of the weaknesses mentioned in the previous section, barring perhaps the songs available in the Million Song Dataset, can be addressed with extensions and modifications to the code base I created. The greater issue of building an effective corpus of music data for the MSD and constantly updating it might be addressed by soliciting such data from an organization like Spotify, but such an endeavor is very ambitious and beyond the scope of any individual or small-group research project without extensive funding and influence. Once these problems are resolved, and the dataset, the songs accessed from it, and the methods for comparing songs to each other are in place, the next steps would be to further analyze the results. How do the most unique artists for their time compare to the most popular artists? Is there considerable overlap? How long does it take for a style to grow in popularity, if it even does? And lastly, how can these findings be used to compose new genres of music and envision who and what will become popular in the future? All of these questions may require supplementary information sources, with respect to the popularity of songs and artists for example, and many of these additional pieces of information can be found on the website of the MSD.

4.3 Closing Remarks

While this thesis is an ambitious endeavor and can be improved in many respects, it does show that the methods implemented yield nontrivial results and could serve as a foundation for future quantitative analysis of electronic music. As data analytics grows even more, and groups such as Spotify amass greater amounts of information and deeper insights on that information, this relatively new field of study will hopefully grow. EM is a dynamic, energizing, and incredibly expressive type of music, and understanding it from a quantitative perspective pays respect to what has, up until now, been mostly analyzed from a curious outsider's perspective: qualitatively described, but not examined as thoroughly from a mathematical angle.

Appendix A

Code

A.1 Pulling Data from the Million Song Dataset

from __future__ import division
import os
import re
import sys
import time
import glob
import numpy as np
import hdf5_getters  # not on adroit

# prevents output from showing ellipses when printed
np.set_printoptions(threshold=np.nan)

'''This code pulls the relevant metadata of every electronic song
out of the Million Song Dataset.'''

basedir = '/scratch/network/mssilver/mssilver/msd_data_full/data/' + str(sys.argv[1])
ext = '.h5'

target_genres = ['house', 'techno', 'drum and bass', 'drum n bass',
    "drum'n'bass", 'drumnbass', "drum 'n' bass", 'jungle', 'breakbeat',
    'trance', 'dubstep', 'trap', 'downtempo', 'industrial', 'synthpop',
    'idm', 'idm - intelligent dance music', '8-bit', 'ambient',
    'dance and electronica', 'electronic']

# relevant metadata for all EM songs found in the MSD
all_song_data = {}
count = 0
start_time = time.time()

for root, dirs, files in os.walk(basedir):
    files = glob.glob(os.path.join(root, '*' + ext))
    for f in files:
        h5 = hdf5_getters.open_h5_file_read(f)
        # if year unknown, throw out sample
        if hdf5_getters.get_year(h5) == 0:
            h5.close()
            continue
        if any(tag in str(hdf5_getters.get_artist_mbtags(h5)) for tag in target_genres):
            print 'found electronic music song at {0} seconds'.format(time.time() - start_time)
            count += 1
            print 'song count: {0}'.format(count)
            h5_subdict = dict()
            h5_subdict['title'] = hdf5_getters.get_title(h5).item()
            h5_subdict['artist_name'] = hdf5_getters.get_artist_name(h5).item()
            h5_subdict['year'] = hdf5_getters.get_year(h5).item()
            h5_subdict['duration'] = hdf5_getters.get_duration(h5).item()
            h5_subdict['timbre'] = hdf5_getters.get_segments_timbre(h5).tolist()
            h5_subdict['pitches'] = hdf5_getters.get_segments_pitches(h5).tolist()
            track_id = hdf5_getters.get_track_id(h5).item()
            all_song_data[track_id] = h5_subdict
            print 'Song {0} finished processing. Total time elapsed: {1} seconds'.format(count, str(time.time() - start_time))
        h5.close()

all_song_data_sorted = dict(sorted(all_song_data.items(), key=lambda k: k[1]['year']))
sortedpitchdata = ('/scratch/network/mssilver/mssilver/msd_data/raw_'
                   + re.sub(r'/', '', sys.argv[1]) + '.txt')
with open(sortedpitchdata, 'w') as text_file:
    text_file.write(str(all_song_data_sorted))

A.2 Calculating Most Likely Chords and Timbre Categories

from __future__ import division
import sys
import re
import time
import math
import ast
import msd_utils  # not on adroit

# column-wise mean of a block of vectors
def mean(a):
    return sum(a) / len(a)

'''This code computes the per-second frequency of each chord change and
each timbre category in every electronic song.'''

basedir = '/scratch/network/mssilver/mssilver/'
input_file = basedir + 'msd_data/raw_' + str(sys.argv[1]) + '.txt'
output_file = basedir + 'msd_data/preprocessed_' + str(sys.argv[1]) + '.txt'

json_contents = open(input_file, 'r').read()

all_song_data = []
time_start = time.time()
count = 0
# each song was written out as a Python dict literal beginning with 'title'
for json_object_str in re.finditer(r"\{'title'.*?\}", json_contents):
    json_object_str = str(json_object_str.group(0))
    json_object = ast.literal_eval(json_object_str)
    json_object_new = {}

    json_object_new['title'] = json_object['title']
    json_object_new['artist_name'] = json_object['artist_name']
    json_object_new['year'] = json_object['year']
    json_object_new['duration'] = json_object['duration']

    segments_pitches_old = json_object['pitches']
    segments_timbre_old = json_object['timbre']
    segments_pitches_old_smoothed = []
    segments_timbre_old_smoothed = []
    chord_changes = [0 for i in range(0, 192)]
    smoothing_factor = 5
    for i in range(0, int(math.floor(len(segments_pitches_old) / smoothing_factor))):
        segments = segments_pitches_old[(smoothing_factor*i):(smoothing_factor*i + smoothing_factor)]
        # calculate mean strength of each note over a block of 5 time segments
        segments_mean = map(mean, zip(*segments))
        segments_pitches_old_smoothed.append(segments_mean)
    most_likely_chords = [msd_utils.find_most_likely_chord(seg) for seg in segments_pitches_old_smoothed]
    print 'found most likely chords at {0} seconds'.format(time.time() - time_start)
    # calculate chord changes
    for i in range(0, len(most_likely_chords) - 1):
        c1 = most_likely_chords[i]
        c2 = most_likely_chords[i+1]
        if c1[1] == c2[1]:
            note_shift = 0
        elif c1[1] < c2[1]:
            note_shift = c2[1] - c1[1]
        else:
            note_shift = 12 - c1[1] + c2[1]
        key_shift = 4*(c1[0] - 1) + c2[0]
        # convert note_shift (0 through 11) and key_shift (1 to 16)
        # to one of 192 categories for a chord shift
        chord_shift = 12*(key_shift - 1) + note_shift
        chord_changes[chord_shift] += 1
    json_object_new['chord_changes'] = [c / json_object['duration'] for c in chord_changes]
    print 'calculated chord changes at {0} seconds'.format(time.time() - time_start)

    for i in range(0, int(math.floor(len(segments_timbre_old) / smoothing_factor))):
        segments = segments_timbre_old[(smoothing_factor*i):(smoothing_factor*i + smoothing_factor)]
        # mean timbre vector over a block of 5 time segments
        segments_mean = map(mean, zip(*segments))
        segments_timbre_old_smoothed.append(segments_mean)
    timbre_cats = [msd_utils.find_most_likely_timbre_category(seg) for seg in segments_timbre_old_smoothed]
    print 'found most likely timbre categories at {0} seconds'.format(time.time() - time_start)
    # count occurrences of every timbre category (one per GMM cluster)
    timbre_cat_counts = [timbre_cats.count(i) for i in xrange(0, len(msd_utils.TIMBRE_CLUSTERS))]
    json_object_new['timbre_cat_counts'] = [t / json_object['duration'] for t in timbre_cat_counts]
    all_song_data.append(json_object_new)
    count += 1

print 'preprocessing finished, writing results to file at time {0}'.format(time.time() - time_start)
with open(output_file, 'w') as f:
    f.write(str(all_song_data))

print 'file processing complete at time {0}'.format(time.time() - time_start)

A.3 Code to Compute Timbre Categories

from __future__ import division
import re
import time
import ast
import random
from string import ascii_uppercase

'''This code randomly samples timbre frames from N songs per year; the
sampled frames are later clustered into timbre categories with a GMM.'''

# number of songs with a known year in the dataset, by year
year_counts = {1956: 2, 1965: 4, 1968: 3, 1969: 5, 1970: 23, 1971: 25,
    1972: 26, 1973: 37, 1974: 35, 1975: 29, 1976: 28, 1977: 64, 1978: 77,
    1979: 111, 1980: 131, 1981: 171, 1982: 199, 1983: 272, 1984: 190,
    1985: 189, 1986: 200, 1987: 224, 1988: 205, 1989: 272, 1990: 358,
    1991: 348, 1992: 538, 1993: 610, 1994: 658, 1995: 764, 1996: 809,
    1997: 930, 1998: 872, 1999: 983, 2000: 1031, 2001: 1230, 2002: 1323,
    2003: 1563, 2004: 1508, 2005: 1995, 2006: 1892, 2007: 2175, 2008: 1950,
    2009: 1782, 2010: 742}

timbre_all = []
time_start = time.time()
orig_dir = '/scratch/network/mssilver/mssilver/'
json_pattern = re.compile(r"\{'title'.*?\}", re.DOTALL)
N = 20  # number of songs to sample from each year
k = 20  # number of timbre frames to select from each song
for l1 in ascii_uppercase:
    for l2 in ascii_uppercase:
        edm_textfile = orig_dir + 'msd_data/raw_' + l1 + l2 + '.txt'
        json_contents = open(edm_textfile, 'r').read()
        for json_object_str in re.findall(json_pattern, json_contents):
            json_object = ast.literal_eval(json_object_str)
            year = int(json_object['year'])
            # sample each song with probability N / (songs in its year), capped at 1
            prob = 1.0 if 1.0*N/year_counts[year] > 1.0 else 1.0*N/year_counts[year]
            if random.random() < prob:
                print 'getting timbre frames for song in directory {0}, {1} seconds after start of program'.format(edm_textfile, time.time() - time_start)
                duration = float(json_object['duration'])
                timbre = [[t/duration for t in l] for l in json_object['timbre']]
                try:
                    indices = random.sample(xrange(0, len(timbre)), k)
                except ValueError:  # fewer than k frames in the song
                    indices = xrange(0, len(timbre))
                timbre_frames = [timbre[i] for i in indices]
                timbre_all.extend(timbre_frames)
        print 'finished file {0}, {1} seconds after start of program'.format(edm_textfile, time.time() - time_start)

with open('timbre_frames_all.txt', 'w') as f:
    f.write(str(timbre_all))

A.4 Helper Methods for Calculations

import numpy as np
import hdf5_getters

''' some static data used in conjunction with the helper methods '''

# each 12-element vector corresponds to the 12 pitches, starting with
# C natural and going up to B natural

def _rotations(base):
    # the 12 transpositions of a root-position chord template
    return [base[-k:] + base[:-k] for k in range(12)]

CHORD_TEMPLATE_MAJOR = _rotations([1,0,0,0,1,0,0,1,0,0,0,0])  # root, major 3rd, 5th
CHORD_TEMPLATE_MINOR = _rotations([1,0,0,1,0,0,0,1,0,0,0,0])  # root, minor 3rd, 5th
CHORD_TEMPLATE_DOM7 = _rotations([1,0,0,0,1,0,0,1,0,0,1,0])   # major triad + minor 7th
CHORD_TEMPLATE_MIN7 = _rotations([1,0,0,1,0,0,0,1,0,0,1,0])   # minor triad + minor 7th

CHORD_TEMPLATE_MAJOR_means = [np.mean(chord) for chord in CHORD_TEMPLATE_MAJOR]
CHORD_TEMPLATE_MINOR_means = [np.mean(chord) for chord in CHORD_TEMPLATE_MINOR]
CHORD_TEMPLATE_DOM7_means = [np.mean(chord) for chord in CHORD_TEMPLATE_DOM7]
CHORD_TEMPLATE_MIN7_means = [np.mean(chord) for chord in CHORD_TEMPLATE_MIN7]

CHORD_TEMPLATE_MAJOR_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_MAJOR]
CHORD_TEMPLATE_MINOR_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_MINOR]
CHORD_TEMPLATE_DOM7_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_DOM7]
CHORD_TEMPLATE_MIN7_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_MIN7]

# mean timbre vectors of the 46 clusters found by the GMM (one row per category)
TIMBRE_CLUSTERS = [
 [1.38679881e-01, 3.95702571e-02, 2.65410235e-02, 7.38301998e-03, -1.75014636e-02, -5.51147732e-02, 8.71851698e-03, -1.17595855e-02, 1.07227900e-02, 8.75951680e-03, 5.40391877e-03, 6.17638908e-03],
 [3.14344510e+00, 1.17405599e-01, 4.08053561e+00, -1.77934450e+00, 2.93367968e+00, -1.35597928e+00, -1.55129489e+00, 7.75743158e-01, 6.42796685e-01, 1.40794256e-01, 3.37716831e-01, -3.27103815e-01],
 [3.56548165e-01, 2.73288705e+00, 1.94355982e+00, 1.06892477e+00, 9.89739475e-01, -8.97330631e-02, 8.73234495e-01, -2.00747009e-03, 3.44488367e-01, 9.93117800e-02, -2.43471766e-01, -1.90521726e-01],
 [4.22442037e-01, 4.14115783e-01, 1.43926557e-01, -1.16143322e-01, -5.95186216e-02, -2.36927188e-01, -6.83151409e-02, 9.86816882e-02, 2.43219098e-02, 6.93558977e-02, 6.80121418e-03, 3.97485360e-02],
 [1.94727799e-01, -1.39027782e+00, -2.39875671e-01, -2.84583677e-01, 1.92334219e-01, -2.83421048e-01, 2.15787541e-01, 1.14840341e-01, -2.15631833e-01, -4.09496877e-02, -6.90838017e-03, -7.24394810e-03],
 [1.96565167e-01, 4.98702717e-02, -3.43697282e-01, 2.54170701e-01, 1.12441266e-02, 1.54740401e-01, -4.70447408e-02, 8.10868802e-02, 3.03736697e-03, 1.43974944e-03, -2.75044913e-02, 1.48634678e-02],
 [2.21364497e-01, -2.96205105e-01, 1.57754028e-01, -5.57641279e-02, -9.25625566e-02, -6.15316168e-02, -1.38139882e-01, -5.54936599e-02, 1.66886836e-01, 6.46238260e-02, 1.24093863e-02, -2.09274345e-02],
 [2.12823455e-01, -9.32652720e-02, -4.39611467e-01, -2.02814479e-01, 4.98638770e-02, -1.26572488e-01, -1.11181799e-01, 3.25075635e-02, 2.01416694e-02, -5.69216463e-02, 2.61922912e-02, 8.30817468e-02],
 [1.62304042e-01, -7.34813956e-03, -2.02552550e-01, 1.80106705e-01, -5.72110826e-02, -9.17148244e-02, -6.20429191e-03, -6.08892354e-02, 1.02883628e-02, 3.84878478e-02, -8.72920419e-03, 2.37291230e-02],
 [1.69023095e-01, 6.81311168e-02, -3.71039856e-02, -2.13139780e-02, -4.18752028e-03, 1.36407740e-01, 2.58515825e-02, -4.10328777e-04, 2.93149920e-02, -1.97874734e-02, 2.01177066e-02, 4.29260690e-03],
 [4.16829358e-01, -1.28384095e+00, 8.86081556e-01, 9.13717416e-02, -3.19420208e-01, -1.82003637e-01, -3.19865507e-02, -1.71517045e-02, 3.47472066e-02, -3.53047665e-02, 5.58354602e-02, -5.06222122e-02],
 [3.83948137e-01, 1.06020034e-01, 4.01191058e-01, 1.49470482e-01, -9.58422411e-02, -4.94473336e-02, 2.27589858e-02, -5.67352733e-02, 3.84666644e-02, -2.15828055e-02, -1.67817151e-02, 1.15426241e-01],
 [9.07946444e-01, 3.26120397e+00, 2.98472002e+00, -1.42615404e-01, 1.29886103e+00, -4.53380431e-01, 1.54008478e-01, -3.55297093e-02, -2.95809181e-01, 1.57037690e-01, -7.29692046e-02, 1.15180285e-01],
 [1.60870896e+00, -2.32038235e+00, -7.96211044e-01, 1.55058968e+00, -2.19377663e+00, 5.01030526e-01, -1.71767279e+00, -1.36642470e+00, -2.42837527e-01, -4.14275615e-01, -7.33148530e-01, -4.56676578e-01],
 [6.42870687e-01, 1.34486839e+00, 2.16026845e-01, -2.13180345e-01, 3.10866747e-01, -3.97754955e-01, -3.54439151e-01, -5.95938041e-04, 4.95054274e-03, 4.67013422e-02, -1.80823854e-02, 1.25808320e-01],
 [1.16780496e+00, 2.28141229e+00, -3.29418720e+00, -1.54239912e+00, 2.12372153e-01, 2.51116768e+00, 1.84273560e+00, -4.06183916e-01, 1.19175125e+00, -9.24407446e-01, 6.85444429e-01, -6.38729005e-01],
 [2.39097414e-01, -1.13382447e-02, 3.06327342e-01, 4.68182987e-03, -1.03107607e-01, -3.17661969e-02, 3.46533705e-02, 1.46440386e-02, 6.88291154e-02, 1.72580481e-02, -6.23970238e-03, -6.52822380e-03],
 [1.74850329e-01, -1.86077411e-01, 2.69285838e-01, 5.22452803e-02, -3.71708289e-02, -6.42874319e-02, -5.01920042e-03, -1.14565540e-02, -2.61300268e-03, -6.94872458e-03, 1.20157063e-02, 2.01341977e-02],
 [1.93220674e-01, 1.62738332e-01, 1.72794061e-02, 7.89933755e-02, 1.58494767e-01, 9.04541006e-04, -3.33177052e-02, -1.42411500e-01, -1.90471155e-02, -2.41622739e-02, -2.57382438e-02, 2.84895062e-02],
 [3.31179197e+00, -1.56765268e-01, 4.42446188e+00, 2.05496297e+00, 5.07031622e+00, -3.52663849e-02, -5.68337901e+00, -1.17825301e+00, 5.41756637e-01, -3.15541339e-02, -1.58404846e+00, 7.37887234e-01],
 [2.36033237e-01, -5.01380019e-01, -7.01568834e-02, -2.14474169e-01, 5.58739133e-01, -3.45340886e-01, 2.36469930e-01, -2.51770230e-02, -4.41670143e-01, -1.73364633e-01, 9.92353986e-03, 1.01775476e-01],
 [3.13672832e+00, 1.55128891e+00, 4.60139512e+00, 9.82477544e-01, -3.87108002e-01, -1.34239667e+00, -3.00065797e+00, -4.41556909e-01, -7.77546208e-01, -6.59017029e-01, -1.42596356e-01, -9.78935498e-01],
 [8.50714148e-01, 2.28658856e-01, -3.65260753e+00, 2.70626948e+00, -1.90441544e-01, 5.66625676e+00, 1.77531510e+00, 2.39978921e+00, 1.10965660e+00, 1.58484130e+00, -1.51579214e-02, 8.64324026e-01],
 [1.14302559e+00, 1.18602811e+00, -3.88130412e+00, 8.69833825e-01, -8.23003310e-01, -4.23867795e-01, 8.56022598e-01, -1.08015106e+00, 1.74840192e-01, -1.35493558e-02, -1.17012561e+00, 1.68572940e-01],
 [3.54117814e+00, 6.12714769e-01, 7.67585243e+00, 2.50391333e+00, 1.81374399e+00, -1.46363231e+00, -1.74027236e+00, -5.72924078e-01, -1.20787368e+00, -4.13954661e-01, -4.62561948e-01, 6.78297871e-01],
 [8.31843044e-01, 4.41635485e-01, 7.00724425e-02, -4.72159900e-02, 3.08326493e-01, -4.47009822e-01, 3.27806057e-01, 6.52370380e-01, 3.28490360e-01, 1.28628172e-01, -7.78065861e-02, 6.91343399e-02],
 [4.90082031e-01, -9.53180204e-01, 1.76970476e-01, 1.57256960e-01, -5.26196238e-02, -3.19264458e-01, 3.91808304e-01, 2.19368239e-01, -2.06483291e-01, -6.25044005e-02, -1.05547224e-01, 3.18934196e-01],
 [1.49899454e+00, -4.30708817e-01, 2.43770498e+00, 7.03149621e-01, -2.28827845e+00, 2.70195855e+00, -4.71484280e+00, -1.18700075e+00, -1.77431396e+00, -2.23190236e+00, 8.20855264e-01, -2.35859902e-01],
 [1.20322544e-01, -3.66300816e-01, -1.25699953e-01, -1.21914056e-01, 6.93277338e-02, -1.31034684e-01, -1.54955924e-03, 2.48094288e-02, -3.09576314e-02, -1.66369415e-03, 1.48904987e-04, -1.42151992e-02],
 [6.52394765e-01, -6.81024464e-01, 6.36868117e-01, 3.04950208e-01, 2.62178992e-01, -3.20457080e-01, -1.98576098e-01, -3.02173163e-01, 2.04399765e-01, 4.44513847e-02, -9.50111498e-03, -1.14198739e-02],
 [2.06762180e-01, -2.08101829e-01, 2.61977630e-01, -1.71672300e-01, 5.61794250e-02, 2.13660185e-01, 3.90259585e-02, 4.78176392e-02, 1.72812607e-02, 3.44052067e-02, 6.26899067e-03, 2.48544728e-02],
 [7.39717363e-01, 4.37786285e+00, 2.54995502e+00, 1.13151212e+00, -3.58509503e-01, 2.20806129e-01, -2.20500355e-01, -7.22409824e-02, -2.70534083e-01, 1.07942098e-03, 2.70174668e-01, 1.87279353e-01],
 [1.25593809e+00, 6.71054880e-02, 8.70352571e-01, -4.32607959e+00, 2.30652217e+00, 5.47476105e+00, -6.11052479e-01, 1.07955720e+00, -2.16225471e+00, -7.95770149e-01, -7.31804973e-01, 9.68935954e-01],
 [1.17233757e-01, -1.23897829e-01, -4.88625265e-01, 1.42036530e-01, -7.23286756e-02, -6.99808763e-02, -1.17525019e-02, 5.70221674e-02, -7.67796123e-03, 4.17505873e-02, -2.33375716e-02, 1.94121001e-02],
 [1.67511025e+00, -2.75436700e+00, 1.45345593e+00, 1.32408871e+00, -1.66172505e+00, 1.00560074e+00, -8.82308160e-01, -5.95708043e-01, -7.27283590e-01, -1.03975499e+00, -1.86653334e-02, 1.39449745e+00],
 [3.20587677e+00, -2.84451104e+00, 8.54849957e+00, -4.44001235e-01, 1.04202144e+00, 7.35333682e-01, -2.48763292e+00, 7.38931361e-01, -1.74185596e+00, -1.07581842e+00, 2.05759299e-01, -8.20483513e-01],
 [3.31279737e+00, -5.08655734e-01, 6.61530870e+00, 1.16518280e+00, 4.74499155e+00, -2.31536191e+00, -1.34016130e+00, -7.15381712e-01, 2.78890594e+00, 2.04189275e+00, -3.80003033e-01, 1.16034914e+00],
 [1.79522019e+00, -8.13534697e-02, 4.37167420e-01, 2.26517020e+00, 8.85377295e-01, 1.07481514e+00, -7.25322296e-01, -2.19309506e+00, -7.59468916e-01, -1.37191387e+00, 2.60097913e-01, 9.34596450e-01],
 [3.50400906e-01, 8.17891485e-01, -8.63487084e-01, -7.31760701e-01, 9.70320805e-02, -3.60023996e-01, -2.91753495e-01, -8.03073817e-02, 6.65930095e-02, 1.60093340e-01, -1.29158086e-01, -5.18806100e-02],
 [2.25922929e-01, 2.78461593e-01, 5.39661393e-02, -2.37662670e-02, -2.70343295e-02, -1.23485570e-01, 2.31027499e-03, 5.87465112e-05, 1.86127188e-02, 2.83074747e-02, -1.87198676e-04, 1.24761782e-02],
 [4.53615634e-01, 3.18976020e+00, -8.35029351e-01, 7.84124578e+00, -4.43906795e-01, -1.78945492e+00, -1.14521031e+00, 1.00044304e+00, -4.04084981e-01, -4.86030348e-01, 1.05412721e-01, 5.63666445e-02],
 [3.93714086e-01, -3.07226477e-01, -4.87366619e-01, -4.57481697e-01, -2.91133171e-04, -2.39881719e-01, -2.15591352e-01, -1.21332941e-01, 1.42245002e-01, 5.02984582e-02, -8.05878851e-03, 1.95534173e-01],
 [1.86913010e-01, -1.61000977e-01, 5.95612425e-01, 1.87804293e-01, 2.22064227e-01, -1.09008289e-01, 7.83845058e-02, 5.15228647e-02, -8.18113578e-02, -2.37860551e-02, 3.41013800e-03, 3.64680417e-02],
 [3.32919314e+00, -2.14341251e+00, 7.20913997e+00, 1.76143734e+00, 1.64091808e+00, -2.66887649e+00, -9.26748006e-01, -2.78599285e-01, -7.39434005e-01, -3.87363085e-01, 8.00557250e-01, 1.15628886e+00],
 [4.76496444e-01, -1.19334793e-01, 3.09037235e-01, -3.45545294e-01, 1.30114716e-01, 5.06895559e-01, 2.12176840e-01, -4.14296750e-03, 4.52439064e-02, -1.62163990e-02, 6.93683152e-03, -5.77607592e-03],
 [3.00019324e-01, 5.43432074e-02, -7.72732930e-01, 1.47263806e+00, -2.79012581e-02, -2.47864869e-01, -2.10011388e-01, 2.78202425e-01, 6.16957205e-02, -1.66924986e-01, -1.80102286e-01, -3.78872162e-03]]

TIMBRE_MEANS = [np.mean(t) for t in TIMBRE_CLUSTERS]
TIMBRE_STDEVS = [np.std(t) for t in TIMBRE_CLUSTERS]

'''helper methods to process raw msd data'''

def normalize_pitches(h5):
    # transpose a song's pitch segments so every song is keyed to C
    key = int(hdf5_getters.get_key(h5))
    segments_pitches = hdf5_getters.get_segments_pitches(h5)
    return [transpose_by_key(pitch_seg, key) for pitch_seg in segments_pitches]

def transpose_by_key(pitch_seg, key):
    pitch_seg_new = []
    for i in range(0, 12):
        idx = (i + key) % 12
        pitch_seg_new.append(pitch_seg[idx])
    return pitch_seg_new

def find_most_likely_chord(pitch_vector):
    '''given a time segment with strengths of the 12 pitches, find the most
    likely chord played; returns (family, root index), with family
    1 = major, 2 = minor, 3 = dominant 7th major, 4 = dominant 7th minor'''
    rho_max = 0.0
    most_likely_chord = (1, 1)
    families = [
        (1, CHORD_TEMPLATE_MAJOR, CHORD_TEMPLATE_MAJOR_means, CHORD_TEMPLATE_MAJOR_stdevs),
        (2, CHORD_TEMPLATE_MINOR, CHORD_TEMPLATE_MINOR_means, CHORD_TEMPLATE_MINOR_stdevs),
        (3, CHORD_TEMPLATE_DOM7, CHORD_TEMPLATE_DOM7_means, CHORD_TEMPLATE_DOM7_stdevs),
        (4, CHORD_TEMPLATE_MIN7, CHORD_TEMPLATE_MIN7_means, CHORD_TEMPLATE_MIN7_stdevs)]
    for family, chords, means, stdevs in families:
        for idx, (chord, mean, stdev) in enumerate(zip(chords, means, stdevs)):
            # correlation between the template chord and the observed frame
            rho = 0.0
            for i in range(0, 12):
                rho += ((chord[i] - mean) * (pitch_vector[i] - np.mean(pitch_vector))
                        / ((stdev + 0.01) * (np.std(pitch_vector) + 0.01)))
            if abs(rho) > abs(rho_max):
                rho_max = rho
                most_likely_chord = (family, idx)
    return most_likely_chord

def find_most_likely_timbre_category(timbre_vector):
    # same correlation-style matching, against the 46 timbre cluster means
    most_likely_timbre_cat = 0
    rho_max = 0.0
    for idx, (seg, mean, stdev) in enumerate(zip(TIMBRE_CLUSTERS, TIMBRE_MEANS, TIMBRE_STDEVS)):
        rho = 0.0
        for i in range(0, 12):
            rho += ((seg[i] - mean) * (timbre_vector[i] - np.mean(timbre_vector))
                    / ((stdev + 0.01) * (np.std(timbre_vector) + 0.01)))
        if abs(rho) > abs(rho_max):
            rho_max = rho
            most_likely_timbre_cat = idx
    return most_likely_timbre_cat

Bibliography

[1] Marck Bailey. Mark J. Butler publishes scholarly work on dance music. http://www.music.northwestern.edu/about/news/2012/mark-j-butler-publishes-scholarly-work-on-dance-music.html, March 2012.

[2] Kenneth Taylor. Ishkur's guide to EDM. http://techno.org/electronic-music-guide.

[3] Deal further strengthens Spotify's music discovery expertise. http://the.echonest.com/pressreleases/spotify-acquires-echo-nest, March 2014.

[4] Josh Constine. Inside the Spotify - Echo Nest skunkworks. http://techcrunch.com/2014/10/19/the-sonic-mad-scientists, October 2014.

[5] The future of music genres is here. http://blog.echonest.com/post/73516217273/the-future-of-music-genres-is-here, January 2014.

[6] About the Music Genome Project. http://www.pandora.com/about/mgp.

[7] Joan Serrà, Álvaro Corral, Marián Boguñá, Martín Haro, and Josep Ll. Arcos. Measuring the evolution of contemporary western popular music. Sci. Rep., 2, July 2012.

[8] Matthias Mauch, Robert M. MacCallum, Mark Levy, and Armand M. Leroi. The evolution of popular music: USA 1960-2010. Royal Society Open Science, 2(5), 2015.

[9] Thierry Bertin-Mahieux, Daniel P.W. Ellis, Brian Whitman, and Paul Lamere. The Million Song Dataset. In Proceedings of the 12th International Conference on Music Information Retrieval (ISMIR 2011), 2011.

[10] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay. Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12:2825-2830, 2011.

[11] Lars Buitinck, Gilles Louppe, Mathieu Blondel, Fabian Pedregosa, Andreas Mueller, Olivier Grisel, Vlad Niculae, Peter Prettenhofer, Alexandre Gramfort, Jaques Grobler, Robert Layton, Jake VanderPlas, Arnaud Joly, Brian Holt, and Gaël Varoquaux. API design for machine learning software: experiences from the scikit-learn project. In ECML PKDD Workshop: Languages for Data Mining and Machine Learning, pages 108-122, 2013.

[12] Edwin Chen. Infinite mixture models with nonparametric Bayes and the Dirichlet process. http://blog.echen.me/2012/03/20/infinite-mixture-models-with-nonparametric-bayes-and-the-dirichlet-process, March 2012.

[13] Graham Massey. Roland TR-808: The drum machine that changed music. http://www.bbc.com/news/entertainment-arts-26682781, March 2014.

[14] Jean-Michel Jarre and Agencja Artystyczna MTJ. Jean Michel Jarre. Disques Dreyfus, 1999.

[15] Francois Pachet, Jean-Julien Aucouturier, and Mark Sandler. "The way it sounds": Timbre models for analysis and retrieval of music signals. IEEE Transactions on Multimedia, 7(6):1028-1035, December 2005.

• Abstract
• Acknowledgements
• Contents
• List of Tables
• List of Figures
• 1 Introduction
  • 1.1 Background Information
  • 1.2 Literature Review
  • 1.3 The Dataset
• 2 Mathematical Modeling
  • 2.1 Determining Novelty of Songs
  • 2.2 Feature Selection
  • 2.3 Collecting Data and Preprocessing Selected Features
    • 2.3.1 Collecting the Data
    • 2.3.2 Pitch Preprocessing
    • 2.3.3 Timbre Preprocessing
• 3 Results
  • 3.1 Methodology
  • 3.2 Findings
    • 3.2.1 α = 0.05
    • 3.2.2 α = 0.1
    • 3.2.3 α = 0.2
  • 3.3 Analysis
• 4 Conclusion
  • 4.1 Design Flaws in Experiment
  • 4.2 Future Work
  • 4.3 Closing Remarks
• A Code
  • A.1 Pulling Data from the Million Song Dataset
  • A.2 Calculating Most Likely Chords and Timbre Categories
  • A.3 Code to Compute Timbre Categories
  • A.4 Helper Methods for Calculations
• Bibliography

tempo: 113.359 (tempo in BPM according to The Echo Nest)

time_signature: 4 (time signature of the song according to The Echo Nest, i.e., usual number of beats per bar)

time_signature_confidence: 0.634 (confidence of the time signature estimation)

title: Never Gonna Give You Up (song title)

track_7digitalid: 8707738 (the ID of this song on the service 7digital.com)

track_id: TRAXLZU12903D05F94 (The Echo Nest ID of this particular track, on which the analysis was done)

year: 1987 (year when this song was released, according to musicbrainz.org)

When choosing features, my main goal was to use features that would most likely yield meaningful results yet also be simple and make sense to the average person. The definition of "meaningful" results is arbitrary, as every music listener will have his or her own opinions as to what constitutes different types of music, but some common features most people tend to differentiate songs by are pitch, rhythm, and the types of instruments used. The following specific fields provided in each song object fall under these three terms.

Pitch:
• segments_pitches: a matrix of values indicating the strength of each pitch (or note) at each discernible time interval

Rhythm:
• beats_start: a vector of values indicating the start time of each beat
• time_signature: the time signature of the song
• tempo: the speed of the song in Beats Per Minute (BPM)

Instruments:
• segments_timbre: a matrix of values indicating the distribution of MFCC-like features (different types of tones) for each segment

The segments_pitches feature is a clear candidate for a differentiating factor between songs, since it reveals the patterns of notes that occur. Additionally, other research papers that quantitatively examine songs, like Mauch's, look at pitch and employ a procedure that allows all songs to be compared with the same metric. Likewise, timbre is intuitively a reliable differentiating feature, since it captures different tones, that is, sounds that sound different despite having the same pitch. Therefore, segments_timbre is another feature that is considered in each song. Finally, we look at the candidate features for rhythm. At first glance, all of these features appear to be useful, as they indicate the rhythm of a song in one way or another. However, none of these features are as useful as the pitch and timbre features. While tempo is one factor in differentiating genres of EDM, and music in general, tempo alone is not a driving force of musical innovation. Certain genres of EDM, like drum 'n' bass and happycore, stand out for having very fast tempos, but the tempo is supplemented with a sound unique to the genre. Conceiving new arrangements of pitches, combining instruments in new ways, and inventing new types of sounds are novel; speeding up or slowing down existing sounds is not. Including tempo as a feature could actually add noise to the model, since many genres overlap in their tempos. And finally, tempo is measured indirectly when the pitch and timbre features are normalized for each song: everything is measured in units of "per second," so faster songs will have higher quantities of pitch and timbre features each second. Time signature can be dismissed from the candidate features for the same reason as tempo: many genres share the same time signature, and including it in the feature set would only add more noise. beats_start looks like a more promising feature since, like segments_pitches and segments_timbre, it consists of a vector of values. However, difficulties arise when we begin to think how exactly we can utilize this information. Since each song varies in length, we need a way to compare songs of different durations on the same level. One approach could be to perform basic statistics on the distance between each beat, for example calculating the mean and standard deviation of this distance (see the sketch after this paragraph); however, the normalized pitch and timbre information already captures this data. Another possibility is detecting certain patterns of beats, which could differentiate the syncopated dubstep or glitch music beats from the steady pulse of electro-house. But once again, every beat is accompanied by a sound with a specific timbre and pitch, so this feature would not add any significantly new information.
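For concreteness, the rejected beat statistics would have been a one-liner of this form (a sketch; beats_start here is the vector returned by hdf5_getters.get_beats_start):

import numpy as np

def beat_spacing_stats(beats_start):
    # mean and standard deviation of the gaps between consecutive beats
    gaps = np.diff(beats_start)
    return gaps.mean(), gaps.std()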

2.3 Collecting Data and Preprocessing Selected Features

2.3.1 Collecting the Data

Upon deciding the features I wanted to use in my research, I first needed to collect all of the electronic songs in the Million Song Dataset. The easiest reliable way to achieve this was to iterate through each song in the database and save the information for the songs where any of the artist genre tags in artist_mbtags matched an electronic music genre. While this measure was not fully accurate, because it looks at the genre of the artist, not the song, specific genre information for each song was not as easily accessible, so this indicator was nearly as good a substitute. To generate a list of the genres that electronic songs would fall under, I manually searched through a subset of the MSD to find all genres that seemed to be related to electronic music. In the case of genres that were sometimes but not always electronic in nature, such as disco or pop, I erred on the side of caution and did not include them in the list of electronic genres. In these cases, false positives, such as primarily rock songs that happen to have the disco label attached to the artist, could inadvertently be included in the dataset. The final list of genres is as follows:

target_genres = ['house', 'techno', 'drum and bass', 'drum n bass',
    "drum'n'bass", 'drumnbass', "drum 'n' bass", 'jungle', 'breakbeat',
    'trance', 'dubstep', 'trap', 'downtempo', 'industrial', 'synthpop',
    'idm', 'idm - intelligent dance music', '8-bit', 'ambient',
    'dance and electronica', 'electronic']

2.3.2 Pitch Preprocessing

A study conducted by music researcher Matthias Mauch [8] analyzes pitch in a musically informed manner. The study first takes the raw sound data and converts it into a distribution over each pitch, where 0 is no detection of the pitch and 1 the strongest amount. Then it computes the most likely chord by comparing the 4 most common types of chords in popular music (major, minor, dominant 7th, and minor 7th) to the observed chord. The most common chords are represented as "template chords" and contain 0's and 1's, where the 1's represent the notes played in the chord. For example, using the note C as the first index, the C major chord is represented as

CT_{CM} = (1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0)

For a given chroma frame c observed in the song, the Spearman's Rho coefficient is computed against every template chord:

\rho_{CT,c} = \sum_{i=1}^{12} \frac{(CT_i - \overline{CT})(c_i - \bar{c})}{\sigma_{CT}\,\sigma_c}

where \overline{CT} is the mean of the values in the template chord, \sigma_{CT} is the standard deviation of the values in the chord, and the operations on c are analogous. Note that the summation is over each individual pitch in the 12 pitch classes. The chord template with the highest value of ρ is selected as the chord for the time frame.
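The same template matching can be prototyped quickly with scipy. The sketch below uses scipy.stats.spearmanr, the rank-based version of the coefficient above (Appendix A.4 instead computes the sum directly), and checks only the 12 major templates; the frame values are the averaged chroma from the worked example later in this section.

import numpy as np
from scipy.stats import spearmanr

# Correlate one observed chroma frame against every major-chord template
# and keep the best match (a sketch; the full version also checks the
# minor, dominant 7th, and minor 7th families).
C_MAJOR = [1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0]
templates = {('major', root): np.roll(C_MAJOR, root) for root in range(12)}

frame = np.array([0.607, 0.852, 0.360, 0.251, 0.327, 0.410,
                  0.637, 0.435, 0.394, 0.387, 0.578, 0.400])
best = max(templates, key=lambda t: spearmanr(templates[t], frame).correlation)
print(best)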

After this is performed for each time frame, the values are smoothed, and then the change between adjacent chords is observed. The reasoning behind this step is that by measuring the relative distance between chords, rather than the chords themselves, all songs can be compared in the same manner even though they may have different key signatures. Finally, the study takes the types of chord changes and classifies them under 8 possible categories called "H-topics." These topics are more abstracted versions of the chord changes that make more sense to a human, such as "changes involving dominant 7th chords."

In my preliminary implementation of this method on an electronic dance music corpus, I made a few modifications to Mauch's study. First, I smoothed out time frames before computing the most probable chords, rather than smoothing the most probable chords. I did this to save time and to reduce volatility in the chord measurements. Using Rick Astley's "Never Gonna Give You Up" as a reference, which contains 935 time frames and lasts 212 seconds, 5 time frames is slightly under 1 second and, for preliminary testing, appeared to be a good interval for each time block. Second, as mentioned in the literature section, I did not abstract the chord changes into H-topics. This decision also stemmed from time constraints, since deriving semantic chord meaning from EDM songs would require careful research into the types of harmonies and sounds common in that genre of music. Below is a high-level visualization of the pitch metadata found in a sample song, "Firestarter" by The Prodigy, and how I converted the metadata into a chord change vector that I could then feed into the Dirichlet Process algorithm.

The pipeline works as follows:

1. Start with the raw pitch data, an N×12 matrix, where N is the number of time frames in the song and 12 the number of pitch classes. The first 5 time frames (TF1-TF5) of "Firestarter" are:

Note   | TF1   | TF2   | TF3   | TF4   | TF5
C      | 1.0   | 0.525 | 0.819 | 0.303 | 0.388
C♯/D♭  | 0.61  | 1.0   | 1.0   | 0.648 | 1.0
D      | 0.319 | 0.599 | 0.493 | 0.202 | 0.185
D♯/E♭  | 0.221 | 0.229 | 0.24  | 0.322 | 0.241
E      | 0.289 | 0.298 | 0.268 | 0.452 | 0.329
F      | 0.404 | 0.298 | 0.297 | 0.613 | 0.439
F♯/G♭  | 0.465 | 0.398 | 0.733 | 1.0   | 0.589
G      | 0.254 | 0.363 | 0.588 | 0.632 | 0.337
G♯/A♭  | 0.123 | 0.343 | 0.671 | 0.555 | 0.28
A      | 0.316 | 0.308 | 0.431 | 0.659 | 0.22
A♯/B♭  | 0.52  | 0.26  | 0.603 | 0.855 | 0.654
B      | 0.951 | 0.263 | 0.286 | 0.275 | 0.225

2. Average the distribution of pitches over every block of 5 time frames; for the block above, this yields (0.607, 0.852, 0.360, 0.251, 0.327, 0.410, 0.637, 0.435, 0.394, 0.387, 0.578, 0.400).

3. Calculate the most likely chord for each block using Spearman's rho; here, F♯ major, with template (0,1,0,0,0,0,1,0,0,0,1,0).

4. For each pair of adjacent chords, calculate the change between them and increment the count in a table of chord change frequencies (192 possible chord changes). In this example, F♯ major → G♯ major is a major-to-major change with step size 2, with chord shift code 6, so chord_changes[6] += 1.

5. The result is a 192-element vector where chord_changes[i] is the number of times the chord change with code i occurred in the song; for "Firestarter" this vector begins [14, 0, 3, 0, 1, 0, 1, 0, 0, 0, 1, 0, 11, 0, 2, ...].

A final step I took to normalize the chord change data was to divide the numbers by the length of the song, so that each song's number of chord changes was measured per second.
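A condensed sketch of steps (4)-(5) above, mirroring the encoding used in Appendix A.2 (a chord is a (type, root) pair with type 1-4 for major, minor, dominant 7, and minor 7 and root 0-11 for C through B; the function names are mine):

def chord_change_code(c1, c2):
    # 16 type transitions x 12 root shifts = 192 possible change codes
    note_shift = (c2[1] - c1[1]) % 12        # root movement in semitones
    key_shift = 4 * (c1[0] - 1) + c2[0]      # which of the 16 type pairs
    return 12 * (key_shift - 1) + note_shift

def chord_change_vector(chords, duration):
    # count each change code over the song, normalized per second
    counts = [0] * 192
    for c1, c2 in zip(chords, chords[1:]):
        counts[chord_change_code(c1, c2)] += 1
    return [c / duration for c in counts]

# F# major (type 1, root 6) followed by G major (type 1, root 7)
print(chord_change_code((1, 6), (1, 7)))  # -> 1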

2.3.3 Timbre Preprocessing

For timbre, I also used Mauch's model to find a meaningful way to compare timbre uniformly across all songs [8]. After collecting all song metadata, I took a random sample of 20 songs from each year, starting at 1970. The reason I forced the sampling to 20 randomly sampled songs from each year, and did not take a random sample of songs from all years at once, was to prevent bias towards any type of sounds. As seen in Figure 2.2, there are significantly more songs from 2000-2011 than before 2000. The mean year is x̄ = 2001.052, the median year is 2003, and the standard deviation of the years is σ = 7.060. A "random sample" over all songs would almost definitely include a disproportionate amount of more recent songs. In order to not miss out on sounds that may be more prevalent in older songs, I required a set number of songs from each year. Next, from each randomly selected song, I selected 20 random timbre frames, in order to prevent any biases in data collection within each song. In total, there were 42 × 20 × 20 = 16,800 timbre frames collected. Next, I clustered the timbre frames using a Gaussian Mixture Model (GMM), varying the number of clusters from 10 to 100 and selecting the number of clusters with the lowest Bayes Information Criterion (BIC), a statistical measure commonly used to select the best-fitting model. The BIC was minimized at 46 timbre clusters. I then re-ran the GMM with 46 clusters and saved the mean values of each of the 12 timbre coefficients for each cluster formed. In the same way that every song had the same 192 chord changes whose frequencies could be compared between songs, each song now had the same 46 timbre clusters, but different frequencies in each song. When reading in the metadata from each song, I calculated the most likely timbre cluster each timbre frame belonged to, and kept

Figure 2.2: Number of Electronic Music Songs in Million Song Dataset from Each Year

a frequency count of all of the possible timbre clusters observed in the song. Finally, as with the pitch data, I divided all observed counts by the duration of the song, in order to normalize each song's timbre counts.
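A minimal sketch of this model-selection and counting procedure (using scikit-learn's current GaussianMixture API rather than the older sklearn.mixture.GMM the thesis code relied on; the arrays and the step size of the sweep are stand-ins):

import numpy as np
from sklearn.mixture import GaussianMixture

np.random.seed(0)
frames = np.random.normal(size=(16800, 12))   # stand-in for the sampled timbre frames

best_k, best_bic, best_gmm = None, np.inf, None
for k in range(10, 101, 10):                  # candidate cluster counts (step illustrative)
    gmm = GaussianMixture(n_components=k, random_state=0).fit(frames)
    bic = gmm.bic(frames)                     # lower BIC = better fit-vs-complexity tradeoff
    if bic < best_bic:
        best_k, best_bic, best_gmm = k, bic, gmm

# per-song timbre histogram, normalized per second, as described above
song_frames = frames[:400]                    # stand-in for one song's smoothed frames
duration = 212.0
counts = np.bincount(best_gmm.predict(song_frames), minlength=best_k) / duration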


Chapter 3

Results

3.1 Methodology

After the pitch and timbre data was processed, I ran the Dirichlet Process on the data. For each song, I concatenated the 192-element chord change frequency list and the 46-element timbre category frequency list, giving each song a total of 238 features. However, there is a problem with this setup: the pitch data will inherently dominate the clustering process, since it contains almost 3 times as many features as timbre. While there is no built-in function in scikit-learn's DPGMM process to give different weights to each feature, I considered another possibility to remedy this discrepancy: duplicating the timbre vector a certain number of times and concatenating that to the feature set of each song. While this strategy runs the risk of corrupting the feature set and turning it into something that does not accurately represent each song, it is important to keep in mind that, even without duplicating the timbre vector, the feature set consists of two separate feature sets concatenated to each other. Therefore, timbre duplication appears to be a reasonable strategy to weigh pitch and timbre more evenly.
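A sketch of this balancing idea (the duplication factor is illustrative: four copies of the 46 timbre features roughly matches the 192 pitch features; the thesis text does not pin down the exact factor used):

def build_features(chord_changes, timbre_counts, timbre_copies=4):
    # concatenate the 192 chord-change frequencies with several copies of the
    # 46 timbre-category frequencies so both modalities carry similar weight
    assert len(chord_changes) == 192 and len(timbre_counts) == 46
    return chord_changes + timbre_counts * timbre_copies   # 192 + 4*46 = 376 features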

After this modification, I tweaked a few more parameters before obtaining my final results. Dividing the pitch and timbre frequencies by the duration of the song normalized every song to frequency per second, but it also had the undesired effect of making the data too small. Timbre and pitch frequencies per second were almost always less than 1.0, and many times hovered as low as 0.002 for nonzero values. Because all of the values were very close to each other, using common values of α in the range of 0.1 to 1000-2000 was insufficient to push the songs into different clusters; as a result, every song fell into the same cluster. Increasing the value of α by several orders of magnitude, to well over 10 million, fixed the problem, but this solution presented two problems. First, tuning α to experiment with different ways to cluster the music would be problematic, since I would have to work with an enormous range of possible values for α. Second, pushing α to such high values is not appropriate for the Dirichlet Process. Extremely high values of α indicate a Dirichlet Process that will try to disperse the data into different clusters, but a value of α that high is in principle always assigning each new song to a new cluster. On the other hand, varying α between 0.1 and 1000, for example, presents a much wider range of flexibility when assigning clusters. While clustering may be possible by varying the values of α an extreme amount with the data as it currently is, we would be using the Dirichlet Process in a way it should mathematically not be used. Therefore, multiplying all of the data by a constant value, so that we can work in the appropriate range of α, is the ideal approach. After some experimentation, I found that k = 10 was an appropriate scaling factor.

After initial runs of the Dirichlet Process, I found out that there was a slight issue with some of the earlier songs. Since I had only artist genre tags, not specific song tags for each song, I chose songs based on whether any of the tags associated with the artist fell under any electronic music genre, including the generic term 'electronic'. There were some bands, mostly older ones from the 1960s and 1970s like Electric Light Orchestra, which had some electronic music but mostly featured rock, funk, disco, or another genre. Given that these artists featured mostly non-electronic songs, I decided to exclude them from my study and generate a blacklist indicating these music artists. While it was infeasible to look through every single song and determine whether it was electronic or not, I was able to look over the earliest songs in each cluster. These songs were the most important to verify as electronic, because early non-electronic songs could end up forming new clusters and inadvertently create clusters with non-electronic sounds that I was not looking for.

The goal of this thesis is to identify different groups in which EM songs are clustered and to identify the most unique artists and genres. While the second task is fairly simple, because it requires looking at the earliest songs in each cluster, the effectiveness of the first is difficult to gauge. While I can look at the average chord change and timbre category frequencies in each cluster, as well as other metadata, attaching more semantic interpretations to what the music actually sounds like, and determining whether the music is clustered properly, is a very subjective process. For this reason, I ran the Dirichlet Process on the feature set with values of α = (0.05, 0.1, 0.2) and compared the clusterings, examining similarities and differences in the clusters formed in each scenario in the Discussion section. For each value of α, I set the upper limit of components, or clusters, allowed to 50. The values of α I used resulted in 9, 14, and 19 significant clusters formed, respectively.
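A condensed sketch of this step (scikit-learn's DPGMM class, which the thesis code used, has since been replaced by BayesianGaussianMixture with a Dirichlet Process prior, so that API is shown; X is a stand-in for the scaled 238-feature song matrix):

import numpy as np
from sklearn.mixture import BayesianGaussianMixture

np.random.seed(0)
X = 10 * np.random.rand(2000, 238)   # stand-in: features scaled by k = 10 as above

for alpha in (0.05, 0.1, 0.2):
    dpgmm = BayesianGaussianMixture(
        n_components=50,                                   # upper limit on clusters
        weight_concentration_prior_type='dirichlet_process',
        weight_concentration_prior=alpha,                  # the α discussed above
        random_state=0,
    )
    labels = dpgmm.fit(X).predict(X)
    print(alpha, len(np.unique(labels)))                   # number of occupied clusters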

3.2 Findings

3.2.1 α = 0.05

When I set α to 0.05, the Dirichlet Process split the songs into 9 clusters. Below are the distributions of years of the songs in each cluster (note that the Dirichlet Process does not number the clusters exactly sequentially, so cluster numbers 5, 7, and 10 are skipped).

Figure 3.1: Song year distributions for α = 0.05

For each value of α, I also calculated the average frequency of each chord change category and timbre category for each cluster and plotted the results. The green lines correspond to timbre and the blue lines to pitch.

Figure 3.2: Timbre and pitch distributions for α = 0.05

A table of each cluster formed, the number of songs in that cluster, and descriptions of pitch, timbre, and rhythmic qualities characteristic of songs in that cluster is shown below.

Cluster   Song Count   Characteristic Sounds
0         6481         Minimalist, industrial, space sounds, dissonant chords
1         5482         Soft, New Age, ethereal
2         2405         Defined sounds; electronic and non-electronic instruments played in standard rock rhythms
3         360          Very dense and complex synths, slightly darker tone
4         4550         Heavily distorted rock and synthesizer
6         2854         Faster-paced 80s synth rock, acid house
8         798          Aggressive beats, dense house music
9         1464         Ambient house, trancelike, strong beats, mysterious tone
11        1597         Melancholy tones; new wave rock in the 80s, then starting in the 90s downtempo, trip-hop, nu-metal

Table 3.1: Song cluster descriptions for α = 0.05

3.2.2 α = 0.1

A total of 14 clusters were formed (16 were formed, but 2 clusters contained only one song each; I listened to both of these songs, and they did not sound unique, so I discarded them from the clusters). Again, the song distributions, timbre and pitch distributions, and cluster descriptions are shown below.

Figure 3.3: Song year distributions for α = 0.1

Figure 3.4: Timbre and pitch distributions for α = 0.1

Cluster   Song Count   Characteristic Sounds
0         1339         Instrumental and disco with 80s synth
1         2109         Simultaneous quarter-note and sixteenth-note rhythms
2         4048         Upbeat, chill; simultaneous quarter-note and eighth-note rhythms
3         1353         Strong repetitive beats, ambient
4         2446         Strong simultaneous beat and synths; synths defined but with echo
5         2672         Calm, New Age
6         542          Hi-hat cymbals, dissonant chord progressions
7         2725         Aggressive punk and alternative rock
9         1647         Latin, rhythmic emphasis on first and third beats
11        835          Standard medium-fast rock instruments/chords
16        1152         Orchestral, especially violins
18        40           "Martian alien" sounds, no vocals
20        1590         Alternating strong kick and strong high-pitched clap
28        528          Roland TR-like beats; kick and clap stand out but fuzzy

Table 3.2: Song cluster descriptions for α = 0.1

3.2.3 α = 0.2

With α set to 0.2, there were a total of 22 clusters formed. 3 of the clusters consisted of 1 song each, none of which were particularly unique-sounding, so I discarded them, for a total of 19 significant clusters. Again, the song distributions, timbre and pitch distributions, and cluster descriptions are shown below.

Figure 3.5: Song year distributions for α = 0.2

Figure 3.6: Timbre and pitch distributions for α = 0.2

Cluster   Song Count   Characteristic Sounds
0         4075         Nostalgic and sad-sounding synths and string instruments
1         2068         Intense, sad, cavernous (mix of industrial metal and ambient)
2         1546         Jazz/funk tones
3         1691         Orchestral with heavy 80s synths, atmospheric
4         343          Arpeggios
5         304          Electro, ambient
6         2405         Alien synths, eerie
7         1264         Punchy kicks and claps, 80s/90s tilt
8         1561         Medium tempo, 4/4 time signature, synths with intense guitar
9         1796         Disco rhythms and instruments
10        2158         Standard rock with few (if any) synths added on
12        791          Cavernous, minimalist, ambient (non-electronic instruments)
14        765          Downtempo, classic guitar riffs, fewer synths
16        865          Classic acid house sounds and beats
17        682          Heavy Roland TR sounds
22        14           Fast, ambient, classic orchestral
23        578          Acid house with funk tones
30        31           Very repetitive rhythms, one or two tones
34        88           Very dense sound (strong vocals and synths)

Table 3.3: Song cluster descriptions for α = 0.2

3.3 Analysis

Each of the different values of α revealed different insights about the Dirichlet Process applied to the Million Song Dataset, mainly which artists and songs were unique and how traditional groupings of EM genres could be thought of differently. Not surprisingly, the distributions of the years of songs in most of the clusters were skewed to the left, because the distribution of all of the EM songs in the Million Song Dataset is left-skewed (see Figure 2.2). However, some of the distributions vary significantly for individual clusters, and these differences provide important insights into which types of music styles were popular at certain points in time and how unique the earliest artists and songs in the clusters were. For example, for α = 0.1, cluster 28's musical style (with sounds characteristic of the Roland TR-808 and TR-909, two programmable drum machines and synthesizers that became extremely popular in 1980s and 90s dance tracks [13]) coincides with when the instruments were first manufactured, in 1980. Not surprisingly, this cluster contained mostly songs from the 80s and 90s and declined slightly in the 2000s. However, there were a few songs in that cluster that came out before 1980. While these songs did not clearly use the Roland TR machines, they may have contained similar sounds that predated the machines and were truly novel.

First, looking at α = 0.05, we see that all of the clusters contain a significant number of songs, although clusters 3 and 8 are notably smaller. Cluster 3 contains a heavier left tail, indicating a larger number of songs from the 70s, 80s, and 90s. Inside the cluster, the genres of music varied significantly from a traditional music lens. That is, the cluster contained some songs with nearly all traditional rock instruments, others with purely synths, and others somewhere in between, all of which would normally be classified as different EM genres. However, under the Dirichlet Process these songs were lumped together, with the common theme of dense, melodic sounds (as opposed to minimalistic, repetitive, or dissonant sounds). The most prominent artists from the earlier songs are Ashra and Jon Hassell, who composed several melodic songs combining traditional instruments with synthesizers for a modern feel. The other small cluster, number 8, contains a more normal year distribution relative to the entire MSD distribution and also consists of denser beats. Another artist, Cabaret Voltaire, leads this cluster. Cluster 9 sticks out significantly because it contains virtually no songs before 1990 but increases rapidly in popularity. This cluster contains songs with hypnotically repetitive rhythm, strong and ethereal synths, and an equally strong drum-like beat. Given the emergence of trance in the 1990s, and the fact that house music in the 1980s contained more minimalistic synths than house music in the 1990s, this distribution of years makes sense. Looking at the earliest artists in this cluster, one that accurately predates the later music in the cluster is Jean-Michel Jarre. A French composer pioneering in ambient and electronic music [14], one of his songs, Les Chants Magnétiques IV, contains very sharp and modulated synths along with a repetitive hi-hat cymbal rhythm and more drawn-out and ethereal synths. While the song sounds ambient at its normal speed, playing the song at 1.5 times its normal speed resulted in a thumping, fast-paced 16th-note rhythm that, combined with the ethereal synths and their chord progressions, sounded very similar to trance music. In fact, I found that, stylistically, trance music was comparable to house and ambient music increased in speed. Trance music was a term not used extensively until the early 1990s, but ambient and house music were already mainstream by the 1980s, so it would make sense that trance evolved in this manner. However, this insight could serve as an argument that trance is not an innovative genre in and of itself, but is rather a clever combination of two older genres.

Lastly, we look at the timbre category and chord change distributions for each cluster. In theory, these clusters should have significantly different peaks of chord changes and timbre categories, reflecting different pitch arrangements and instruments in each cluster. The type 0 chord change corresponds to major → major with no note change, type 60 to minor → minor with no note change, type 120 to dominant 7th major → dominant 7th major with no note change, and type 180 to dominant 7th minor → dominant 7th minor with no note change. It makes sense that type 0, 60, 120, and 180 chord changes are frequently observed, because it implies that chords in the song occurring next to each other are remaining in the same key for the majority of the song.
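These code numbers follow directly from the encoding in Appendix A.2, where chord types are numbered \(t = 1, \dots, 4\) (major, minor, dominant 7, minor 7) and a change is coded as \( \text{code} = 12(\text{key\_shift} - 1) + \text{note\_shift} \) with \( \text{key\_shift} = 4(t_1 - 1) + t_2 \). For the four same-type changes with no note shift:

\begin{align*}
\text{major} \to \text{major} &: \; 12\,(4(1-1)+1-1) + 0 = 0 \\
\text{minor} \to \text{minor} &: \; 12\,(4(2-1)+2-1) + 0 = 60 \\
\text{dom. 7} \to \text{dom. 7} &: \; 12\,(4(3-1)+3-1) + 0 = 120 \\
\text{min. 7} \to \text{min. 7} &: \; 12\,(4(4-1)+4-1) + 0 = 180
\end{align*}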

The timbre categories, on the other hand, are more difficult to intuitively interpret. Mauch's study addresses this issue by sampling songs and sounds that are the closest to each timbre category, then playing the sounds and attaching user-based interpretations based on several listeners [8]. While this strategy worked in Mauch's study, given the time and resources at my disposal it was not practical in mine. I ended up comparing my subjective summaries of each cluster against the charts to see whether certain peaks in the timbre categories corresponded to specific tones. Strangely, for α = 0.05 the timbre and chord change data is very similar for each cluster. This problem does not occur when α = 0.1 or 0.2, where the graphs vary significantly and correspond to some of the observed differences in the music. In summary, below are the most influential artists I found in the clusters formed, and the types of music they created that were novel for their time:

• Jean-Michel Jarre: ambient and house music, complicated synthesizer arrangements

• Cabaret Voltaire: orchestral electronic music

• Paul Horn: new age

• Brian Eno: ambient music

• Manuel Göttsching (Ashra): synth-heavy ambient music

• Killing Joke: industrial metal

• John Foxx: minimalist and dark electronic music

• Fad Gadget: house and industrial music

While these conclusions were formed mainly from the data used in the MSD and the resulting clusters, I checked outside sources and biographies of these artists to see whether they were groundbreaking contributors to electronic music. Some research revealed that these artists were indeed groundbreaking for their time, so my findings are consistent with existing literature. The difference, however, between existing accounts and mine is that, from a quantitatively computed perspective, I found some new connections (like observing that one of Jarre's works, when sped up, sounded very similar to trance music).

For larger values of α, it is worth not only looking at interesting phenomena in the clusters formed for that specific value, but also comparing those clusters to the ones formed at other values of α. Since we are increasing the value of α, more clusters will be formed, and the distinctions between each cluster will be more nuanced. With α = 0.1, the Dirichlet Process formed 16 clusters; 2 of these clusters consisted of only one song each, and upon listening, neither of these songs sounded particularly unique, so I threw those two clusters out and analyzed the remaining 14. Comparing these clusters to the ones formed with α = 0.05, I found that some of the clusters mapped over nicely, while others were more difficult to interpret. For example, cluster 3_{0.1} (cluster 3 when α = 0.1) contained a similar number of songs and a similar distribution of release years to cluster 9_{0.05}. Both contain virtually no songs before the 1990s and then steadily rise in popularity through the 2000s. Both clusters also contain similar types of music: house beats and ethereal synths reminiscent of ambient or trance music. However, when I looked at the earliest artists in cluster 3_{0.1}, they were different from the earliest artists in cluster 9_{0.05}. One particular artist, Bill Nelson, stood out for having a particularly novel song, "Birds of Tin", for the year it was released (1980). This song features a sharp and twangy synth beat that, when sped up, sounded like minimalist acid house music. While the α = 0.05 grouping differentiated mostly on general moods and classes of instruments (like rock vs. non-electronic vs. electronic), the α = 0.1 grouping picked up more nuanced instrumentation and mood differences. For example, cluster 16_{0.1} contained songs that featured orchestral string instruments, especially violin. The songs themselves varied significantly according to traditional genres, from Brian Eno arrangements with classical orchestra to a remix of a song by Linkin Park, a nu-metal band, which contained violin interludes. This clustering raises an interesting point: music that sounds very different based on traditional genres could be grouped together on certain instruments or sounds. Another cluster, 28_{0.1}, features 90s sounds characteristic of the Roland TR-909 drum machine (which would explain why the cluster's songs increase drastically starting in the 1990s and steadily decline through the 2000s). Yet another cluster, 6_{0.1}, contains a particularly heavy left tail, indicating a style more popular in the 1980s, and the characteristic sound, hi-hat cymbals, is also a specialized instrument. This specialization does not match up particularly strongly with the clusters when α = 0.05. That is, a single cluster with α = 0.05 does not easily map to one or more clusters in the α = 0.1 run, although many of the clusters appear to share characteristics based on the qualitative descriptions in the tables. The timbre/chord change charts for each cluster appeared to at least somewhat corroborate the general characteristics I attached to each cluster. For example, the last timbre category is significantly pronounced for clusters 5 and 18, and especially so for 18. Cluster 18 was vocal-free, ethereal, space-synth sounds, so it would make sense that cluster 5, which was mainly calm New Age, also contained vocal-free, ethereal, and space-y sounds. It was also interesting to note that certain clusters, like 28, contained one timbre category that completely dominated all the others. Many songs in this cluster were marked by strong and repetitive beats reminiscent of the Roland TR drum machines, which matches the graph. Likewise, clusters 3, 7, 9, and 20, which appear to contain the same peak timbre category, were noted for containing strong and repetitive beats. From this run, I added the following artists and their contributions to the general list of novel artists:

• Bill Nelson: minimalist house music

• Vangelis: orchestral compositions with electronic notes

• Rick Wakeman: rock compositions with spacy-sounding synths

• Kraftwerk: synth-based pop music

Finally, we look at α = 0.2. With this parameter value, the Dirichlet Process resulted in 22 clusters formed. 3 of these clusters contained only one song each, and upon listening to each of these songs, I determined they were not particularly unique and discarded them, for a total of 19 remaining clusters. Unlike the previous two values of α, where the clusters were relatively easy to subjectively differentiate, this one was quite difficult. Slightly more than half of the clusters, 10 out of 19, contained under 1000 songs, and 3 contained under 100. Some of the clusters were easily mapped to clusters in the other two α values, like cluster 17_{0.2}, which contains Roland TR drum machine sounds and is comparable to cluster 28_{0.1}. However, many of the other classifications seemed more dubious. Not only did the songs within each cluster often seem to vary significantly, but the differences between many clusters appeared nearly indistinguishable. The chord change and timbre charts also reflect the difficulty in distinguishing different clusters. The y-axes for all of the charts are quite small, implying that many of the timbre values averaged out because the songs were quite different in each cluster. Essentially, the observations are quite noisy and do not have features that stand out as saliently as in cluster 28_{0.1}, for example. The only exceptions to these numbers were clusters 30 and 34, but there were so few songs in each of these clusters that they represent only a small portion of the dataset. Therefore, I concluded that the Dirichlet Process with α = 0.2 performed an insufficient job of adequately clustering the songs. Overall, the clusters formed when α = 0.1 were the most meaningful in terms of picking up nuanced moods and instruments without splitting hairs and resulting in clusters a minimally trained ear could not differentiate. From this analysis, the most appropriate genre classifications of the electronic music from the MSD are the clusters described in the table where α = 0.1, and the most novel artists, along with their contributions, are summarized in the findings where α = 0.05 and α = 0.1.

Chapter 4

Conclusion

In this chapter, I first address weaknesses in my experiment and strategies to address those weaknesses; I then offer potential paths for researchers to build upon my experiment and offer closing words regarding this thesis.

4.1 Design Flaws in Experiment

While I made every effort possible to ensure the integrity of this experiment, there were various complicating factors, some beyond my control and others within my control but unrealistic to address given the time and resources I had. The largest issue was the dataset I was working with. While the MSD contained roughly 23,000 electronic music songs according to my classifications, these songs did not come close to all of the electronic music that was available. From looking through the tracks, I did see many important artists, meaning that there was some credibility to the dataset. However, there were several other artists I was surprised to see missing, and the artists included contained only a limited number of popular songs. Some traditionally defined genres, like dubstep, were missing entirely from the dataset, and the most recent songs came from the year 2010, which meant that the past 5 years of rapid expansion in EM were not accounted for. Building a sufficient corpus of EM data is very difficult, arguably more so than for other genres, because songs may be remixed by multiple artists, further blurring the line between original content and modifications. For this reason, I considered my thesis to be a proof of concept. Although the data I used may not be ideal, I was able to show that the Dirichlet Process could be used with some amount of success to cluster songs based on their metadata.

With respect to how I implemented the Dirichlet Process and constructed the features, my methodology could have been more extensive with additional time and resources. Interpreting the sounds in each song and establishing common threads is a difficult task, and unlike Pandora, which used trained music theory experts to analyze each song, I relied on my own ears and anecdotal knowledge of EM. Given the lack of formal literature quantitatively analyzing EM and the resources I had, this was my best realistic option, but it was also not ideal. The second notable weakness, which was more controllable, was determining what exactly constitutes an EM song. My criteria involved iterating through every song and selecting those whose artist contained a tag that fell inside a list of predetermined EM genres. However, this strategy is not always effective, since some artists have only a small selection of EM songs and have produced much more music involving rock or other non-EM genres. To prevent these songs from appearing in the dataset, I would need to load another dataset, from a group called Last.fm, which contains user-generated tags at the song level. Another, more addressable weakness in my experiment was graphically analyzing the timbre categories. While the average chord changes were easy to interpret on the graphs for each cluster and had easy semantic interpretations, the timbre categories were never formally defined. That is, while I knew the Bayes Information Criterion was lowest when there were 46 categories, I did not associate each timbre category with a sound. Mauch's study addressed this issue by randomly selecting songs with sounds that fell in each timbre category and asking users to listen to the sounds and classify what they heard. Implementing this system would be an additional way of ensuring that the clusters formed for each song were nontrivial: I could not only eyeball the measurements on each graph for timbre, like I did in this thesis, but also use them to confirm the sounds I observed for each cluster. Finally, while my feature selection involved careful preprocessing, based on other studies, that normalized measurements between all songs, there are additional ways I could have improved the feature set. For example, one study looks at more advanced ways to isolate specific timbre segments in a song, identify repeating patterns, and compare songs to each other in terms of the similarity of their timbres [15]. More advanced methods like these would allow me to more quantitatively analyze how successfully the Dirichlet Process clusters songs into distinct categories.

4.2 Future Work

Future work in this area, quantitatively analyzing EM metadata to determine what constitutes different genres and novel artists, would involve tighter definitions, procedures, evaluations of whether clustering was effective, and closer musical scrutiny. All of the weaknesses mentioned in the previous section, barring perhaps the songs available in the Million Song Dataset, can be addressed with extensions and modifications to the code base I created. Addressing the greater issue of building an effective corpus of music data for the MSD, and constantly updating it, might be accomplished by soliciting such data from an organization like Spotify, but such an endeavor is very ambitious and beyond the scope of any individual or small-group research project without extensive funding and influence. Once these problems are resolved, and the dataset, the songs accessed from the dataset, and the methods for comparing songs to each other are in place, the next steps would be to further analyze the results. How do the most unique artists for their time compare to the most popular artists? Is there considerable overlap? How long does it take for a style to grow in popularity, if it even does? And lastly, how can these findings be used to compose new genres of music and envision who and what will become popular in the future? All of these questions may require supplementary information sources, with respect to the popularity of songs and artists for example, and many of these additional pieces of information can be found on the website of the MSD.

4.3 Closing Remarks

While this thesis is an ambitious endeavor and can be improved in many respects, it does show that the methods implemented yield nontrivial results and could serve as a foundation for future quantitative analysis of electronic music. As data analytics grows even more, and groups such as Spotify amass greater amounts of information and deeper insights on that information, this relatively new field of study will hopefully grow. EM is a dynamic, energizing, and incredibly expressive type of music, and understanding it from a quantitative perspective pays respect to what has, up until now, been mostly analyzed from a curious outsider's perspective: qualitatively described, but not examined as thoroughly from a mathematical angle.

Appendix A

Code

A.1 Pulling Data from the Million Song Dataset

from __future__ import division
import os
import sys
import time
import glob
import re
import numpy as np
import hdf5_getters  # not on adroit

# prevents output from showing ellipses when printed
np.set_printoptions(threshold=np.nan)

'''This code pulls pitch and timbre metadata for every electronic music song
found in the Million Song Dataset and writes it, sorted by year, to a text file.'''

basedir = '/scratch/network/mssilver/mssilver/msd_data_full/data/' + str(sys.argv[1])
ext = '.h5'

target_genres = ['house', 'techno', 'drum and bass', 'drum n bass', "drum'n'bass",
                 'drumnbass', "drum 'n' bass", 'jungle', 'breakbeat', 'trance',
                 'dubstep', 'trap', 'downtempo',
                 'industrial', 'synthpop', 'idm', 'idm - intelligent dance music',
                 '8-bit', 'ambient',
                 'dance and electronica', 'electronic']

# relevant metadata for all EM songs found in the MSD
all_song_data = {}
pitch_segs_data = []
count = 0
start_time = time.time()

for root, dirs, files in os.walk(basedir):
    files = glob.glob(os.path.join(root, '*' + ext))
    for f in files:
        h5 = hdf5_getters.open_h5_file_read(f)
        # if year unknown, throw out sample
        if hdf5_getters.get_year(h5) == 0:
            h5.close()
            continue
        if any(tag in str(hdf5_getters.get_artist_mbtags(h5)) for tag in target_genres):
            print 'found electronic music song at {0} seconds'.format(time.time() - start_time)
            count += 1
            print ('song count: {0}'.format(count + 1))
            h5_subdict = dict()
            h5_subdict['title'] = hdf5_getters.get_title(h5).item()
            h5_subdict['artist_name'] = hdf5_getters.get_artist_name(h5).item()
            h5_subdict['year'] = hdf5_getters.get_year(h5).item()
            h5_subdict['duration'] = hdf5_getters.get_duration(h5).item()
            h5_subdict['timbre'] = hdf5_getters.get_segments_timbre(h5).tolist()
            h5_subdict['pitches'] = hdf5_getters.get_segments_pitches(h5).tolist()
            track_id = hdf5_getters.get_track_id(h5).item()
            all_song_data[track_id] = h5_subdict
            print('Song {0} finished processing. Total time elapsed: {1} seconds'.format(count, str(time.time() - start_time)))
        h5.close()

all_song_data_sorted = dict(sorted(all_song_data.items(), key=lambda k: k[1]['year']))
sortedpitchdata = '/scratch/network/mssilver/mssilver/msd_data/raw_' + re.sub('/', '', sys.argv[1]) + '.txt'
with open(sortedpitchdata, 'w') as text_file:
    text_file.write(str(all_song_data_sorted))

A.2 Calculating Most Likely Chords and Timbre Categories

from __future__ import division
import os
import sys
import re
import time
import json
import glob
import hdf5_getters  # not on adroit
import sklearn.mixture
import msd_utils  # not on adroit
import math
import numpy as np
import collections
import ast

# prevents output from showing ellipses when printed
np.set_printoptions(threshold=np.nan)

# column-wise mean of a list of lists (used with zip(*segments) below)
def mean(a):
    return sum(a) / len(a)

'''This code computes the frequency of chord changes and timbre categories
in each electronic song.'''

basedir = '/scratch/network/mssilver/mssilver/'
input_file = basedir + 'msd_data/raw_' + str(sys.argv[1]) + '.txt'
output_file = basedir + 'msd_data/preprocessed_' + str(sys.argv[1]) + '.txt'

json_contents = open(input_file, 'r').read()

all_song_data = []
time_start = time.time()
count = 0
# the exact pattern was garbled in the source; it matches each song's
# dict literal (containing a 'title' key) in the raw text file
for json_object_str in re.finditer(r"\{[^{}]*'title'[^{}]*\}", json_contents):
    json_object_str = str(json_object_str.group(0))
    json_object = ast.literal_eval(json_object_str)
    json_object_new = {}

    json_object_new['title'] = json_object['title']
    json_object_new['artist_name'] = json_object['artist_name']
    json_object_new['year'] = json_object['year']
    json_object_new['duration'] = json_object['duration']

    segments_pitches_old = json_object['pitches']
    segments_timbre_old = json_object['timbre']
    segments_pitches_old_smoothed = []
    segments_timbre_old_smoothed = []
    chord_changes = [0 for i in range(0, 192)]
    smoothing_factor = 5
    for i in range(0, int(math.floor(len(segments_pitches_old) / smoothing_factor))):
        segments = segments_pitches_old[(smoothing_factor*i):(smoothing_factor*i + smoothing_factor)]
        # calculate mean frequency of each note over a block of 5 time segments
        segments_mean = map(mean, zip(*segments))
        segments_pitches_old_smoothed.append(segments_mean)
    most_likely_chords = [msd_utils.find_most_likely_chord(seg) for seg in segments_pitches_old_smoothed]
    print 'found most likely chords at {0} seconds'.format(time.time() - time_start)
    # calculate chord changes
    for i in range(0, len(most_likely_chords) - 1):
        c1 = most_likely_chords[i]
        c2 = most_likely_chords[i+1]
        if (c1[1] == c2[1]):
            note_shift = 0
        elif (c1[1] < c2[1]):
            note_shift = c2[1] - c1[1]
        else:
            note_shift = 12 - c1[1] + c2[1]
        key_shift = 4*(c1[0]-1) + c2[0]
        # convert note_shift (0 through 11) and key_shift (1 to 16)
        # to one of 192 categories for a chord shift
        chord_shift = 12*(key_shift - 1) + note_shift
        chord_changes[chord_shift] += 1
    json_object_new['chord_changes'] = [c/json_object['duration'] for c in chord_changes]
    print 'calculated chord changes at {0} seconds'.format(time.time() - time_start)

    for i in range(0, int(math.floor(len(segments_timbre_old) / smoothing_factor))):
        segments = segments_timbre_old[(smoothing_factor*i):(smoothing_factor*i + smoothing_factor)]
        # calculate mean of each timbre coefficient over a block of 5 time segments
        segments_mean = map(mean, zip(*segments))
        segments_timbre_old_smoothed.append(segments_mean)
    print 'found most likely timbre categories at {0} seconds'.format(time.time() - time_start)
    timbre_cats = [msd_utils.find_most_likely_timbre_category(seg) for seg in segments_timbre_old_smoothed]
    timbre_cat_counts = [timbre_cats.count(i) for i in xrange(0, 30)]
    json_object_new['timbre_cat_counts'] = [t/json_object['duration'] for t in timbre_cat_counts]
    all_song_data.append(json_object_new)
    count += 1

print 'preprocessing finished, writing results to file at time {0}'.format(time.time() - time_start)
with open(output_file, 'w') as f:
    f.write(str(all_song_data))

print 'file merging complete at time {0}'.format(time.time() - time_start)

A.3 Code to Compute Timbre Categories

from __future__ import division
import os
import sys
import re
import time
import json
import glob
import hdf5_getters  # not on adroit
import sklearn.mixture
import msd_utils  # not on adroit
import math
import numpy as np
import collections
from string import ascii_uppercase
import ast
import matplotlib.pyplot as plt
import operator
from collections import defaultdict
import random

timbre_all = []
N = 20  # number of samples to get from each year
year_counts = {1956: 2, 1965: 4, 1968: 3, 1969: 5, 1970: 23, 1971: 25,
               1972: 26, 1973: 37, 1974: 35, 1975: 29, 1976: 28, 1977: 64, 1978: 77,
               1979: 111, 1980: 131, 1981: 171, 1982: 199, 1983: 272, 1984: 190, 1985: 189,
               1986: 200, 1987: 224, 1988: 205, 1989: 272, 1990: 358, 1991: 348,
               1992: 538, 1993: 610, 1994: 658, 1995: 764, 1996: 809, 1997: 930, 1998: 872,
               1999: 983, 2000: 1031, 2001: 1230, 2002: 1323, 2003: 1563, 2004: 1508,
               2005: 1995, 2006: 1892, 2007: 2175, 2008: 1950, 2009: 1782, 2010: 742}

time_start = time.time()
year_count = defaultdict(int)
orig_dir = '/scratch/network/mssilver/mssilver/'
orig_dir = ''
# the exact pattern was garbled in the source; it matches each song's
# dict literal (containing a 'title' key) in the raw text files
json_pattern = re.compile(r"\{[^{}]*'title'[^{}]*\}", re.DOTALL)
N = 20  # number of songs to sample from each year
k = 20  # number of frames to select from each song
for l1 in ascii_uppercase:
    for l2 in ascii_uppercase:
        edm_textfile = orig_dir + 'msd_data/raw_' + l1 + l2 + '.txt'
        json_contents = open(edm_textfile, 'r').read()
        for json_object_str in re.findall(json_pattern, json_contents):
            json_object = ast.literal_eval(json_object_str)
            year = int(json_object['year'])
            # sample each song with probability N / (songs in its year), capped at 1
            prob = 1.0 if 1.0*N/year_counts[year] > 1.0 else 1.0*N/year_counts[year]
            if random.random() < prob:
                print 'getting timbre frames for song in directory {0}, {1} seconds after start of program'.format(edm_textfile, time.time() - time_start)
                duration = float(json_object['duration'])
                timbre = [[t/duration for t in l] for l in json_object['timbre']]
                try:
                    indices = random.sample(xrange(0, len(timbre)), k)
                except:
                    indices = xrange(0, len(timbre))
                timbre_frames = [timbre[i] for i in indices]
                appended_timbre = [timbre_all.append(l) for l in timbre_frames]
        print 'finished file {0}, {1} seconds after start of program'.format(edm_textfile, time.time() - time_start)

with(open('timbre_frames_all.txt', 'w')) as f:
    f.write(str(timbre_all))

A.4 Helper Methods for Calculations

1 import os2 import re3 import json4 import glob5 import hdf5_getters6 import time7 import numpy as np8

9 rsquorsquorsquo some static data used in conjunction with the helper methods rsquorsquorsquo10

11 each 12-element vector corresponds to the 12 pitches starting with Cnatural and going up to B natural

12

61

13 CHORD_TEMPLATE_MAJOR = [[100010010000][010001001000]14 [001000100100][000100010010]15 [000010001001][100001000100]16 [010000100010][001000010001]17 [100100001000][010010000100]18 [001001000010][000100100001]]19 CHORD_TEMPLATE_MINOR =[[100100010000][010010001000]20 [001001000100][000100100010]21 [000010010001][100001001000]22 [010000100100][001000010010]23 [000100001001][100010000100]24 [010001000010][001000100001]]25 CHORD_TEMPLATE_DOM7 = [[100010010010][010001001001]26 [101000100100][010100010010]27 [001010001001][100101000100]28 [010010100010][001001010001]29 [100100101000][010010010100]30 [001001001010][000100100101]]31 CHORD_TEMPLATE_MIN7 = [[100100010010][010010001001]32 [101001000100][010100100010]33 [001010010001][100101001000]34 [010010100100][001001010010]35 [000100101001][100010010100]36 [010001001010][001000100101]]37

38 CHORD_TEMPLATE_MAJOR_means = [npmean(chord) for chord inCHORD_TEMPLATE_MAJOR]

39 CHORD_TEMPLATE_MINOR_means = [npmean(chord) for chord inCHORD_TEMPLATE_MINOR]

40 CHORD_TEMPLATE_DOM7_means = [npmean(chord) for chord in CHORD_TEMPLATE_DOM7]

41 CHORD_TEMPLATE_MIN7_means = [npmean(chord) for chord in CHORD_TEMPLATE_MIN7]

42

43 CHORD_TEMPLATE_MAJOR_stdevs = [npstd(chord) for chord inCHORD_TEMPLATE_MAJOR]

44 CHORD_TEMPLATE_MINOR_stdevs = [npstd(chord) for chord inCHORD_TEMPLATE_MINOR]

45 CHORD_TEMPLATE_DOM7_stdevs = [npstd(chord) for chord in CHORD_TEMPLATE_DOM7]

46 CHORD_TEMPLATE_MIN7_stdevs = [npstd(chord) for chord in CHORD_TEMPLATE_MIN7]

47

48 TIMBRE_CLUSTERS = [[ 138679881e-01 395702571e-02 265410235e-0249 738301998e-03 -175014636e-02 -551147732e-0250 871851698e-03 -117595855e-02 107227900e-0251 875951680e-03 540391877e-03 617638908e-03]52 [ 314344510e+00 117405599e-01 408053561e+0053 -177934450e+00 293367968e+00 -135597928e+0054 -155129489e+00 775743158e-01 642796685e-0155 140794256e-01 337716831e-01 -327103815e-01]56 [ 356548165e-01 273288705e+00 194355982e+0057 106892477e+00 989739475e-01 -897330631e-0258 873234495e-01 -200747009e-03 344488367e-0159 993117800e-02 -243471766e-01 -190521726e-01]60 [ 422442037e-01 414115783e-01 143926557e-0161 -116143322e-01 -595186216e-02 -236927188e-0162 -683151409e-02 986816882e-02 243219098e-0263 693558977e-02 680121418e-03 397485360e-02]64 [ 194727799e-01 -139027782e+00 -239875671e-01

62

65 -284583677e-01 192334219e-01 -283421048e-0166 215787541e-01 114840341e-01 -215631833e-0167 -409496877e-02 -690838017e-03 -724394810e-03]68 [ 196565167e-01 498702717e-02 -343697282e-0169 254170701e-01 112441266e-02 154740401e-0170 -470447408e-02 810868802e-02 303736697e-0371 143974944e-03 -275044913e-02 148634678e-02]72 [ 221364497e-01 -296205105e-01 157754028e-0173 -557641279e-02 -925625566e-02 -615316168e-0274 -138139882e-01 -554936599e-02 166886836e-0175 646238260e-02 124093863e-02 -209274345e-02]76 [ 212823455e-01 -932652720e-02 -439611467e-0177 -202814479e-01 498638770e-02 -126572488e-0178 -111181799e-01 325075635e-02 201416694e-0279 -569216463e-02 261922912e-02 830817468e-02]80 [ 162304042e-01 -734813956e-03 -202552550e-0181 180106705e-01 -572110826e-02 -917148244e-0282 -620429191e-03 -608892354e-02 102883628e-0283 384878478e-02 -872920419e-03 237291230e-02]84 [ 169023095e-01 681311168e-02 -371039856e-0285 -213139780e-02 -418752028e-03 136407740e-0186 258515825e-02 -410328777e-04 293149920e-0287 -197874734e-02 201177066e-02 429260690e-03]88 [ 416829358e-01 -128384095e+00 886081556e-0189 913717416e-02 -319420208e-01 -182003637e-0190 -319865507e-02 -171517045e-02 347472066e-0291 -353047665e-02 558354602e-02 -506222122e-02]92 [ 383948137e-01 106020034e-01 401191058e-0193 149470482e-01 -958422411e-02 -494473336e-0294 227589858e-02 -567352733e-02 384666644e-0295 -215828055e-02 -167817151e-02 115426241e-01]96 [ 907946444e-01 326120397e+00 298472002e+0097 -142615404e-01 129886103e+00 -453380431e-0198 154008478e-01 -355297093e-02 -295809181e-0199 157037690e-01 -729692046e-02 115180285e-01]

100 [ 160870896e+00 -232038235e+00 -796211044e-01101 155058968e+00 -219377663e+00 501030526e-01102 -171767279e+00 -136642470e+00 -242837527e-01103 -414275615e-01 -733148530e-01 -456676578e-01]104 [ 642870687e-01 134486839e+00 216026845e-01105 -213180345e-01 310866747e-01 -397754955e-01106 -354439151e-01 -595938041e-04 495054274e-03107 467013422e-02 -180823854e-02 125808320e-01]108 [ 116780496e+00 228141229e+00 -329418720e+00109 -154239912e+00 212372153e-01 251116768e+00110 184273560e+00 -406183916e-01 119175125e+00111 -924407446e-01 685444429e-01 -638729005e-01]112 [ 239097414e-01 -113382447e-02 306327342e-01113 468182987e-03 -103107607e-01 -317661969e-02114 346533705e-02 146440386e-02 688291154e-02115 172580481e-02 -623970238e-03 -652822380e-03]116 [ 174850329e-01 -186077411e-01 269285838e-01117 522452803e-02 -371708289e-02 -642874319e-02118 -501920042e-03 -114565540e-02 -261300268e-03119 -694872458e-03 120157063e-02 201341977e-02]120 [ 193220674e-01 162738332e-01 172794061e-02121 789933755e-02 158494767e-01 904541006e-04122 -333177052e-02 -142411500e-01 -190471155e-02123 -241622739e-02 -257382438e-02 284895062e-02]124 [ 331179197e+00 -156765268e-01 442446188e+00

63

125 205496297e+00 507031622e+00 -352663849e-02126 -568337901e+00 -117825301e+00 541756637e-01127 -315541339e-02 -158404846e+00 737887234e-01]128 [ 236033237e-01 -501380019e-01 -701568834e-02129 -214474169e-01 558739133e-01 -345340886e-01130 236469930e-01 -251770230e-02 -441670143e-01131 -173364633e-01 992353986e-03 101775476e-01]132 [ 313672832e+00 155128891e+00 460139512e+00133 982477544e-01 -387108002e-01 -134239667e+00134 -300065797e+00 -441556909e-01 -777546208e-01135 -659017029e-01 -142596356e-01 -978935498e-01]136 [ 850714148e-01 228658856e-01 -365260753e+00137 270626948e+00 -190441544e-01 566625676e+00138 177531510e+00 239978921e+00 110965660e+00139 158484130e+00 -151579214e-02 864324026e-01]140 [ 114302559e+00 118602811e+00 -388130412e+00141 869833825e-01 -823003310e-01 -423867795e-01142 856022598e-01 -108015106e+00 174840192e-01143 -135493558e-02 -117012561e+00 168572940e-01]144 [ 354117814e+00 612714769e-01 767585243e+00145 250391333e+00 181374399e+00 -146363231e+00146 -174027236e+00 -572924078e-01 -120787368e+00147 -413954661e-01 -462561948e-01 678297871e-01]148 [ 831843044e-01 441635485e-01 700724425e-02149 -472159900e-02 308326493e-01 -447009822e-01150 327806057e-01 652370380e-01 328490360e-01151 128628172e-01 -778065861e-02 691343399e-02]152 [ 490082031e-01 -953180204e-01 176970476e-01153 157256960e-01 -526196238e-02 -319264458e-01154 391808304e-01 219368239e-01 -206483291e-01155 -625044005e-02 -105547224e-01 318934196e-01]156 [ 149899454e+00 -430708817e-01 243770498e+00157 703149621e-01 -228827845e+00 270195855e+00158 -471484280e+00 -118700075e+00 -177431396e+00159 -223190236e+00 820855264e-01 -235859902e-01]160 [ 120322544e-01 -366300816e-01 -125699953e-01161 -121914056e-01 693277338e-02 -131034684e-01162 -154955924e-03 248094288e-02 -309576314e-02163 -166369415e-03 148904987e-04 -142151992e-02]164 [ 652394765e-01 -681024464e-01 636868117e-01165 304950208e-01 262178992e-01 -320457080e-01166 -198576098e-01 -302173163e-01 204399765e-01167 444513847e-02 -950111498e-02 -114198739e-02]168 [ 206762180e-01 -208101829e-01 261977630e-01169 -171672300e-01 561794250e-02 213660185e-01170 390259585e-02 478176392e-02 172812607e-02171 344052067e-02 626899067e-03 248544728e-02]172 [ 739717363e-01 437786285e+00 254995502e+00173 113151212e+00 -358509503e-01 220806129e-01174 -220500355e-01 -722409824e-02 -270534083e-01175 107942098e-03 270174668e-01 187279353e-01]176 [ 125593809e+00 671054880e-02 870352571e-01177 -432607959e+00 230652217e+00 547476105e+00178 -611052479e-01 107955720e+00 -216225471e+00179 -795770149e-01 -731804973e-01 968935954e-01]180 [ 117233757e-01 -123897829e-01 -488625265e-01181 142036530e-01 -723286756e-02 -699808763e-02182 -117525019e-02 570221674e-02 -767796123e-03183 417505873e-02 -233375716e-02 194121001e-02]184 [ 167511025e+00 -275436700e+00 145345593e+00

64

185 132408871e+00 -166172505e+00 100560074e+00186 -882308160e-01 -595708043e-01 -727283590e-01187 -103975499e+00 -186653334e-02 139449745e+00]188 [ 320587677e+00 -284451104e+00 854849957e+00189 -444001235e-01 104202144e+00 735333682e-01190 -248763292e+00 738931361e-01 -174185596e+00191 -107581842e+00 205759299e-01 -820483513e-01]192 [ 331279737e+00 -508655734e-01 661530870e+00193 116518280e+00 474499155e+00 -231536191e+00194 -134016130e+00 -715381712e-01 278890594e+00195 204189275e+00 -380003033e-01 116034914e+00]196 [ 179522019e+00 -813534697e-02 437167420e-01197 226517020e+00 885377295e-01 107481514e+00198 -725322296e-01 -219309506e+00 -759468916e-01199 -137191387e+00 260097913e-01 934596450e-01]200 [ 350400906e-01 817891485e-01 -863487084e-01201 -731760701e-01 970320805e-02 -360023996e-01202 -291753495e-01 -803073817e-02 665930095e-02203 160093340e-01 -129158086e-01 -518806100e-02]204 [ 225922929e-01 278461593e-01 539661393e-02205 -237662670e-02 -270343295e-02 -123485570e-01206 231027499e-03 587465112e-05 186127188e-02207 283074747e-02 -187198676e-04 124761782e-02]208 [ 453615634e-01 318976020e+00 -835029351e-01209 784124578e+00 -443906795e-01 -178945492e+00210 -114521031e+00 100044304e+00 -404084981e-01211 -486030348e-01 105412721e-01 563666445e-02]212 [ 393714086e-01 -307226477e-01 -487366619e-01213 -457481697e-01 -291133171e-04 -239881719e-01214 -215591352e-01 -121332941e-01 142245002e-01215 502984582e-02 -805878851e-03 195534173e-01]216 [ 186913010e-01 -161000977e-01 595612425e-01217 187804293e-01 222064227e-01 -109008289e-01218 783845058e-02 515228647e-02 -818113578e-02219 -237860551e-02 341013800e-03 364680417e-02]220 [ 332919314e+00 -214341251e+00 720913997e+00221 176143734e+00 164091808e+00 -266887649e+00222 -926748006e-01 -278599285e-01 -739434005e-01223 -387363085e-01 800557250e-01 115628886e+00]224 [ 476496444e-01 -119334793e-01 309037235e-01225 -345545294e-01 130114716e-01 506895559e-01226 212176840e-01 -414296750e-03 452439064e-02227 -162163990e-02 693683152e-02 -577607592e-03]228 [ 300019324e-01 543432074e-02 -772732930e-01229 147263806e+00 -279012581e-02 -247864869e-01230 -210011388e-01 278202425e-01 616957205e-02231 -166924986e-01 -180102286e-01 -378872162e-03]]232

233 TIMBRE_MEANS = [npmean(t) for t in TIMBRE_CLUSTERS]234 TIMBRE_STDEVS = [npstd(t) for t in TIMBRE_CLUSTERS]235

236 rsquorsquorsquohelper methods to process raw msd datarsquorsquorsquo237

238 def normalize_pitches(h5)239 key = int(hdf5_gettersget_key(h5))240 segments_pitches = hdf5_gettersget_segments_pitches(h5)241 segments_pitches_new = [transpose_by_key(pitch_segkey) for pitch_seg in

segments_pitches]242 return segments_pitches_new243

65

244 def transpose_by_key(pitch_segkey)245 pitch_seg_new = []246 for i in range(012)247 idx = (i + key) 12248 pitch_seg_newappend(pitch_seg[idx])249 return pitch_seg_new250

251 rsquorsquorsquo given a time segment with distributions of the 12 pitches find the mostlikely chord playedrsquorsquorsquo

252 def find_most_likely_chord(pitch_vector)253 rho_max = 00254 index each chord255 most_likely_chord = (11)256 for idx (chordmeanstdev) in enumerate(zip(CHORD_TEMPLATE_MAJOR

CHORD_TEMPLATE_MAJOR_meansCHORD_TEMPLATE_MAJOR_stdevs))257 rho = 00258 for i in range(012)259 rho += (chord[i] - mean)(pitch_vector[i] - npmean(pitch_vector))((

stdev+001)(npstd(pitch_vector)+001))260 if (abs(rho) gt abs(rho_max))261 rho_max = rho262 most_likely_chord = (1idx)263 for idx (chordmeanstdev) in enumerate(zip(CHORD_TEMPLATE_MINOR

CHORD_TEMPLATE_MINOR_meansCHORD_TEMPLATE_MINOR_stdevs))264 rho = 00265 for i in range(012)266 rho += (chord[i] - mean)(pitch_vector[i] - npmean(pitch_vector))((

stdev+001)(npstd(pitch_vector)+001))267 if (abs(rho) gt abs(rho_max))268 rho_max = rho269 most_likely_chord = (2idx)270 for idx (chordmeanstdev) in enumerate(zip(CHORD_TEMPLATE_DOM7

CHORD_TEMPLATE_DOM7_meansCHORD_TEMPLATE_DOM7_stdevs))271 rho = 00272 for i in range(012)273 rho += (chord[i] - mean)(pitch_vector[i] - npmean(pitch_vector))((

stdev+001)(npstd(pitch_vector)+001))274 if (abs(rho) gt abs(rho_max))275 rho_max = rho276 most_likely_chord = (3idx)277 for idx (chordmeanstdev) in enumerate(zip(CHORD_TEMPLATE_MIN7

CHORD_TEMPLATE_MIN7_meansCHORD_TEMPLATE_MIN7_stdevs))278 rho = 00279 for i in range(012)280 rho += (chord[i] - mean)(pitch_vector[i] - npmean(pitch_vector))((

stdev+001)(npstd(pitch_vector)+001))281 if (abs(rho) gt abs(rho_max))282 rho_max = rho283 most_likely_chord = (4idx)284 return most_likely_chord285

286 def find_most_likely_timbre_category(timbre_vector)287 most_likely_timbre_cat = 0288 rho_max = 00289 for idx (segmeanstdev) in enumerate(zip(TIMBRE_CLUSTERSTIMBRE_MEANS

TIMBRE_STDEVS))290 rho = 00291 for i in range(012)292 rho += (seg[i] - mean)(timbre_vector[i] - npmean(seg))((stdev+001)

(npstd(timbre_vector)+001))

66

293 if (abs(rho) gt abs(rho_max))294 rho_max = rho295 most_likely_timbre_cat = idx296 return most_likely_timbre_cat

67

Bibliography

[1] Marck Bailey. Mark J. Butler publishes scholarly work on dance music. http://www.music.northwestern.edu/about/news/2012/mark-j-butler-publishes-scholarly-work-on-dance-music.html, March 2012.

[2] Kenneth Taylor. Ishkur's guide to EDM. http://techno.org/electronic-music-guide/.

[3] Deal further strengthens Spotify's music discovery expertise. http://the.echonest.com/pressreleases/spotify-acquires-echo-nest/, March 2014.

[4] Josh Constine. Inside the Spotify - Echo Nest skunkworks. http://techcrunch.com/2014/10/19/the-sonic-mad-scientists/, October 2014.

[5] The future of music genres is here. http://blog.echonest.com/post/73516217273/the-future-of-music-genres-is-here, January 2014.

[6] About the Music Genome Project. http://www.pandora.com/about/mgp.

[7] Joan Serrà, Álvaro Corral, Marián Boguñá, Martín Haro, and Josep Ll. Arcos. Measuring the evolution of contemporary western popular music. Sci. Rep., 2, July 2012.

[8] Matthias Mauch, Robert M. MacCallum, Mark Levy, and Armand M. Leroi. The evolution of popular music: USA 1960–2010. Royal Society Open Science, 2(5), 2015.

[9] Thierry Bertin-Mahieux, Daniel P. W. Ellis, Brian Whitman, and Paul Lamere. The million song dataset. In Proceedings of the 12th International Conference on Music Information Retrieval (ISMIR 2011), 2011.

[10] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay. Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12:2825–2830, 2011.

[11] Lars Buitinck, Gilles Louppe, Mathieu Blondel, Fabian Pedregosa, Andreas Mueller, Olivier Grisel, Vlad Niculae, Peter Prettenhofer, Alexandre Gramfort, Jaques Grobler, Robert Layton, Jake VanderPlas, Arnaud Joly, Brian Holt, and Gaël Varoquaux. API design for machine learning software: experiences from the scikit-learn project. In ECML PKDD Workshop: Languages for Data Mining and Machine Learning, pages 108–122, 2013.

[12] Edwin Chen. Infinite mixture models with nonparametric Bayes and the Dirichlet process. http://blog.echen.me/2012/03/20/infinite-mixture-models-with-nonparametric-bayes-and-the-dirichlet-process/, March 2012.

[13] Graham Massey. Roland TR-808: The drum machine that changed music. http://www.bbc.com/news/entertainment-arts-26682781, March 2014.

[14] Jean-Michel Jarre and Agencja Artystyczna MTJ. Jean Michel Jarre. Disques Dreyfus, 1999.

[15] François Pachet, Jean-Julien Aucouturier, and Mark Sandler. The way it sounds: Timbre models for analysis and retrieval of music signals. IEEE Transactions on Multimedia, 7(6):1028–1035, December 2005.


Contents

• Abstract
• Acknowledgements
• Contents
• List of Tables
• List of Figures
• 1 Introduction
  • 1.1 Background Information
  • 1.2 Literature Review
  • 1.3 The Dataset
• 2 Mathematical Modeling
  • 2.1 Determining Novelty of Songs
  • 2.2 Feature Selection
  • 2.3 Collecting Data and Preprocessing Selected Features
    • 2.3.1 Collecting the Data
    • 2.3.2 Pitch Preprocessing
    • 2.3.3 Timbre Preprocessing
• 3 Results
  • 3.1 Methodology
  • 3.2 Findings
    • 3.2.1 α = 0.05
    • 3.2.2 α = 0.1
    • 3.2.3 α = 0.2
  • 3.3 Analysis
• 4 Conclusion
  • 4.1 Design Flaws in Experiment
  • 4.2 Future Work
  • 4.3 Closing Remarks
• A Code
  • A.1 Pulling Data from the Million Song Dataset
  • A.2 Calculating Most Likely Chords and Timbre Categories
  • A.3 Code to Compute Timbre Categories
  • A.4 Helper Methods for Calculations
• Bibliography

• segments_timbre: a matrix of values indicating the distribution of MFCC-like features (different types of tones) for each segment

The segments_pitches feature is a clear candidate for a differentiating factor between songs, since it reveals the patterns of notes that occur. Additionally, other research papers that quantitatively examine songs, like Mauch's, look at pitch and employ a procedure that allows all songs to be compared with the same metric. Likewise, timbre is intuitively a reliable differentiating feature, since it reveals the amounts of different tones, i.e., sounds that sound different despite having the same pitch. Therefore, segments_timbre is another feature that is considered for each song.

Finally, we look at the candidate features for rhythm. At first glance, all of these features appear to be useful, as they indicate the rhythm of a song in one way or another. However, none of these features are as useful as the pitch and timbre features. While tempo is one factor in differentiating genres of EDM, and music in general, tempo alone is not a driving force of musical innovation. Certain genres of EDM, like drum 'n' bass and happycore, stand out for having very fast tempos, but the tempo is supplemented with a sound unique to the genre. Conceiving new arrangements of pitches, combining instruments in new ways, and inventing new types of sounds are novel; merely speeding up or slowing down existing sounds is not. Including tempo as a feature could actually add noise to the model, since many genres overlap in their tempos. And finally, tempo is measured indirectly when the pitch and timbre features are normalized for each song: everything is measured in units of "per second," so faster songs will have higher quantities of pitch and timbre features each second. Time signature can be dismissed from the candidate features for the same reason as tempo: many genres share the same time signature, and including it in the feature set would only add more noise. beats_start looks like a more promising feature since, like segments_pitches and segments_timbre, it consists of a vector of values. However, difficulties arise when we begin to think about how exactly we can utilize this information. Since each song varies in length, we need a way to compare songs of different durations on the same level. One approach could be to perform basic statistics on the distance between each beat, for example calculating the mean and standard deviation of this distance. However, the normalized pitch and timbre information already capture this data. Another possibility is detecting certain patterns of beats, which could differentiate the syncopated dubstep or glitch music beats from the steady pulse of electro-house. But once again, every beat is accompanied by a sound with a specific timbre and pitch, so this feature would not add any significantly new information.

2.3 Collecting Data and Preprocessing Selected Features

2.3.1 Collecting the Data

Upon deciding the features I wanted to use in my research, I first needed to collect all of the electronic songs in the Million Song Dataset. The easiest reliable way to achieve this was to iterate through each song in the database and save the information for the songs where any of the artist genre tags in artist_mbtags matched an electronic music genre. While this measure was not fully accurate, because it looks at the genre of the artist rather than the song, specific genre information for each song was not as easily accessible, so this indicator was nearly as good a substitute. To generate a list of the genres that electronic songs would fall under, I manually searched through a subset of the MSD to find all genres that seemed to be related to electronic music. In the case of genres that were sometimes but not always electronic in nature, such as disco or pop, I erred on the side of caution and did not include them in the list of electronic genres. In these cases, false positives, such as primarily rock songs that happen to have the disco label attached to the artist, could inadvertently be included in the dataset. The final list of genres is as follows:

target_genres = ['house', 'techno', 'drum and bass', 'drum n bass',
    "drum'n'bass", 'drumnbass', "drum 'n' bass", 'jungle', 'breakbeat',
    'trance', 'dubstep', 'trap', 'downtempo', 'industrial', 'synthpop',
    'idm', 'idm - intelligent dance music', '8-bit', 'ambient',
    'dance and electronica', 'electronic']
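The following minimal sketch illustrates this artist-tag filter; it mirrors the full script in Appendix A.1 (hdf5_getters is the helper module distributed with the MSD, and the path argument is a placeholder):

import hdf5_getters

def is_electronic(h5_path, target_genres):
    # Return True if any artist-level genre tag matches an EM genre.
    h5 = hdf5_getters.open_h5_file_read(h5_path)
    try:
        tags = str(hdf5_getters.get_artist_mbtags(h5))
        return any(tag in tags for tag in target_genres)
    finally:
        h5.close()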

2.3.2 Pitch Preprocessing

A study conducted by music researcher Matthias Mauch [8] analyzes pitch in a musically informed manner. The study first takes the raw sound data and converts it into a distribution over each pitch, where 0 is no detection of the pitch and 1 the strongest amount. It then computes the most likely chord by comparing the 4 most common types of chords in popular music (major, minor, dominant 7, and minor 7) to the observed chord. The most common chords are represented as "template chords" containing 0's and 1's, where the 1's represent the notes played in the chord. For example, using the note C as the first index, the C major chord is represented as

CT_{CM} = (1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0)

For a given chroma frame c observed in the song, Spearman's rho coefficient is computed against every template chord:

\rho_{CT,c} = \sum_{i=1}^{12} \frac{(CT_i - \overline{CT})(c_i - \bar{c})}{\sigma_{CT}\,\sigma_c}

where \overline{CT} is the mean of the values in the template chord, \sigma_{CT} is the standard deviation of the values in the chord, and the operations on c are analogous. Note that the summation is over each individual pitch in the 12 pitch classes. The chord template with the highest value of ρ is selected as the chord for the time frame.
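As an illustration, below is a minimal numpy sketch of this template-matching step; the actual implementation (Appendix A.4) is equivalent, with 0.01 added to each standard deviation to guard against division by zero:

import numpy as np

def best_template(chroma, templates):
    # Pick the chord template with the highest |rho| against one chroma frame.
    best_idx, best_rho = 0, 0.0
    for idx, t in enumerate(templates):
        t = np.asarray(t, dtype=float)
        rho = np.sum((t - t.mean()) * (chroma - np.mean(chroma))) / \
              ((t.std() + 0.01) * (np.std(chroma) + 0.01))
        if abs(rho) > abs(best_rho):
            best_idx, best_rho = idx, rho
    return best_idx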

After this is performed for each time frame, the values are smoothed, and then the change between adjacent chords is observed. The reasoning behind this step is that, by measuring the relative distance between chords rather than the chords themselves, all songs can be compared in the same manner even though they may have different key signatures. Finally, the study takes the types of chord changes and classifies them into 8 possible categories called "H-topics." These topics are more abstracted versions of the chord changes that make more sense to a human, such as "changes involving dominant 7th chords."

In my preliminary implementation of this method on an electronic dance music corpus, I made a few modifications to Mauch's study. First, I smoothed out time frames before computing the most probable chords, rather than smoothing the most probable chords. I did this to save time and to reduce volatility in the chord measurements. Using Rick Astley's "Never Gonna Give You Up" as a reference, which contains 935 time frames and lasts 212 seconds, 5 time frames is slightly under 1 second and, in preliminary testing, appeared to be a good interval for each time block. Second, as mentioned in the literature section, I did not abstract the chord changes into H-topics. This decision also stemmed from time constraints, since deriving semantic chord meaning from EDM songs would require careful research into the types of harmonies and sounds common in that genre of music. Below I include a high-level visualization of the pitch metadata found in a sample song, "Firestarter" by The Prodigy, and how I converted the metadata into a chord change vector that I could then feed into the Dirichlet Process algorithm.

[Figure: overview of the pitch preprocessing pipeline, illustrated with "Firestarter" by The Prodigy. The pipeline starts with the raw pitch data, an N×12 matrix where N is the number of time frames in the song and 12 is the number of pitch classes; averages the distribution of pitches over every block of 5 time frames; calculates the most likely chord for each block using Spearman's rho (e.g., F major); and, for each pair of adjacent chords (e.g., F major to G major), computes the change between them and increments the corresponding count in a table of chord change frequencies (192 possible chord changes). The result is a 192-element vector in which chord_changes[i] is the number of times the chord change with code i occurred in the song.]
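The encoding itself is compact. The following sketch restates the mapping from Appendix A.2, where a chord is represented as a (key type, root) pair with key types 1-4 denoting major, minor, dominant 7th, and minor 7th, and roots numbered 0-11:

def chord_change_code(c1, c2):
    # Map a pair of adjacent chords to one of the 192 chord-change codes.
    if c1[1] == c2[1]:
        note_shift = 0
    elif c1[1] < c2[1]:
        note_shift = c2[1] - c1[1]
    else:
        note_shift = 12 - c1[1] + c2[1]
    key_shift = 4 * (c1[0] - 1) + c2[0]       # key-type transition, 1..16
    return 12 * (key_shift - 1) + note_shift  # code, 0..191

For example, a minor-to-minor change with the same root maps to code 60, which is why that code is singled out in the analysis of Section 3.3.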

A final step I took to normalize the chord change data was to divide the counts by the length of the song, so that each song's number of chord changes was measured per second.

2.3.3 Timbre Preprocessing

For timbre, I also used Mauch's model to find a meaningful way to compare timbre uniformly across all songs [8]. After collecting all song metadata, I took a random sample of 20 songs from each year, starting at 1970. The reason I forced the sampling to 20 randomly sampled songs from each year, and did not take a random sample of songs from all years at once, was to prevent bias towards any type of sounds. As seen in Figure 2.2, there are significantly more songs from 2000-2011 than before 2000: the mean year is x̄ = 2001.052, the median year is 2003, and the standard deviation of the years is σ = 7.060. A "random sample" over all songs would almost certainly include a disproportionate amount of more recent songs; in order not to miss out on sounds that may be more prevalent in older songs, I required a set number of songs from each year. Next, from each randomly selected song, I selected 20 random timbre frames in order to prevent any biases in data collection within each song. In total, 42 × 20 × 20 = 16,800 timbre frames were collected. Next, I clustered the timbre frames using a Gaussian Mixture Model (GMM), varying the number of clusters from 10 to 100 and selecting the number of clusters with the lowest Bayes Information Criterion (BIC), a statistical measure commonly used to select the best-fitting model. The BIC was minimized at 46 timbre clusters. I then re-ran the GMM with 46 clusters and saved the mean values of each of the 12 timbre segments for each cluster formed. In the same way that every song had the same 192 chord changes whose frequencies could be compared between songs, each song now had the same 46 timbre clusters but different frequencies for each. When reading in the metadata from each song, I calculated the most likely timbre cluster each timbre frame belonged to and kept a frequency count of all of the possible timbre clusters observed in the song. Finally, as with the pitch data, I divided all observed counts by the duration of the song in order to normalize each song's timbre counts.

Figure 2.2: Number of Electronic Music Songs in Million Song Dataset from Each Year
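A minimal sketch of this model-selection step is shown below; here frames stands for the 16,800 × 12 array of sampled timbre frames, and GaussianMixture is the current scikit-learn equivalent of the GMM class used at the time:

import numpy as np
from sklearn.mixture import GaussianMixture

def pick_num_timbre_clusters(frames, lo=10, hi=100):
    # Fit GMMs with lo..hi components and return the count minimizing BIC.
    best_k, best_bic = lo, np.inf
    for k in range(lo, hi + 1):
        gmm = GaussianMixture(n_components=k, random_state=0).fit(frames)
        bic = gmm.bic(frames)
        if bic < best_bic:
            best_k, best_bic = k, bic
    return best_k  # minimized at 46 in this thesis

# The final model's cluster means become the timbre categories:
# timbre_means = GaussianMixture(n_components=46).fit(frames).means_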

Chapter 3

Results

3.1 Methodology

After the pitch and timbre data was processed, I ran the Dirichlet Process on the data. For each song, I concatenated the 192-element chord change frequency list and the 46-element timbre category frequency list, giving each song a total of 238 features. However, there is a problem with this setup: the pitch data will inherently dominate the clustering process, since it contains almost 3 times as many features as timbre. While there is no built-in function in scikit-learn's DPGMM process to give different weights to each feature, I considered another way to remedy this discrepancy: duplicating the timbre vector a certain number of times and concatenating the copies to the feature set of each song. While this strategy runs the risk of corrupting the feature set and turning it into something that does not accurately represent each song, it is important to keep in mind that, even without duplicating the timbre vector, the feature set already consists of two separate feature sets concatenated together. Therefore, timbre duplication appears to be a reasonable strategy to weigh pitch and timbre more evenly, as sketched below.
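For concreteness, the sketch below shows this feature assembly; the duplication count dup is an illustrative knob rather than a value fixed by the thesis:

import numpy as np

def song_features(chord_changes, timbre_counts, dup=3):
    # Concatenate 192 chord-change and dup copies of 46 timbre frequencies.
    feats = list(chord_changes)        # 192 per-second chord-change frequencies
    for _ in range(dup):
        feats.extend(timbre_counts)    # 46 per-second timbre-category frequencies
    return np.array(feats)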

After this modification, I tweaked a few more parameters before obtaining my final results. Dividing the pitch and timbre frequencies by the duration of the song normalized every song to frequency per second, but it also had the undesired effect of making the data too small. Timbre and pitch frequencies per second were almost always less than 1.0, and many times hovered as low as 0.002 for nonzero values. Because all of the values were very close to each other, using common values of α in the range of 0.1 to 1000-2000 was insufficient to push the songs into different clusters; as a result, every song fell into the same cluster. Increasing the value of α by several orders of magnitude, to well over 10 million, fixed this, but that solution presented two problems. First, tuning α to experiment with different ways to cluster the music would be problematic, since I would have to work with an enormous range of possible values for α. Second, pushing α to such high values is not appropriate for the Dirichlet Process: extremely high values of α indicate a Dirichlet Process that will try to disperse the data into different clusters, and a value of α that high is in principle always assigning each new song to a new cluster. On the other hand, varying α between 0.1 and 1000, for example, presents a much wider range of flexibility when assigning clusters. So, while clustering may be possible by varying α an extreme amount with the data as it currently is, we would be using the Dirichlet Process in a way it mathematically should not be used. Therefore, multiplying all of the data by a constant, so that we can work in the appropriate range of α, is the ideal approach. After some experimentation, I found that k = 10 was an appropriate scaling factor. After initial runs of the Dirichlet Process, I also found a slight issue with some of the earlier songs. Since I had only artist genre tags, not song-specific tags, I chose songs based on whether any of the tags associated with the artist fell under any electronic music genre, including the generic term 'electronic'. There were some bands, mostly older ones from the 1960s and 1970s like Electric Light Orchestra, which had some electronic music but mostly featured rock, funk, disco, or another genre. Given that these artists featured mostly non-electronic songs, I decided to exclude them from my study and generated a blacklist of these artists. While it was infeasible to look through every single song and determine whether it was electronic or not, I was able to look over the earliest songs in each cluster. These songs were the most important to verify as electronic, because early non-electronic songs could end up forming new clusters and inadvertently create clusters with non-electronic sounds that I was not looking for.

The goal of this thesis is to identify the different groups into which EM songs cluster and to identify the most unique artists and genres. While the second task is very simple, because it requires looking at the earliest songs in each cluster, the effectiveness of the first is difficult to gauge. While I can look at the average chord change and timbre category frequencies in each cluster, as well as other metadata, attaching semantic interpretations to what the music actually sounds like, and determining whether the music is clustered properly, is a very subjective process. For this reason, I ran the Dirichlet Process on the feature set with values of α = (0.05, 0.1, 0.2) and compared the resulting clusterings, examining the similarities and differences among the clusters formed in each scenario in the Discussion section. For each value of α, I set the upper limit of components, or clusters, to 50. The values of α I used resulted in 9, 14, and 19 clusters, respectively. A minimal sketch of the clustering step follows.
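The thesis used scikit-learn's DPGMM class, which has since been removed from the library; BayesianGaussianMixture with a Dirichlet-process prior is its current equivalent. In the sketch below, X stands for the array of per-song, per-second feature vectors:

from sklearn.mixture import BayesianGaussianMixture

def cluster_songs(X, alpha):
    # Dirichlet Process GMM over song features, scaled by the factor k = 10.
    dpgmm = BayesianGaussianMixture(
        n_components=50,                                    # upper limit on clusters
        weight_concentration_prior_type='dirichlet_process',
        weight_concentration_prior=alpha,                   # the alpha being varied
        max_iter=500,
        random_state=0)
    return dpgmm.fit_predict(10.0 * X)

# for alpha in (0.05, 0.1, 0.2): labels = cluster_songs(X, alpha)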

3.2 Findings

3.2.1 α = 0.05

When I set α to 0.05, the Dirichlet Process split the songs into 9 clusters. Below are the distributions of years of the songs in each cluster (note that the Dirichlet Process does not number the clusters exactly sequentially, so cluster numbers 5, 7, and 10 are skipped).

Figure 3.1: Song year distributions for α = 0.05

For each value of α, I also calculated the average frequency of each chord change category and timbre category for each cluster and plotted the results. The green lines correspond to timbre and the blue lines to pitch.

Figure 3.2: Timbre and pitch distributions for α = 0.05

A table of each cluster formed, the number of songs in that cluster, and a description of the pitch, timbre, and rhythmic qualities characteristic of songs in that cluster is shown below.

Cluster | Song Count | Characteristic Sounds
0 | 6481 | Minimalist, industrial, space sounds, dissonant chords
1 | 5482 | Soft, New Age, ethereal
2 | 2405 | Defined sounds; electronic and non-electronic instruments played in standard rock rhythms
3 | 360 | Very dense and complex synths, slightly darker tone
4 | 4550 | Heavily distorted rock and synthesizer
6 | 2854 | Faster-paced 80s synth rock, acid house
8 | 798 | Aggressive beats, dense house music
9 | 1464 | Ambient house, trancelike, strong beats, mysterious tone
11 | 1597 | Melancholy tones; New Wave rock in the 80s, then, starting in the 90s, downtempo, trip-hop, and nu-metal

Table 3.1: Song cluster descriptions for α = 0.05

3.2.2 α = 0.1

A total of 14 clusters were formed (16 were formed, but 2 clusters contained only one song each; I listened to both of these songs, and since they did not sound unique, I discarded them from the clusters). Again, the song year distributions, timbre and pitch distributions, and cluster descriptions are shown below.

Figure 3.3: Song year distributions for α = 0.1

Figure 3.4: Timbre and pitch distributions for α = 0.1

Cluster | Song Count | Characteristic Sounds
0 | 1339 | Instrumental and disco with 80s synth
1 | 2109 | Simultaneous quarter-note and sixteenth-note rhythms
2 | 4048 | Upbeat, chill; simultaneous quarter-note and eighth-note rhythms
3 | 1353 | Strong repetitive beats, ambient
4 | 2446 | Strong simultaneous beat and synths; synths defined but echoing
5 | 2672 | Calm, New Age
6 | 542 | Hi-hat cymbals, dissonant chord progressions
7 | 2725 | Aggressive punk and alternative rock
9 | 1647 | Latin; rhythmic emphasis on first and third beats
11 | 835 | Standard medium-fast rock instruments/chords
16 | 1152 | Orchestral, especially violins
18 | 40 | "Martian alien" sounds, no vocals
20 | 1590 | Alternating strong kick and strong high-pitched clap
28 | 528 | Roland TR-like beats; kick and clap stand out but fuzzy

Table 3.2: Song cluster descriptions for α = 0.1

3.2.3 α = 0.2

With α set to 0.2, a total of 22 clusters were formed; 3 of the clusters consisted of 1 song each, none of which were particularly unique-sounding, so I discarded them, for a total of 19 significant clusters. Again, the song year distributions, timbre and pitch distributions, and cluster descriptions are shown below.

Figure 3.5: Song year distributions for α = 0.2

Figure 3.6: Timbre and pitch distributions for α = 0.2

Cluster | Song Count | Characteristic Sounds
0 | 4075 | Nostalgic and sad-sounding synths and string instruments
1 | 2068 | Intense, sad, cavernous (mix of industrial metal and ambient)
2 | 1546 | Jazz/funk tones
3 | 1691 | Orchestral with heavy 80s synths, atmospheric
4 | 343 | Arpeggios
5 | 304 | Electro, ambient
6 | 2405 | Alien synths, eerie
7 | 1264 | Punchy kicks and claps, 80s/90s tilt
8 | 1561 | Medium tempo, 4/4 time signature, synths with intense guitar
9 | 1796 | Disco rhythms and instruments
10 | 2158 | Standard rock with few (if any) synths added on
12 | 791 | Cavernous, minimalist, ambient (non-electronic instruments)
14 | 765 | Downtempo, classic guitar riffs, fewer synths
16 | 865 | Classic acid house sounds and beats
17 | 682 | Heavy Roland TR sounds
22 | 14 | Fast, ambient, classic orchestral
23 | 578 | Acid house with funk tones
30 | 31 | Very repetitive rhythms, one or two tones
34 | 88 | Very dense sound (strong vocals and synths)

Table 3.3: Song cluster descriptions for α = 0.2

3.3 Analysis

Each of the different values of α revealed different insights about the Dirichlet Process applied to the Million Song Dataset, mainly which artists and songs were unique and how traditional groupings of EM genres could be thought of differently. Not surprisingly, the distributions of the years of songs in most of the clusters were skewed to the left, because the distribution of all of the EM songs in the Million Song Dataset is left-skewed (see Figure 2.2). However, some of the distributions vary significantly for individual clusters, and these differences provide important insights into which music styles were popular at certain points in time and how unique the earliest artists and songs in the clusters were. For example, for α = 0.1, cluster 28's musical style (with sounds characteristic of the Roland TR-808 and TR-909, two programmable drum machines and synthesizers that became extremely popular in 1980s and 90s dance tracks [13]) coincides with when the instruments were first manufactured in 1980. Not surprisingly, this cluster contained mostly songs from the 80s and 90s and declined slightly in the 2000s. However, there were a few songs in that cluster that came out before 1980. While these songs did not clearly use the Roland TR machines, they may have contained similar sounds that predated the machines and were truly novel.

First, looking at α = 0.05, we see that all of the clusters contain a significant number of songs, although clusters 3 and 8 are notably smaller. Cluster 3 has a heavier left tail, indicating a larger number of songs from the 70s, 80s, and 90s. Inside the cluster, the genres of music varied significantly from a traditional music lens. That is, the cluster contained some songs with nearly all traditional rock instruments, others with purely synths, and others somewhere in between, all of which would normally be classified as different EM genres. However, under the Dirichlet Process, these songs were lumped together around the common theme of dense, melodic sounds (as opposed to minimalistic, repetitive, or dissonant sounds). The most prominent artists among the earlier songs are Ashra and Jon Hassell, who composed several melodic songs combining traditional instruments with synthesizers for a modern feel. The other small cluster, number 8, has a more normal year distribution relative to the entire MSD distribution and consists of denser beats. Another artist, Cabaret Voltaire, leads this cluster. Cluster 9 sticks out significantly because it contains virtually no songs before 1990 but then increases rapidly in popularity. This cluster contains songs with a hypnotically repetitive rhythm, strong and ethereal synths, and an equally strong drum-like beat. Given the emergence of trance in the 1990s, and the fact that house music in the 1980s contained more minimalistic synths than house music in the 1990s, this distribution of years makes sense. Looking at the earliest artists in this cluster, one that accurately predates the later music in the cluster is Jean-Michel Jarre, a French composer pioneering in ambient and electronic music [14]. One of his songs, Les Chants Magnétiques IV, contains very sharp and modulated synths along with a repetitive hi-hat cymbal rhythm and more drawn-out, ethereal synths. While the song sounds ambient at its normal speed, playing it at 1.5 times the normal speed produced a thumping, fast-paced 16th-note rhythm that, combined with ethereal synths carrying distinct chord progressions, sounded very similar to trance music. In fact, I found that, stylistically, trance music was comparable to house and ambient music increased in speed. Trance was a term not used extensively until the early 1990s, but ambient and house music were already mainstream by the 1980s, so it would make sense that trance evolved in this manner. However, this insight could serve as an argument that trance is not an innovative genre in and of itself, but is rather a clever combination of two older genres. Lastly, we look at the timbre category and chord change distributions for each cluster. In theory, these clusters should have significantly different peaks of chord changes and timbre categories, reflecting different pitch arrangements and

instruments in each cluster. The type 0 chord change corresponds to major → major with no note change; type 60 to minor → minor with no note change; type 120 to dominant 7th → dominant 7th with no note change; and type 180 to minor 7th → minor 7th with no note change. It makes sense that type 0, 60, 120, and 180 chord changes are frequently observed, because it implies that adjacent chords in a song tend to remain in the same key for the majority of the song. The timbre categories, on the other hand, are more difficult to interpret intuitively. Mauch's study addresses this issue by sampling the songs and sounds closest to each timbre category, playing the sounds, and attaching user-based interpretations based on several listeners [8]. While this strategy worked in Mauch's study, given the time and resources at my disposal it was not practical in mine. I ended up comparing my subjective summaries of each cluster against the charts to see whether certain peaks in the timbre categories corresponded to specific tones. Strangely, for α = 0.05, the timbre and chord change data is very similar for each cluster. This problem does not occur when α = 0.1 or 0.2, where the graphs vary significantly and correspond to some of the observed differences in the music. In summary, below are the most influential artists I found in the clusters formed, and the types of music they created that were novel for their time:

• Jean-Michel Jarre: ambient and house music, complicated synthesizer arrangements

• Cabaret Voltaire: orchestral electronic music

• Paul Horn: new age

• Brian Eno: ambient music

• Manuel Göttsching (Ashra): synth-heavy ambient music

• Killing Joke: industrial metal

• John Foxx: minimalist and dark electronic music

• Fad Gadget: house and industrial music

While these conclusions were formed mainly from the data in the MSD and the resulting clusters, I checked outside sources and biographies of these artists to see whether they were groundbreaking contributors to electronic music. Some research revealed that these artists were indeed groundbreaking for their time, so my findings are consistent with existing literature. The difference between existing accounts and mine, however, is that, from a quantitatively computed perspective, I found some new connections (like observing that one of Jarre's works, when sped up, sounded very similar to trance music).

For larger values of α, it is worth looking not only at interesting phenomena in the clusters formed for that specific value, but also at how those clusters compare to the ones formed with other values of α. Since we are increasing the value of α, more clusters will be formed, and the distinctions between each cluster will be more nuanced. With α = 0.1, the Dirichlet Process formed 16 clusters; 2 of these clusters consisted of only one song each, and upon listening, neither of these songs sounded particularly unique, so I threw those two clusters out and analyzed the remaining 14. Comparing these clusters to the ones formed with α = 0.05, I found that some of the clusters mapped over nicely, while others were more difficult to interpret. For example, cluster 3_{0.1} (cluster 3 when α = 0.1) had a similar number of songs, and a similar distribution of release years, to cluster 9_{0.05}. Both contain virtually no songs before the 1990s and then steadily rise in popularity through the 2000s. Both clusters also contain similar types of music: house beats and ethereal synths reminiscent of ambient or trance music. However, when I looked at the earliest

artists in cluster 3_{0.1}, they were different from the earliest artists in cluster 9_{0.05}. One particular artist, Bill Nelson, stood out for having a particularly novel song, "Birds of Tin," for the year it was released (1980). This song features a sharp and twangy synth beat that, when sped up, sounded like minimalist acid house music. While the α = 0.05 clustering differentiated mostly on general moods and classes of instruments (like rock vs. non-electronic vs. electronic), the α = 0.1 clustering picked up more nuanced instrumentation and mood differences. For example, cluster 16_{0.1} contained songs featuring orchestral string instruments, especially violin. The songs themselves varied significantly according to traditional genres, from Brian Eno arrangements with classical orchestra to a remix of a song by Linkin Park, a nu-metal band, which contained violin interludes. This clustering raises an interesting point: music that sounds very different in terms of traditional genres can be grouped together on certain instruments or sounds. Another cluster, 28_{0.1}, features 90s sounds characteristic of the Roland TR-909 drum machine (which would explain why the cluster's songs increase drastically starting in the 1990s and steadily decline through the 2000s). Yet another cluster, 6_{0.1}, has a particularly heavy left tail, indicating a style more popular in the 1980s, and its characteristic sound, hi-hat cymbals, is also a specialized instrument. This specialization does not match up particularly strongly with the clusters formed when α = 0.05. That is, a single cluster with α = 0.05 does not map easily onto one or more clusters in the α = 0.1 run, although many of the clusters appear to share characteristics based on the qualitative descriptions in the tables. The timbre/chord change charts for each cluster appeared to at least somewhat corroborate the general characteristics I attached to each cluster. For example, the last timbre category is significantly pronounced for clusters 5 and 18, and especially so for 18. Cluster 18 was vocal-free, ethereal, space-synth sounds, so it would make sense that cluster 5, which was mainly calm New Age, also contained vocal-free, ethereal, and space-y sounds. It was also interesting to note

that certain clusters, like 28, contained one timbre category that completely dominated all the others. Many songs in this cluster were marked by strong and repetitive beats reminiscent of the Roland TR drum machines, which matches the graph. Likewise, clusters 3, 7, 9, and 20, which appear to contain the same peak timbre category, were noted for containing strong and repetitive beats. For this run of the Dirichlet Process, I added the following artists and their contributions to the general list of novel artists:

• Bill Nelson: minimalist house music

• Vangelis: orchestral compositions with electronic notes

• Rick Wakeman: rock compositions with spacy-sounding synths

• Kraftwerk: synth-based pop music

Finally, we look at α = 0.2. With this parameter value, the Dirichlet Process resulted in 22 clusters; 3 of these clusters contained only one song each, and upon listening to each of these songs, I determined they were not particularly unique and discarded them, for a total of 19 remaining clusters. Unlike the previous two values of α, where the clusters were relatively easy to subjectively differentiate, this case was quite difficult. Slightly more than half of the clusters, 10 out of 19, contained under 1000 songs, and 3 contained under 100. Some of the clusters mapped easily onto clusters from the other two α values, like cluster 17_{0.2}, which contains Roland TR drum machine sounds and is comparable to cluster 28_{0.1}. However, many of the other classifications seemed more dubious. Not only did the songs within each cluster often vary significantly, but the differences between many clusters appeared nearly indistinguishable. The chord change and timbre charts also reflect the difficulty in distinguishing the clusters: the y-axes of the charts are quite small, implying that many of the timbre values averaged out because the songs in each cluster were quite different. Essentially, the observations are quite noisy and do not have features that stand out as saliently as those of cluster 28_{0.1}, for example. The only exceptions were clusters 30 and 34, but there were so few songs in each of these clusters that they represent only a small amount of the dataset. Therefore, I concluded that the Dirichlet Process with α = 0.2 did an insufficient job of adequately clustering the songs. Overall, the clusters formed when α = 0.1 were the most meaningful in terms of picking up nuanced moods and instruments without splitting hairs and producing clusters that a minimally trained ear could not differentiate. From this analysis, the most appropriate genre classifications of the electronic music from the MSD are the clusters described in the table for α = 0.1, and the most novel artists, along with their contributions, are those summarized in the findings for α = 0.05 and α = 0.1.


Chapter 4

Conclusion

In this chapter, I first address weaknesses in my experiment and strategies to remedy them; I then offer potential paths for researchers to build upon my experiment, and close with final words regarding this thesis.

4.1 Design Flaws in Experiment

While I made every effort possible to ensure the integrity of this experiment, there were various factors working against it, some beyond my control and others within my control but unrealistic to address given the time and resources I had. The largest issue was the dataset I was working with. While the MSD contained roughly 23,000 electronic music songs according to my classifications, these songs did not come close to covering all of the electronic music available. From looking through the tracks, I did see many important artists, meaning that there was some credibility to the dataset. However, there were several other artists I was surprised to see missing, and the artists included had only a limited number of popular songs. Some traditionally defined genres, like dubstep, were missing entirely from the dataset, and the most recent songs came from the year 2010, which meant that the past 5 years of rapid expansion in EM were not accounted for. Building a sufficient corpus of EM data is very difficult, arguably more so than for other genres, because songs may be remixed by multiple artists, further blurring the line between original content and modifications. For this reason, I consider my thesis to be a proof of concept. Although the data I used may not be ideal, I was able to show that the Dirichlet Process could be used with some amount of success to cluster songs based on their metadata.

With respect to how I implemented the Dirichlet Process and constructed the features, my methodology could have been more extensive with additional time and resources. Interpreting the sounds in each song and establishing common threads is a difficult task, and unlike Pandora, which used trained music theory experts to analyze each song, I relied on my own ears and anecdotal knowledge of EM. Given the lack of formal literature quantitatively analyzing EM and the resources I had, this was my best realistic option, but it was also not ideal. The second notable weakness, which was more controllable, was determining what exactly constitutes an EM song. My criteria involved iterating through every song and selecting those whose artist had a tag that fell inside a list of predetermined EM genres. However, this strategy is not always effective, since some artists have only a small selection of EM songs and have produced much more music in rock or other non-EM genres. To prevent such songs from appearing in the dataset, I would need to load another dataset, from a group called Last.fm, which contains user-generated tags at the song level. Another, more addressable weakness in my experiment was graphically analyzing the timbre categories. While the average chord changes were easy to interpret on the graphs for each cluster and had easy semantic interpretations, the timbre categories were never formally defined. That is, while I knew the Bayes Information Criterion was lowest when there were 46 categories, I did not associate each timbre category with a sound. Mauch's study addressed this issue by randomly selecting songs with sounds that fell in each timbre category and asking users to listen to the sounds and classify what they heard. Implementing this system would be an additional way of ensuring that the clusters formed for each song were nontrivial: I could not only eyeball the timbre measurements on each graph, like I did in this thesis, but also use them to confirm the sounds I observed for each cluster. Finally, while my feature selection involved careful preprocessing, based on other studies, that normalized measurements across all songs, there are additional ways I could have improved the feature set. For example, one study looks at more advanced ways to isolate specific timbre segments in a song, identify repeating patterns, and compare songs to each other in terms of the similarity of their timbres [15]. More advanced methods like these would allow me to more quantitatively analyze how successfully the Dirichlet Process clusters songs into distinct categories.

4.2 Future Work

Future work in this area, quantitatively analyzing EM metadata to determine what constitutes different genres and novel artists, would involve tighter definitions and procedures, evaluations of whether the clustering was effective, and closer musical scrutiny. All of the weaknesses mentioned in the previous section, barring perhaps the songs available in the Million Song Dataset, can be addressed with extensions and modifications to the code base I created. The greater issue of building an effective corpus of music data for the MSD, and constantly updating it, might be addressed by soliciting such data from an organization like Spotify, but such an endeavor is very ambitious and beyond the scope of any individual or small-group research project without extensive funding and influence. Once these problems are resolved, and the dataset, the songs accessed from it, and the methods for comparing songs to each other are in place, the next steps would be to further analyze the results. How do the most unique artists for their time compare to the most popular artists? Is there considerable overlap? How long does it take for a style to grow in popularity, if it ever does? And lastly, how can these findings be used to compose new genres of music and to envision who and what will become popular in the future? All of these questions may require supplementary information sources, with respect to the popularity of songs and artists, for example, and many of these additional pieces of information can be found on the website of the MSD.

4.3 Closing Remarks

While this thesis is an ambitious endeavor and can be improved in many respects, it does show that the methods implemented yield nontrivial results and could serve as a foundation for future quantitative analysis of electronic music. As data analytics grows further, and groups such as Spotify amass greater amounts of information and deeper insights from that information, this relatively new field of study will hopefully grow. EM is a dynamic, energizing, and incredibly expressive type of music, and understanding it from a quantitative perspective pays respect to what has, up until now, been analyzed mostly from a curious outsider's perspective: qualitatively described, but not examined as thoroughly from a mathematical angle.


Appendix A

Code

A.1 Pulling Data from the Million Song Dataset

from __future__ import division
import os
import sys
import re
import time
import glob
import numpy as np
import hdf5_getters  # not on adroit

# prevents output from showing ellipses when printed
np.set_printoptions(threshold=np.nan)

'''This code pulls the metadata of each electronic song out of the Million
Song Dataset and saves it sorted by year.'''

basedir = '/scratch/network/mssilver/mssilver/msd_data_full/data' + str(sys.argv[1])
ext = '.h5'

target_genres = ['house', 'techno', 'drum and bass', 'drum n bass',
    "drum'n'bass", 'drumnbass', "drum 'n' bass", 'jungle', 'breakbeat',
    'trance', 'dubstep', 'trap', 'downtempo', 'industrial', 'synthpop',
    'idm', 'idm - intelligent dance music', '8-bit', 'ambient',
    'dance and electronica', 'electronic']

# relevant metadata for all EM songs found in the MSD
all_song_data = {}
pitch_segs_data = []
count = 0
start_time = time.time()

for root, dirs, files in os.walk(basedir):
    files = glob.glob(os.path.join(root, '*' + ext))
    for f in files:
        h5 = hdf5_getters.open_h5_file_read(f)
        # if year unknown, throw out sample
        if hdf5_getters.get_year(h5) == 0:
            h5.close()
            continue
        if any(tag in str(hdf5_getters.get_artist_mbtags(h5)) for tag in target_genres):
            print 'found electronic music song at {0} seconds'.format(time.time() - start_time)
            count += 1
            print 'song count: {0}'.format(count + 1)
            h5_subdict = dict()
            h5_subdict['title'] = hdf5_getters.get_title(h5).item()
            h5_subdict['artist_name'] = hdf5_getters.get_artist_name(h5).item()
            h5_subdict['year'] = hdf5_getters.get_year(h5).item()
            h5_subdict['duration'] = hdf5_getters.get_duration(h5).item()
            h5_subdict['timbre'] = hdf5_getters.get_segments_timbre(h5).tolist()
            h5_subdict['pitches'] = hdf5_getters.get_segments_pitches(h5).tolist()
            track_id = hdf5_getters.get_track_id(h5).item()
            all_song_data[track_id] = h5_subdict
            print 'Song {0} finished processing. Total time elapsed: {1} seconds'.format(count, str(time.time() - start_time))
        h5.close()

# sort the songs chronologically before writing them out
all_song_data_sorted = dict(sorted(all_song_data.items(), key=lambda k: k[1]['year']))
sortedpitchdata = '/scratch/network/mssilver/mssilver/msd_data/raw_' + re.sub('/', '', sys.argv[1]) + '.txt'
with open(sortedpitchdata, 'w') as text_file:
    text_file.write(str(all_song_data_sorted))

A.2 Calculating Most Likely Chords and Timbre Categories

from __future__ import division
import os
import sys
import re
import time
import json
import glob
import hdf5_getters  # not on adroit
import sklearn.mixture
import msd_utils  # not on adroit
import math
import numpy as np
import collections
import ast

# prevents output from showing ellipses when printed
np.set_printoptions(threshold=np.nan)

# column-wise mean of a list of lists (used with zip(*rows))
def mean(a):
    return sum(a) / len(a)

'''This code computes the frequency of chord changes in each electronic song
and runs the dirichlet process on it.'''

basedir = '/scratch/network/mssilver/mssilver/'
input_file = basedir + 'msd_data/raw_' + str(sys.argv[1]) + '.txt'
output_file = basedir + 'msd_data/preprocessed_' + str(sys.argv[1]) + '.txt'

json_contents = open(input_file, 'r').read()

all_song_data = []
time_start = time.time()
count = 0
for json_object_str in re.finditer(r"{'title'.*?}", json_contents):
    json_object_str = str(json_object_str.group(0))
    json_object = ast.literal_eval(json_object_str)
    json_object_new = {}

    json_object_new['title'] = json_object['title']
    json_object_new['artist_name'] = json_object['artist_name']
    json_object_new['year'] = json_object['year']
    json_object_new['duration'] = json_object['duration']

    segments_pitches_old = json_object['pitches']
    segments_timbre_old = json_object['timbre']
    segments_pitches_old_smoothed = []
    segments_timbre_old_smoothed = []
    chord_changes = [0 for i in range(0, 192)]
    smoothing_factor = 5
    for i in range(0, int(math.floor(len(segments_pitches_old) / smoothing_factor))):
        segments = segments_pitches_old[(smoothing_factor * i):(smoothing_factor * i + smoothing_factor)]
        # calculate mean frequency of each note over a block of 5 time segments
        segments_mean = map(mean, zip(*segments))
        segments_pitches_old_smoothed.append(segments_mean)
    most_likely_chords = [msd_utils.find_most_likely_chord(seg) for seg in segments_pitches_old_smoothed]
    print 'found most likely chords at {0} seconds'.format(time.time() - time_start)
    # calculate chord changes
    for i in range(0, len(most_likely_chords) - 1):
        c1 = most_likely_chords[i]
        c2 = most_likely_chords[i + 1]
        if (c1[1] == c2[1]):
            note_shift = 0
        elif (c1[1] < c2[1]):
            note_shift = c2[1] - c1[1]
        else:
            note_shift = 12 - c1[1] + c2[1]
        key_shift = 4 * (c1[0] - 1) + c2[0]
        # convert note_shift (0 through 11) and key_shift (1 to 16)
        # to one of 192 categories for a chord shift
        chord_shift = 12 * (key_shift - 1) + note_shift
        chord_changes[chord_shift] += 1
    json_object_new['chord_changes'] = [c / json_object['duration'] for c in chord_changes]
    print 'calculated chord changes at {0} seconds'.format(time.time() - time_start)

    for i in range(0, int(math.floor(len(segments_timbre_old) / smoothing_factor))):
        segments = segments_timbre_old[(smoothing_factor * i):(smoothing_factor * i + smoothing_factor)]
        # calculate mean timbre values over a block of 5 time segments
        segments_mean = map(mean, zip(*segments))
        segments_timbre_old_smoothed.append(segments_mean)
    print 'found most likely timbre categories at {0} seconds'.format(time.time() - time_start)
    timbre_cats = [msd_utils.find_most_likely_timbre_category(seg) for seg in segments_timbre_old_smoothed]
    # one frequency count per timbre cluster (the GMM selected 46 clusters)
    timbre_cat_counts = [timbre_cats.count(i) for i in xrange(0, 46)]
    json_object_new['timbre_cat_counts'] = [t / json_object['duration'] for t in timbre_cat_counts]
    all_song_data.append(json_object_new)
    count += 1

print 'preprocessing finished; writing results to file at time {0}'.format(time.time() - time_start)
with open(output_file, 'w') as f:
    f.write(str(all_song_data))

print 'file merging complete at time {0}'.format(time.time() - time_start)

A.3 Code to Compute Timbre Categories

from __future__ import division
import os
import sys
import re
import time
import json
import glob
import hdf5_getters  # not on adroit
import sklearn.mixture
import msd_utils  # not on adroit
import math
import numpy as np
import collections
from string import ascii_uppercase
import ast
import matplotlib.pyplot as plt
import operator
from collections import defaultdict
import random

timbre_all = []
N = 20  # number of samples to get from each year
year_counts = {1956: 2, 1965: 4, 1968: 3, 1969: 5, 1970: 23, 1971: 25,
    1972: 26, 1973: 37, 1974: 35, 1975: 29, 1976: 28, 1977: 64, 1978: 77,
    1979: 111, 1980: 131, 1981: 171, 1982: 199, 1983: 272, 1984: 190,
    1985: 189, 1986: 200, 1987: 224, 1988: 205, 1989: 272, 1990: 358,
    1991: 348, 1992: 538, 1993: 610, 1994: 658, 1995: 764, 1996: 809,
    1997: 930, 1998: 872, 1999: 983, 2000: 1031, 2001: 1230, 2002: 1323,
    2003: 1563, 2004: 1508, 2005: 1995, 2006: 1892, 2007: 2175, 2008: 1950,
    2009: 1782, 2010: 742}

time_start = time.time()
year_count = defaultdict(int)
orig_dir = '/scratch/network/mssilver/mssilver/'
orig_dir = ''
json_pattern = re.compile(r"{'title'.*?}", re.DOTALL)
N = 20  # number of songs to sample from each year
k = 20  # number of frames to select from each song
for l1 in ascii_uppercase:
    for l2 in ascii_uppercase:
        edm_textfile = orig_dir + 'msd_data/raw_' + l1 + l2 + '.txt'
        json_contents = open(edm_textfile, 'r').read()
        for json_object_str in re.findall(json_pattern, json_contents):
            json_object = ast.literal_eval(json_object_str)
            year = int(json_object['year'])
            prob = 1.0 if 1.0 * N / year_counts[year] > 1.0 else 1.0 * N / year_counts[year]
            if random.random() < prob:
                print 'getting timbre frames for song in directory {0}, {1} seconds after start of program'.format(edm_textfile, time.time() - time_start)
                duration = float(json_object['duration'])
                timbre = [[t / duration for t in l] for l in json_object['timbre']]
                try:
                    indices = random.sample(xrange(0, len(timbre)), k)
                except:
                    indices = xrange(0, len(timbre))
                timbre_frames = [timbre[i] for i in indices]
                appended_timbre = [timbre_all.append(l) for l in timbre_frames]
        print 'finished file {0}, {1} seconds after start of program'.format(edm_textfile, time.time() - time_start)

with(open('timbre_frames_all.txt', 'w')) as f:
    f.write(str(timbre_all))

A.4 Helper Methods for Calculations

import os
import re
import json
import glob
import hdf5_getters
import time
import numpy as np

''' some static data used in conjunction with the helper methods '''

# each 12-element vector corresponds to the 12 pitches, starting with
# C natural and going up to B natural

CHORD_TEMPLATE_MAJOR = [[1,0,0,0,1,0,0,1,0,0,0,0], [0,1,0,0,0,1,0,0,1,0,0,0],
                        [0,0,1,0,0,0,1,0,0,1,0,0], [0,0,0,1,0,0,0,1,0,0,1,0],
                        [0,0,0,0,1,0,0,0,1,0,0,1], [1,0,0,0,0,1,0,0,0,1,0,0],
                        [0,1,0,0,0,0,1,0,0,0,1,0], [0,0,1,0,0,0,0,1,0,0,0,1],
                        [1,0,0,1,0,0,0,0,1,0,0,0], [0,1,0,0,1,0,0,0,0,1,0,0],
                        [0,0,1,0,0,1,0,0,0,0,1,0], [0,0,0,1,0,0,1,0,0,0,0,1]]
CHORD_TEMPLATE_MINOR = [[1,0,0,1,0,0,0,1,0,0,0,0], [0,1,0,0,1,0,0,0,1,0,0,0],
                        [0,0,1,0,0,1,0,0,0,1,0,0], [0,0,0,1,0,0,1,0,0,0,1,0],
                        [0,0,0,0,1,0,0,1,0,0,0,1], [1,0,0,0,0,1,0,0,1,0,0,0],
                        [0,1,0,0,0,0,1,0,0,1,0,0], [0,0,1,0,0,0,0,1,0,0,1,0],
                        [0,0,0,1,0,0,0,0,1,0,0,1], [1,0,0,0,1,0,0,0,0,1,0,0],
                        [0,1,0,0,0,1,0,0,0,0,1,0], [0,0,1,0,0,0,1,0,0,0,0,1]]
CHORD_TEMPLATE_DOM7 = [[1,0,0,0,1,0,0,1,0,0,1,0], [0,1,0,0,0,1,0,0,1,0,0,1],
                       [1,0,1,0,0,0,1,0,0,1,0,0], [0,1,0,1,0,0,0,1,0,0,1,0],
                       [0,0,1,0,1,0,0,0,1,0,0,1], [1,0,0,1,0,1,0,0,0,1,0,0],
                       [0,1,0,0,1,0,1,0,0,0,1,0], [0,0,1,0,0,1,0,1,0,0,0,1],
                       [1,0,0,1,0,0,1,0,1,0,0,0], [0,1,0,0,1,0,0,1,0,1,0,0],
                       [0,0,1,0,0,1,0,0,1,0,1,0], [0,0,0,1,0,0,1,0,0,1,0,1]]
CHORD_TEMPLATE_MIN7 = [[1,0,0,1,0,0,0,1,0,0,1,0], [0,1,0,0,1,0,0,0,1,0,0,1],
                       [1,0,1,0,0,1,0,0,0,1,0,0], [0,1,0,1,0,0,1,0,0,0,1,0],
                       [0,0,1,0,1,0,0,1,0,0,0,1], [1,0,0,1,0,1,0,0,1,0,0,0],
                       [0,1,0,0,1,0,1,0,0,1,0,0], [0,0,1,0,0,1,0,1,0,0,1,0],
                       [0,0,0,1,0,0,1,0,1,0,0,1], [1,0,0,0,1,0,0,1,0,1,0,0],
                       [0,1,0,0,0,1,0,0,1,0,1,0], [0,0,1,0,0,0,1,0,0,1,0,1]]

CHORD_TEMPLATE_MAJOR_means = [np.mean(chord) for chord in CHORD_TEMPLATE_MAJOR]
CHORD_TEMPLATE_MINOR_means = [np.mean(chord) for chord in CHORD_TEMPLATE_MINOR]
CHORD_TEMPLATE_DOM7_means = [np.mean(chord) for chord in CHORD_TEMPLATE_DOM7]
CHORD_TEMPLATE_MIN7_means = [np.mean(chord) for chord in CHORD_TEMPLATE_MIN7]

CHORD_TEMPLATE_MAJOR_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_MAJOR]
CHORD_TEMPLATE_MINOR_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_MINOR]
CHORD_TEMPLATE_DOM7_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_DOM7]
CHORD_TEMPLATE_MIN7_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_MIN7]

# mean timbre vectors of the 46 clusters found by the GMM (Section 2.3.3)
TIMBRE_CLUSTERS = [
    [1.38679881e-01, 3.95702571e-02, 2.65410235e-02, 7.38301998e-03, -1.75014636e-02, -5.51147732e-02,
     8.71851698e-03, -1.17595855e-02, 1.07227900e-02, 8.75951680e-03, 5.40391877e-03, 6.17638908e-03],
    [3.14344510e+00, 1.17405599e-01, 4.08053561e+00, -1.77934450e+00, 2.93367968e+00, -1.35597928e+00,
     -1.55129489e+00, 7.75743158e-01, 6.42796685e-01, 1.40794256e-01, 3.37716831e-01, -3.27103815e-01],
    [3.56548165e-01, 2.73288705e+00, 1.94355982e+00, 1.06892477e+00, 9.89739475e-01, -8.97330631e-02,
     8.73234495e-01, -2.00747009e-03, 3.44488367e-01, 9.93117800e-02, -2.43471766e-01, -1.90521726e-01],
    [4.22442037e-01, 4.14115783e-01, 1.43926557e-01, -1.16143322e-01, -5.95186216e-02, -2.36927188e-01,
     -6.83151409e-02, 9.86816882e-02, 2.43219098e-02, 6.93558977e-02, 6.80121418e-03, 3.97485360e-02],
    [1.94727799e-01, -1.39027782e+00, -2.39875671e-01, -2.84583677e-01, 1.92334219e-01, -2.83421048e-01,
     2.15787541e-01, 1.14840341e-01, -2.15631833e-01, -4.09496877e-02, -6.90838017e-03, -7.24394810e-03],
    [1.96565167e-01, 4.98702717e-02, -3.43697282e-01, 2.54170701e-01, 1.12441266e-02, 1.54740401e-01,
     -4.70447408e-02, 8.10868802e-02, 3.03736697e-03, 1.43974944e-03, -2.75044913e-02, 1.48634678e-02],
    [2.21364497e-01, -2.96205105e-01, 1.57754028e-01, -5.57641279e-02, -9.25625566e-02, -6.15316168e-02,
     -1.38139882e-01, -5.54936599e-02, 1.66886836e-01, 6.46238260e-02, 1.24093863e-02, -2.09274345e-02],
    [2.12823455e-01, -9.32652720e-02, -4.39611467e-01, -2.02814479e-01, 4.98638770e-02, -1.26572488e-01,
     -1.11181799e-01, 3.25075635e-02, 2.01416694e-02, -5.69216463e-02, 2.61922912e-02, 8.30817468e-02],
    [1.62304042e-01, -7.34813956e-03, -2.02552550e-01, 1.80106705e-01, -5.72110826e-02, -9.17148244e-02,
     -6.20429191e-03, -6.08892354e-02, 1.02883628e-02, 3.84878478e-02, -8.72920419e-03, 2.37291230e-02],
    [1.69023095e-01, 6.81311168e-02, -3.71039856e-02, -2.13139780e-02, -4.18752028e-03, 1.36407740e-01,
     2.58515825e-02, -4.10328777e-04, 2.93149920e-02, -1.97874734e-02, 2.01177066e-02, 4.29260690e-03],
    [4.16829358e-01, -1.28384095e+00, 8.86081556e-01, 9.13717416e-02, -3.19420208e-01, -1.82003637e-01,
     -3.19865507e-02, -1.71517045e-02, 3.47472066e-02, -3.53047665e-02, 5.58354602e-02, -5.06222122e-02],
    [3.83948137e-01, 1.06020034e-01, 4.01191058e-01, 1.49470482e-01, -9.58422411e-02, -4.94473336e-02,
     2.27589858e-02, -5.67352733e-02, 3.84666644e-02, -2.15828055e-02, -1.67817151e-02, 1.15426241e-01],
    [9.07946444e-01, 3.26120397e+00, 2.98472002e+00, -1.42615404e-01, 1.29886103e+00, -4.53380431e-01,
     1.54008478e-01, -3.55297093e-02, -2.95809181e-01, 1.57037690e-01, -7.29692046e-02, 1.15180285e-01],
    [1.60870896e+00, -2.32038235e+00, -7.96211044e-01, 1.55058968e+00, -2.19377663e+00, 5.01030526e-01,
     -1.71767279e+00, -1.36642470e+00, -2.42837527e-01, -4.14275615e-01, -7.33148530e-01, -4.56676578e-01],
    [6.42870687e-01, 1.34486839e+00, 2.16026845e-01, -2.13180345e-01, 3.10866747e-01, -3.97754955e-01,
     -3.54439151e-01, -5.95938041e-04, 4.95054274e-03, 4.67013422e-02, -1.80823854e-02, 1.25808320e-01],
    [1.16780496e+00, 2.28141229e+00, -3.29418720e+00, -1.54239912e+00, 2.12372153e-01, 2.51116768e+00,
     1.84273560e+00, -4.06183916e-01, 1.19175125e+00, -9.24407446e-01, 6.85444429e-01, -6.38729005e-01],
    [2.39097414e-01, -1.13382447e-02, 3.06327342e-01, 4.68182987e-03, -1.03107607e-01, -3.17661969e-02,
     3.46533705e-02, 1.46440386e-02, 6.88291154e-02, 1.72580481e-02, -6.23970238e-03, -6.52822380e-03],
    [1.74850329e-01, -1.86077411e-01, 2.69285838e-01, 5.22452803e-02, -3.71708289e-02, -6.42874319e-02,
     -5.01920042e-03, -1.14565540e-02, -2.61300268e-03, -6.94872458e-03, 1.20157063e-02, 2.01341977e-02],
    [1.93220674e-01, 1.62738332e-01, 1.72794061e-02, 7.89933755e-02, 1.58494767e-01, 9.04541006e-04,
     -3.33177052e-02, -1.42411500e-01, -1.90471155e-02, -2.41622739e-02, -2.57382438e-02, 2.84895062e-02],
    [3.31179197e+00, -1.56765268e-01, 4.42446188e+00, 2.05496297e+00, 5.07031622e+00, -3.52663849e-02,
     -5.68337901e+00, -1.17825301e+00, 5.41756637e-01, -3.15541339e-02, -1.58404846e+00, 7.37887234e-01],
    [2.36033237e-01, -5.01380019e-01, -7.01568834e-02, -2.14474169e-01, 5.58739133e-01, -3.45340886e-01,
     2.36469930e-01, -2.51770230e-02, -4.41670143e-01, -1.73364633e-01, 9.92353986e-03, 1.01775476e-01],
    [3.13672832e+00, 1.55128891e+00, 4.60139512e+00, 9.82477544e-01, -3.87108002e-01, -1.34239667e+00,
     -3.00065797e+00, -4.41556909e-01, -7.77546208e-01, -6.59017029e-01, -1.42596356e-01, -9.78935498e-01],
    [8.50714148e-01, 2.28658856e-01, -3.65260753e+00, 2.70626948e+00, -1.90441544e-01, 5.66625676e+00,
     1.77531510e+00, 2.39978921e+00, 1.10965660e+00, 1.58484130e+00, -1.51579214e-02, 8.64324026e-01],
    [1.14302559e+00, 1.18602811e+00, -3.88130412e+00, 8.69833825e-01, -8.23003310e-01, -4.23867795e-01,
     8.56022598e-01, -1.08015106e+00, 1.74840192e-01, -1.35493558e-02, -1.17012561e+00, 1.68572940e-01],
    [3.54117814e+00, 6.12714769e-01, 7.67585243e+00, 2.50391333e+00, 1.81374399e+00, -1.46363231e+00,
     -1.74027236e+00, -5.72924078e-01, -1.20787368e+00, -4.13954661e-01, -4.62561948e-01, 6.78297871e-01],
    [8.31843044e-01, 4.41635485e-01, 7.00724425e-02, -4.72159900e-02, 3.08326493e-01, -4.47009822e-01,
     3.27806057e-01, 6.52370380e-01, 3.28490360e-01, 1.28628172e-01, -7.78065861e-02, 6.91343399e-02],
    [4.90082031e-01, -9.53180204e-01, 1.76970476e-01, 1.57256960e-01, -5.26196238e-02, -3.19264458e-01,
     3.91808304e-01, 2.19368239e-01, -2.06483291e-01, -6.25044005e-02, -1.05547224e-01, 3.18934196e-01],
    [1.49899454e+00, -4.30708817e-01, 2.43770498e+00, 7.03149621e-01, -2.28827845e+00, 2.70195855e+00,
     -4.71484280e+00, -1.18700075e+00, -1.77431396e+00, -2.23190236e+00, 8.20855264e-01, -2.35859902e-01],
    [1.20322544e-01, -3.66300816e-01, -1.25699953e-01, -1.21914056e-01, 6.93277338e-02, -1.31034684e-01,
     -1.54955924e-03, 2.48094288e-02, -3.09576314e-02, -1.66369415e-03, 1.48904987e-04, -1.42151992e-02],
    [6.52394765e-01, -6.81024464e-01, 6.36868117e-01, 3.04950208e-01, 2.62178992e-01, -3.20457080e-01,
     -1.98576098e-01, -3.02173163e-01, 2.04399765e-01, 4.44513847e-02, -9.50111498e-03, -1.14198739e-02],
    [2.06762180e-01, -2.08101829e-01, 2.61977630e-01, -1.71672300e-01, 5.61794250e-02, 2.13660185e-01,
     3.90259585e-02, 4.78176392e-02, 1.72812607e-02, 3.44052067e-02, 6.26899067e-03, 2.48544728e-02],
    [7.39717363e-01, 4.37786285e+00, 2.54995502e+00, 1.13151212e+00, -3.58509503e-01, 2.20806129e-01,
     -2.20500355e-01, -7.22409824e-03, -2.70534083e-01, 1.07942098e-03, 2.70174668e-01, 1.87279353e-01],
    [1.25593809e+00, 6.71054880e-02, 8.70352571e-01, -4.32607959e+00, 2.30652217e+00, 5.47476105e+00,
     -6.11052479e-01, 1.07955720e+00, -2.16225471e+00, -7.95770149e-01, -7.31804973e-01, 9.68935954e-01],
    [1.17233757e-01, -1.23897829e-01, -4.88625265e-01, 1.42036530e-01, -7.23286756e-02, -6.99808763e-02,
     -1.17525019e-02, 5.70221674e-02, -7.67796123e-03, 4.17505873e-02, -2.33375716e-02, 1.94121001e-02],
    [1.67511025e+00, -2.75436700e+00, 1.45345593e+00, 1.32408871e+00, -1.66172505e+00, 1.00560074e+00,
     -8.82308160e-01, -5.95708043e-01, -7.27283590e-01, -1.03975499e+00, -1.86653334e-02, 1.39449745e+00],
    [3.20587677e+00, -2.84451104e+00, 8.54849957e+00, -4.44001235e-01, 1.04202144e+00, 7.35333682e-01,
     -2.48763292e+00, 7.38931361e-01, -1.74185596e+00, -1.07581842e+00, 2.05759299e-01, -8.20483513e-01],
    [3.31279737e+00, -5.08655734e-01, 6.61530870e+00, 1.16518280e+00, 4.74499155e+00, -2.31536191e+00,
     -1.34016130e+00, -7.15381712e-01, 2.78890594e+00, 2.04189275e+00, -3.80003033e-01, 1.16034914e+00],
    [1.79522019e+00, -8.13534697e-02, 4.37167420e-01, 2.26517020e+00, 8.85377295e-01, 1.07481514e+00,
     -7.25322296e-01, -2.19309506e+00, -7.59468916e-01, -1.37191387e+00, 2.60097913e-01, 9.34596450e-01],
    [3.50400906e-01, 8.17891485e-01, -8.63487084e-01, -7.31760701e-01, 9.70320805e-02, -3.60023996e-01,
     -2.91753495e-01, -8.03073817e-02, 6.65930095e-02, 1.60093340e-01, -1.29158086e-01, -5.18806100e-02],
    [2.25922929e-01, 2.78461593e-01, 5.39661393e-02, -2.37662670e-02, -2.70343295e-02, -1.23485570e-01,
     2.31027499e-03, 5.87465112e-05, 1.86127188e-02, 2.83074747e-02, -1.87198676e-04, 1.24761782e-02],
    [4.53615634e-01, 3.18976020e+00, -8.35029351e-01, 7.84124578e+00, -4.43906795e-01, -1.78945492e+00,
     -1.14521031e+00, 1.00044304e+00, -4.04084981e-01, -4.86030348e-01, 1.05412721e-01, 5.63666445e-02],
    [3.93714086e-01, -3.07226477e-01, -4.87366619e-01, -4.57481697e-01, -2.91133171e-04, -2.39881719e-01,
     -2.15591352e-01, -1.21332941e-01, 1.42245002e-01, 5.02984582e-02, -8.05878851e-03, 1.95534173e-01],
    [1.86913010e-01, -1.61000977e-01, 5.95612425e-01, 1.87804293e-01, 2.22064227e-01, -1.09008289e-01,
     7.83845058e-02, 5.15228647e-02, -8.18113578e-02, -2.37860551e-02, 3.41013800e-03, 3.64680417e-02],
    [3.32919314e+00, -2.14341251e+00, 7.20913997e+00, 1.76143734e+00, 1.64091808e+00, -2.66887649e+00,
     -9.26748006e-01, -2.78599285e-01, -7.39434005e-01, -3.87363085e-01, 8.00557250e-01, 1.15628886e+00],
    [4.76496444e-01, -1.19334793e-01, 3.09037235e-01, -3.45545294e-01, 1.30114716e-01, 5.06895559e-01,
     2.12176840e-01, -4.14296750e-03, 4.52439064e-02, -1.62163990e-02, 6.93683152e-02, -5.77607592e-03],
    [3.00019324e-01, 5.43432074e-02, -7.72732930e-01, 1.47263806e+00, -2.79012581e-02, -2.47864869e-01,
     -2.10011388e-01, 2.78202425e-01, 6.16957205e-02, -1.66924986e-01, -1.80102286e-01, -3.78872162e-03]]

233 TIMBRE_MEANS = [npmean(t) for t in TIMBRE_CLUSTERS]234 TIMBRE_STDEVS = [npstd(t) for t in TIMBRE_CLUSTERS]235

236 rsquorsquorsquohelper methods to process raw msd datarsquorsquorsquo237

238 def normalize_pitches(h5)239 key = int(hdf5_gettersget_key(h5))240 segments_pitches = hdf5_gettersget_segments_pitches(h5)241 segments_pitches_new = [transpose_by_key(pitch_segkey) for pitch_seg in

segments_pitches]242 return segments_pitches_new243

65

244 def transpose_by_key(pitch_segkey)245 pitch_seg_new = []246 for i in range(012)247 idx = (i + key) 12248 pitch_seg_newappend(pitch_seg[idx])249 return pitch_seg_new250

251 rsquorsquorsquo given a time segment with distributions of the 12 pitches find the mostlikely chord playedrsquorsquorsquo

252 def find_most_likely_chord(pitch_vector)253 rho_max = 00254 index each chord255 most_likely_chord = (11)256 for idx (chordmeanstdev) in enumerate(zip(CHORD_TEMPLATE_MAJOR

CHORD_TEMPLATE_MAJOR_meansCHORD_TEMPLATE_MAJOR_stdevs))257 rho = 00258 for i in range(012)259 rho += (chord[i] - mean)(pitch_vector[i] - npmean(pitch_vector))((

stdev+001)(npstd(pitch_vector)+001))260 if (abs(rho) gt abs(rho_max))261 rho_max = rho262 most_likely_chord = (1idx)263 for idx (chordmeanstdev) in enumerate(zip(CHORD_TEMPLATE_MINOR

CHORD_TEMPLATE_MINOR_meansCHORD_TEMPLATE_MINOR_stdevs))264 rho = 00265 for i in range(012)266 rho += (chord[i] - mean)(pitch_vector[i] - npmean(pitch_vector))((

stdev+001)(npstd(pitch_vector)+001))267 if (abs(rho) gt abs(rho_max))268 rho_max = rho269 most_likely_chord = (2idx)270 for idx (chordmeanstdev) in enumerate(zip(CHORD_TEMPLATE_DOM7

CHORD_TEMPLATE_DOM7_meansCHORD_TEMPLATE_DOM7_stdevs))271 rho = 00272 for i in range(012)273 rho += (chord[i] - mean)(pitch_vector[i] - npmean(pitch_vector))((

stdev+001)(npstd(pitch_vector)+001))274 if (abs(rho) gt abs(rho_max))275 rho_max = rho276 most_likely_chord = (3idx)277 for idx (chordmeanstdev) in enumerate(zip(CHORD_TEMPLATE_MIN7

CHORD_TEMPLATE_MIN7_meansCHORD_TEMPLATE_MIN7_stdevs))278 rho = 00279 for i in range(012)280 rho += (chord[i] - mean)(pitch_vector[i] - npmean(pitch_vector))((

stdev+001)(npstd(pitch_vector)+001))281 if (abs(rho) gt abs(rho_max))282 rho_max = rho283 most_likely_chord = (4idx)284 return most_likely_chord285

286 def find_most_likely_timbre_category(timbre_vector)287 most_likely_timbre_cat = 0288 rho_max = 00289 for idx (segmeanstdev) in enumerate(zip(TIMBRE_CLUSTERSTIMBRE_MEANS

TIMBRE_STDEVS))290 rho = 00291 for i in range(012)292 rho += (seg[i] - mean)(timbre_vector[i] - npmean(seg))((stdev+001)

(npstd(timbre_vector)+001))

66

293 if (abs(rho) gt abs(rho_max))294 rho_max = rho295 most_likely_timbre_cat = idx296 return most_likely_timbre_cat

67

Bibliography

[1] Marck Bailey. Mark J. Butler publishes scholarly work on dance music. http://www.music.northwestern.edu/about/news/2012/mark-j-butler-publishes-scholarly-work-on-dance-music.html, March 2012.

[2] Kenneth Taylor. Ishkur's guide to EDM. http://techno.org/electronic-music-guide/.

[3] Deal further strengthens Spotify's music discovery expertise. http://the.echonest.com/pressreleases/spotify-acquires-echo-nest/, March 2014.

[4] Josh Constine. Inside the Spotify–Echo Nest skunkworks. http://techcrunch.com/2014/10/19/the-sonic-mad-scientists/, October 2014.

[5] The future of music genres is here. http://blog.echonest.com/post/73516217273/the-future-of-music-genres-is-here, January 2014.

[6] About the Music Genome Project. http://www.pandora.com/about/mgp.

[7] Joan Serrà, Álvaro Corral, Marián Boguñá, Martín Haro, and Josep Ll. Arcos. Measuring the evolution of contemporary western popular music. Scientific Reports, 2, July 2012.

[8] Matthias Mauch, Robert M. MacCallum, Mark Levy, and Armand M. Leroi. The evolution of popular music: USA 1960–2010. Royal Society Open Science, 2(5), 2015.

[9] Thierry Bertin-Mahieux, Daniel P. W. Ellis, Brian Whitman, and Paul Lamere. The Million Song Dataset. In Proceedings of the 12th International Conference on Music Information Retrieval (ISMIR 2011), 2011.

[10] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay. Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12:2825–2830, 2011.

[11] Lars Buitinck, Gilles Louppe, Mathieu Blondel, Fabian Pedregosa, Andreas Mueller, Olivier Grisel, Vlad Niculae, Peter Prettenhofer, Alexandre Gramfort, Jaques Grobler, Robert Layton, Jake VanderPlas, Arnaud Joly, Brian Holt, and Gaël Varoquaux. API design for machine learning software: experiences from the scikit-learn project. In ECML PKDD Workshop: Languages for Data Mining and Machine Learning, pages 108–122, 2013.

[12] Edwin Chen. Infinite mixture models with nonparametric Bayes and the Dirichlet process. http://blog.echen.me/2012/03/20/infinite-mixture-models-with-nonparametric-bayes-and-the-dirichlet-process/, March 2012.

[13] Graham Massey. Roland TR-808: The drum machine that changed music. http://www.bbc.com/news/entertainment-arts-26682781, March 2014.

[14] Jean-Michel Jarre and Agencja Artystyczna MTJ. Jean Michel Jarre. Disques Dreyfus, 1999.

[15] François Pachet, Jean-Julien Aucouturier, and Mark Sandler. "The way it sounds": timbre models for analysis and retrieval of music signals. IEEE Transactions on Multimedia, 7(6):1028–1035, December 2005.

• Abstract
• Acknowledgements
• Contents
• List of Tables
• List of Figures
• 1 Introduction
  • 1.1 Background Information
  • 1.2 Literature Review
  • 1.3 The Dataset
• 2 Mathematical Modeling
  • 2.1 Determining Novelty of Songs
  • 2.2 Feature Selection
  • 2.3 Collecting Data and Preprocessing Selected Features
    • 2.3.1 Collecting the Data
    • 2.3.2 Pitch Preprocessing
    • 2.3.3 Timbre Preprocessing
• 3 Results
  • 3.1 Methodology
  • 3.2 Findings
    • 3.2.1 α = 0.05
    • 3.2.2 α = 0.1
    • 3.2.3 α = 0.2
  • 3.3 Analysis
• 4 Conclusion
  • 4.1 Design Flaws in Experiment
  • 4.2 Future Work
  • 4.3 Closing Remarks
• A Code
  • A.1 Pulling Data from the Million Song Dataset
  • A.2 Calculating Most Likely Chords and Timbre Categories
  • A.3 Code to Compute Timbre Categories
  • A.4 Helper Methods for Calculations
• Bibliography

we can utilize this information. Since each song varies in length, we need a way to compare songs of different durations on the same level. One approach could be to perform basic statistics on the distance between each beat, for example calculating the mean and standard deviation of this distance. However, the normalized pitch and timbre information already capture this data. Another possibility is detecting certain patterns of beats, which could differentiate the syncopated dubstep or glitch music beats from the steady pulse of electro-house. But once again, every beat is accompanied by a sound with a specific timbre and pitch, so this feature would not add any significantly new information.

2.3 Collecting Data and Preprocessing Selected Features

2.3.1 Collecting the Data

Upon deciding the features I wanted to use in my research, I first needed to collect all of the electronic songs in the Million Song Dataset. The easiest reliable way to achieve this was to iterate through each song in the database and save the information for the songs where any of the artist genre tags in artist_mbtags matched with an electronic music genre. While this measure was not fully accurate, because it looks at the genre of the artist and not of the song, specific genre information for each song was not as easily accessible, so this indicator was nearly as good a substitute. To generate a list of the genres that electronic songs would fall under, I manually searched through a subset of the MSD to find all genres that seemed to be related to electronic music. In the case of genres that were sometimes but not always electronic in nature, such as disco or pop, I erred on the side of caution and did not include them in the list of electronic genres. In these cases, false positives, such as primarily rock songs that happen to have the disco label attached to the artist, could inadvertently be included in the dataset. The final list of genres is as follows:

target_genres = ['house', 'techno', 'drum and bass', 'drum n bass',
                 "drum'n'bass", 'drumnbass', "drum 'n' bass", 'jungle', 'breakbeat',
                 'trance', 'dubstep', 'trap', 'downtempo', 'industrial', 'synthpop',
                 'idm', 'idm - intelligent dance music', '8-bit', 'ambient',
                 'dance and electronica', 'electronic']
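As a minimal sketch, the filter described above reduces to a membership test over the artist's MusicBrainz tags (the function name here is my own illustration; the full version appears in Appendix A.1):

import hdf5_getters

def is_electronic(h5):
    # keep a song if any target EM genre appears among its artist's genre tags
    artist_tags = str(hdf5_getters.get_artist_mbtags(h5))
    return any(tag in artist_tags for tag in target_genres)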

2.3.2 Pitch Preprocessing

A study conducted by music researcher Matthias Mauch [8] analyzes pitch in a musically informed manner. The study first takes the raw sound data and converts it into a distribution of each pitch, where 0 is no detection of the pitch and 1 the strongest amount. Then it computes the most likely chord by comparing the 4 most common types of chords in popular music (major, minor, dominant 7, and minor 7) to the observed chord. The most common chords are represented as "template chords" and contain 0's and 1's, where the 1's represent the notes played in the chord. For example, using the note C as the first index, the C major chord is represented as

CT_CM = (1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0)

For a given chroma frame c observed in the song, the Spearman's Rho coefficient is computed over every template chord:

ρ_{CT,c} = Σ_{i=1}^{12} (CT_i − mean(CT)) (c_i − mean(c)) / (σ_CT σ_c)

where mean(CT) is the mean of the values in the template chord, σ_CT is the standard deviation of the values in the chord, and the operations on c are analogous. Note that the summation is over each individual pitch in the 12 pitch classes. The chord template with the highest value of ρ is selected as the chord for the time frame.
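A minimal sketch of this template matching, my own illustration of the formula above (the actual implementation in Appendix A.4 also adds a small constant of 0.01 to the standard deviations for numerical safety):

import numpy as np

CT_CM = np.array([1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0], dtype=float)  # C major template

def template_rho(ct, c):
    # correlation between a template chord ct and a 12-element chroma frame c
    return np.sum((ct - ct.mean()) * (c - c.mean())) / (ct.std() * c.std())

The chord assigned to a frame is then the template that maximizes ρ across all 48 templates (12 roots × 4 chord types).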

After this is performed for each time frame, the values are smoothed, and then the change between adjacent chords is observed. The reasoning behind this step is that by measuring the relative distance between chords rather than the chords themselves, all songs can be compared in the same manner even though they may have different key signatures. Finally, the study takes the types of chord changes and classifies them under 8 possible categories called "H-topics". These topics are more abstracted versions of the chord changes that make more sense to a human, such as "changes involving dominant 7th chords".

In my preliminary implementation of this method on an electronic dance music corpus, I made a few modifications to Mauch's study. First, I smoothed out time frames before computing the most probable chords, rather than smoothing the most probable chords. I did this to save time and to reduce volatility in the chord measurements. Using Rick Astley's "Never Gonna Give You Up" as a reference, which contains 935 time frames and lasts 212 seconds, 5 time frames is slightly under 1 second and, for preliminary testing, appeared to be a good interval for each time block. Second, as mentioned in the literature section, I did not abstract the chord changes into H-topics. This decision also stemmed from time constraints, since deriving semantic chord meaning from EDM songs would require careful research into the types of harmonies and sounds common in that genre of music. Below is a high-level visualization of the pitch metadata found in a sample song, "Firestarter" by The Prodigy, and how I converted the metadata into a chord change vector that I could then feed into the Dirichlet Process algorithm.

[Figure: converting raw pitch metadata into a chord change vector, illustrated on "Firestarter" by The Prodigy. (1) Start with the raw pitch data, an N×12 matrix, where N is the number of time frames in the song and 12 the number of pitch classes. (2) Average the distribution of pitches over every block of 5 time frames. (3) For each block, calculate the most likely chord using Spearman's rho (e.g., F major). (4) For each pair of adjacent chords, calculate the change between them (e.g., F major → G major, a major-to-major change with step size 2, which the figure maps to chord shift code 6, incrementing chord_changes[6]) and increment its count in a table of chord change frequencies (192 possible chord changes). (5) The result is a 192-element vector where chord_changes[i] is the number of times the chord change with code i occurred in the song.]

A final step I took to normalize the chord change data was to divide the numbers by the length of the song, so that each song's number of chord changes was measured per second.
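The bookkeeping this amounts to, taken from the encoding in Appendix A.2 (chord qualities are coded 1 through 4 for major, minor, dominant 7th, and minor 7th, and roots 0 through 11), is sketched below:

def chord_change_code(c1, c2):
    # c1 and c2 are (quality, root) pairs for two adjacent chords
    if c1[1] == c2[1]:
        note_shift = 0
    elif c1[1] < c2[1]:
        note_shift = c2[1] - c1[1]
    else:
        note_shift = 12 - c1[1] + c2[1]
    key_shift = 4 * (c1[0] - 1) + c2[0]       # 1..16: ordered pair of chord qualities
    return 12 * (key_shift - 1) + note_shift  # one of 192 chord change categories

# each song's 192 counts are then divided by its duration, giving changes per second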

2.3.3 Timbre Preprocessing

For timbre, I also used Mauch's model to find a meaningful way to compare timbre uniformly across all songs [8]. After collecting all song metadata, I took a random sample of 20 songs from each year, starting at 1970. The reason I forced the sampling to 20 randomly sampled songs from each year, and did not take a random sample of songs from all years at once, was to prevent bias towards any type of sounds. As seen in Figure 2.2, there are significantly more songs from 2000–2011 than before 2000: the mean year is x̄ = 2001.052, the median year is 2003, and the standard deviation of the years is σ = 7.060. A "random sample" over all songs would almost definitely include a disproportionate amount of more recent songs. In order to not miss out on sounds that may be more prevalent in older songs, I required a set number of songs from each year. Next, from each randomly selected song, I selected 20 random timbre frames in order to prevent any biases in data collection within each song. In total, there were 42 × 20 × 20 = 16,800 timbre frames collected. Next, I clustered the timbre frames using a Gaussian Mixture Model (GMM), varying the number of clusters from 10 to 100 and selecting the number of clusters with the lowest Bayes Information Criterion (BIC), a statistical measure commonly used to select the best fitting model. The BIC was minimized at 46 timbre clusters. I then re-ran the GMM with 46 clusters and saved the mean values of each of the 12 timbre segments for each cluster formed.
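A compact sketch of this model selection step (my own illustration using the current scikit-learn GaussianMixture API rather than the older sklearn.mixture interface the thesis code used; timbre_frames is assumed to be the 16,800 × 12 array of sampled frames):

import numpy as np
from sklearn.mixture import GaussianMixture

def fit_best_gmm(timbre_frames, candidate_ks=range(10, 101)):
    best_model, best_bic = None, np.inf
    for n in candidate_ks:
        gmm = GaussianMixture(n_components=n).fit(timbre_frames)
        bic = gmm.bic(timbre_frames)  # lower BIC = better fitting, less overfit model
        if bic < best_bic:
            best_model, best_bic = gmm, bic
    return best_model  # best_model.means_ holds one 12-dimensional mean per timbre cluster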

In the same way that every song had the same 192 chord changes whose frequencies could be compared between songs, each song now had the same 46 timbre clusters, but with different frequencies in each song. When reading in the metadata from each song, I calculated the most likely timbre cluster that each timbre frame belonged to and kept a frequency count of all of the possible timbre clusters observed in a song. Finally, as with the pitch data, I divided all observed counts by the duration of the song in order to normalize each song's timbre counts.

Figure 2.2: Number of Electronic Music Songs in the Million Song Dataset from Each Year

Chapter 3

Results

3.1 Methodology

After the pitch and timbre data was processed, I ran the Dirichlet Process on the data. For each song, I concatenated the 192-element chord change frequency list and the 46-element timbre category frequency list, giving each song a total of 238 features. However, there is a problem with this setup: the pitch data will inherently dominate the clustering process, since it contains almost 3 times as many features as timbre. While there is no built-in function in scikit-learn's DPGMM process to give different weights to each feature, I considered another possibility to remedy this discrepancy: duplicating the timbre vector a certain number of times and concatenating that to the feature set of each song. While this strategy runs the risk of corrupting the feature set and turning it into something that does not accurately represent each song, it is important to keep in mind that even without duplicating the timbre vector, the feature set consists of two separate feature sets concatenated to each other. Therefore, timbre duplication appears to be a reasonable strategy to weigh pitch and timbre more evenly.
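A minimal sketch of the resulting clustering setup (my own illustration: BayesianGaussianMixture is the modern scikit-learn successor to the DPGMM class named above, the number of timbre copies is a hypothetical choice, and the ×10 rescaling is motivated in the paragraphs that follow):

import numpy as np
from sklearn.mixture import BayesianGaussianMixture

def cluster_songs(chord_feats, timbre_feats, alpha, timbre_copies=3, scale=10.0):
    # chord_feats: (n_songs, 192); timbre_feats: (n_songs, 46)
    X = np.hstack([chord_feats] + [timbre_feats] * timbre_copies) * scale
    dp = BayesianGaussianMixture(
        n_components=50,  # upper limit on the number of clusters, as in Section 3.1
        weight_concentration_prior_type='dirichlet_process',
        weight_concentration_prior=alpha,
    ).fit(X)
    return dp.predict(X)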


After this modification, I tweaked a few more parameters before obtaining my final results. Dividing the pitch and timbre frequencies by the duration of the song normalized every song to frequency per second, but it also had the undesired effect of making the data too small. Timbre and pitch frequencies per second were almost always less than 1.0 and many times hovered as low as 0.002 for nonzero values. Because all of the values were very close to each other, using common values of α in the range of 0.1 to 1000–2000 was insufficient to push the songs into different clusters. As a result, every song fell into the same cluster. Increasing the value of α by several orders of magnitude, to well over 10 million, fixed the problem, but this solution presented two problems. First, tuning α to experiment with different ways to cluster the music would be problematic, since I would have to work with an enormous range of possible values for α. Second, pushing α to such high values is not appropriate for the Dirichlet Process. Extremely high values of α indicate a Dirichlet Process that will try to disperse the data into different clusters, but a value of α that high is in principle always assigning each new song to a new cluster. On the other hand, varying α between 0.1 and 1000, for example, presents a much wider range of flexibility when assigning clusters. While clustering may still be possible by varying the values of α an extreme amount with the data as it currently is, we would be using the Dirichlet Process in a way it should mathematically not be used. Therefore, multiplying all of the data by a constant value, so that we can work in the appropriate range of α, is the ideal approach. After some experimentation, I found that k = 10 was an appropriate scaling factor.

After initial runs of the Dirichlet Process, I found that there was a slight issue with some of the earlier songs. Since I had only artist genre tags, not song-specific tags, for each song, I chose songs based on whether any of the tags associated with the artist fell under any electronic music genre, including the generic term 'electronic'. There were some bands, mostly older ones from the 1960s and 1970s like Electric Light Orchestra, which had some electronic music but mostly featured rock, funk, disco, or another genre. Given that these artists featured mostly non-electronic songs, I decided to exclude them from my study and generated a blacklist of these music artists. While it was infeasible to look through every single song and determine whether it was electronic or not, I was able to look over the earliest songs in each cluster. These songs were the most important to verify as electronic, because early non-electronic songs could end up forming new clusters and inadvertently create clusters with non-electronic sounds that I was not looking for.

The goal of this thesis is to identify different groups in which EM songs are clustered and to identify the most unique artists and genres. While the second task is very simple, because it requires looking at the earliest songs in each cluster, the first is difficult to gauge the effectiveness of. While I can look at the average chord change and timbre category frequencies in each category, as well as other metadata, putting more semantic interpretations to what the music actually sounds like, and determining whether the music is clustered properly, is a very subjective process. For this reason, I ran the Dirichlet Process on the feature set with values of α = (0.05, 0.1, 0.2) and compared the clustering in each case, examining similarities and differences in the clusters formed in each scenario in the Discussion section. For each value of α, I set the upper limit of components, or clusters allowed, to 50. The values of α I used resulted in 9, 14, and 19 clusters formed.

3.2 Findings

3.2.1 α = 0.05

When I set α to 0.05, the Dirichlet Process split the songs into 9 clusters. Below are the distributions of years of the songs in each cluster (note that the Dirichlet Process does not number the clusters exactly sequentially, so cluster numbers 5, 7, and 10 are skipped).

Figure 3.1: Song year distributions for α = 0.05

For each value of α, I also calculated the average frequency of each chord change category and timbre category for each cluster and plotted the results. The green lines correspond to timbre and the blue lines to pitch.

Figure 3.2: Timbre and pitch distributions for α = 0.05

A table of each cluster formed, the number of songs in that cluster, and descriptions of the pitch, timbre, and rhythmic qualities characteristic of songs in that cluster is shown below.

Cluster   Song Count   Characteristic Sounds
0         6481         Minimalist; industrial space sounds; dissonant chords
1         5482         Soft; New Age; ethereal
2         2405         Defined sounds; electronic and non-electronic instruments played in standard rock rhythms
3         360          Very dense and complex synths; slightly darker tone
4         4550         Heavily distorted rock and synthesizer
6         2854         Faster-paced 80s synth rock; acid house
8         798          Aggressive beats; dense house music
9         1464         Ambient house; trancelike; strong beats; mysterious tone
11        1597         Melancholy tones; new wave rock in the 80s, then, starting in the 90s, downtempo, trip-hop, and nu-metal

Table 3.1: Song cluster descriptions for α = 0.05

3.2.2 α = 0.1

A total of 14 clusters were formed. (16 were formed, but 2 clusters contained only one song each; I listened to both of these songs, and since they did not sound unique, I discarded them from the clusters.) Again, the song year distributions, timbre and pitch distributions, and cluster descriptions are shown below.

Figure 3.3: Song year distributions for α = 0.1

Figure 3.4: Timbre and pitch distributions for α = 0.1

Cluster   Song Count   Characteristic Sounds
0         1339         Instrumental and disco with 80s synth
1         2109         Simultaneous quarter-note and sixteenth-note rhythms
2         4048         Upbeat, chill; simultaneous quarter-note and eighth-note rhythms
3         1353         Strong repetitive beats; ambient
4         2446         Strong simultaneous beat and synths; synths defined but echoing
5         2672         Calm; New Age
6         542          Hi-hat cymbals; dissonant chord progressions
7         2725         Aggressive punk and alternative rock
9         1647         Latin; rhythmic emphasis on first and third beats
11        835          Standard medium-fast rock instruments/chords
16        1152         Orchestral, especially violins
18        40           "Martian alien" sounds; no vocals
20        1590         Alternating strong kick and strong high-pitched clap
28        528          Roland TR-like beats; kick and clap stand out but fuzzy

Table 3.2: Song cluster descriptions for α = 0.1

3.2.3 α = 0.2

With α set to 0.2, there were a total of 22 clusters formed. 3 of the clusters consisted of 1 song each, none of which were particularly unique-sounding, so I discarded them, for a total of 19 significant clusters. Again, the song year distributions, timbre and pitch distributions, and cluster descriptions are shown below.

Figure 3.5: Song year distributions for α = 0.2

Figure 3.6: Timbre and pitch distributions for α = 0.2

Cluster   Song Count   Characteristic Sounds
0         4075         Nostalgic and sad-sounding synths and string instruments
1         2068         Intense, sad, cavernous (mix of industrial metal and ambient)
2         1546         Jazz/funk tones
3         1691         Orchestral with heavy 80s synths; atmospheric
4         343          Arpeggios
5         304          Electro; ambient
6         2405         Alien synths; eerie
7         1264         Punchy kicks and claps; 80s/90s tilt
8         1561         Medium tempo; 4/4 time signature; synths with intense guitar
9         1796         Disco rhythms and instruments
10        2158         Standard rock with few (if any) synths added on
12        791          Cavernous, minimalist ambient (non-electronic instruments)
14        765          Downtempo; classic guitar riffs; fewer synths
16        865          Classic acid house sounds and beats
17        682          Heavy Roland TR sounds
22        14           Fast ambient; classic orchestral
23        578          Acid house with funk tones
30        31           Very repetitive rhythms; one or two tones
34        88           Very dense sound (strong vocals and synths)

Table 3.3: Song cluster descriptions for α = 0.2

3.3 Analysis

Each of the different values of α revealed different insights about the Dirichlet Process applied to the Million Song Dataset, mainly which artists and songs were unique and how traditional groupings of EM genres could be thought of differently. Not surprisingly, the distributions of the years of songs in most of the clusters were skewed to the left, because the distribution of all of the EM songs in the Million Song Dataset is left-skewed (see Figure 2.2). However, some of the distributions vary significantly for individual clusters, and these differences provide important insights into which types of music styles were popular at certain points in time and how unique the earliest artists and songs in the clusters were. For example, for α = 0.1, cluster 28's musical style (with sounds characteristic of the Roland TR-808 and TR-909, two programmable drum machines and synthesizers that became extremely popular in 1980s and 90s dance tracks [13]) coincides with when the instruments were first manufactured in 1980. Not surprisingly, this cluster contained mostly songs from the 80s and 90s and declined slightly in the 2000s. However, there were a few songs in that cluster that came out before 1980. While these songs did not clearly use the Roland TR machines, they may have contained similar sounds that predated the machines and were truly novel.

First, looking at α = 0.05, we see that all of the clusters contain a significant number of songs, although clusters 3 and 8 are notably smaller. Cluster 3 contains a heavier left tail, indicating a larger number of songs from the 70s, 80s, and 90s. Inside the cluster, the genres of music varied significantly from a traditional music lens. That is, the cluster contained some songs with nearly all traditional rock instruments, others with purely synths, and others somewhere in between, all of which would normally be classified as different EM genres. However, under the Dirichlet Process these songs were lumped together by the common theme of dense, melodic sounds (as opposed to minimalistic, repetitive, or dissonant sounds). The most prominent artists from the earlier songs are Ashra and John Hassell, who composed several melodic songs combining traditional instruments with synthesizers for a modern feel. The other small cluster, number 8, contains a more normal year distribution relative to the entire MSD distribution and also consists of denser beats. Another artist, Cabaret Voltaire, leads this cluster. Cluster 9 sticks out significantly because it contains virtually no songs before 1990 but increases rapidly in popularity. This cluster contains songs with hypnotically repetitive rhythm, strong and ethereal synths, and an equally strong drum-like beat. Given the emergence of trance in the 1990s, and the fact that house music in the 1980s contained more minimalistic synths than house music in the 1990s, this distribution of years makes sense. Looking at the earliest artists in this cluster, one that accurately predates the later music in the cluster is Jean-Michel Jarre, a French composer pioneering in ambient and electronic music [14]. One of his songs, "Les Chants Magnétiques IV", contains very sharp and modulated synths along with a repetitive hi-hat cymbal rhythm and more drawn-out and ethereal synths. While the song sounds ambient at its normal speed, playing it at 1.5 times the normal speed resulted in a thumping, fast-paced 16th-note rhythm that, combined with the ethereal synths and their chord progressions, sounded very similar to trance music. In fact, I found that stylistically, trance music was comparable to house and ambient music increased in speed. Trance music was a term not used extensively until the early 1990s, but ambient and house music were already mainstream by the 1980s, so it would make sense that trance evolved in this manner. However, this insight could serve as an argument that trance is not an innovative genre in and of itself but is rather a clever combination of two older genres.

Lastly, we look at the timbre category and chord change distributions for each cluster. In theory, these clusters should have significantly different peaks of chord changes and timbre categories, reflecting different pitch arrangements and instruments in each cluster. The type 0 chord change corresponds to major → major with no note change, type 60 to minor → minor with no note change, type 120 to dominant 7th major → dominant 7th major with no note change, and type 180 to dominant 7th minor → dominant 7th minor with no note change. It makes sense that type 0, 60, 120, and 180 chord changes are frequently observed, because it implies that adjacent chords in a song are remaining in the same key for the majority of the song.
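To make these code values concrete, the encoding from Appendix A.2 can be inverted as follows (a small sketch of my own):

QUALITIES = ['major', 'minor', 'dominant 7th', 'minor 7th']

def decode_chord_change(code):
    # invert code = 12*(key_shift - 1) + note_shift, with key_shift = 4*(q1 - 1) + q2
    key_shift, note_shift = code // 12 + 1, code % 12
    q1, q2 = (key_shift - 1) // 4 + 1, (key_shift - 1) % 4 + 1
    return QUALITIES[q1 - 1], QUALITIES[q2 - 1], note_shift

# decode_chord_change(0)   -> ('major', 'major', 0)
# decode_chord_change(60)  -> ('minor', 'minor', 0)
# decode_chord_change(120) -> ('dominant 7th', 'dominant 7th', 0)
# decode_chord_change(180) -> ('minor 7th', 'minor 7th', 0)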

The timbre categories, on the other hand, are more difficult to intuitively interpret. Mauch's study addresses this issue by sampling songs and sounds that are the closest to each timbre category, then playing the sounds and attaching user-based interpretations based on several listeners [8]. While this strategy worked in Mauch's study, given the time and resources at my disposal it was not practical in mine. I ended up taking my subjective summaries of each cluster and comparing the charts to see whether certain peaks in the timbre categories corresponded to specific tones. Strangely, for α = 0.05, the timbre and chord change data is very similar for each cluster. This problem does not occur when α = 0.1 or 0.2, where the graphs vary significantly and correspond to some of the observed differences in the music. In summary, below are the most influential artists I found in the clusters formed and the types of music they created that were novel for their time:

• Jean-Michel Jarre: ambient and house music, complicated synthesizer arrangements
• Cabaret Voltaire: orchestral electronic music
• Paul Horn: new age
• Brian Eno: ambient music
• Manuel Göttsching (Ashra): synth-heavy ambient music
• Killing Joke: industrial metal
• John Foxx: minimalist and dark electronic music
• Fad Gadget: house and industrial music

While these conclusions were formed mainly from the data used in the MSD and the resulting clusters, I checked outside sources and biographies of these artists to see whether they were groundbreaking contributors to electronic music. Some research revealed that these artists were indeed groundbreaking for their time, so my findings are consistent with existing literature. The difference, however, between existing accounts and mine is that, from a quantitatively computed perspective, I found some new connections (like observing that one of Jarre's works, when sped up, sounded very similar to trance music).

For larger values of α, it is worth not only looking at interesting phenomena in the clusters formed for that specific value but also comparing the clusters formed to other values of α. Since we are increasing the value of α, more clusters will be formed, and the distinctions between each cluster will be more nuanced. With α = 0.1, the Dirichlet Process formed 16 clusters; 2 of these clusters consisted of only one song each, and upon listening, neither of these songs sounded particularly unique, so I threw those two clusters out and analyzed the remaining 14.

Comparing these clusters to the ones formed with α = 0.05, I found that some of the clusters mapped over nicely, while others were more difficult to interpret. For example, cluster 3_{0.1} (cluster 3 when α = 0.1) contained a similar number of songs and a similar distribution of release years to cluster 9_{0.05}. Both contain virtually no songs before the 1990s and then steadily rise in popularity through the 2000s. Both clusters also contain similar types of music: house beats and ethereal synths reminiscent of ambient or trance music. However, when I looked at the earliest artists in cluster 3_{0.1}, they were different from the earliest artists in cluster 9_{0.05}. One particular artist, Bill Nelson, stood out for having a particularly novel song, "Birds of Tin", for the year it was released (1980). This song features a sharp and twangy synth beat that, when sped up, sounded like minimalist acid house music.

While the α = 0.05 grouping differentiated mostly on general moods and classes of instruments (like rock vs. non-electronic vs. electronic), the α = 0.1 grouping picked up more nuanced instrumentation and mood differences. For example, cluster 16_{0.1} contained songs that featured orchestral string instruments, especially violin. The songs themselves varied significantly according to traditional genres, from Brian Eno arrangements with classical orchestra to a remix of a song by Linkin Park, a nu-metal band, which contained violin interludes. This clustering raises an interesting point: music that sounds very different based on traditional genres could be grouped together on certain instruments or sounds. Another cluster, 28_{0.1}, features 90s sounds characteristic of the Roland TR-909 drum machine (which would explain why the cluster's songs increase drastically starting in the 1990s and steadily decline through the 2000s). Yet another cluster, 6_{0.1}, contains a particularly heavy left tail, indicating a style more popular in the 1980s, and its characteristic sound, hi-hat cymbals, is also a specialized instrument. This specialization does not match up particularly strongly with the clusters when α = 0.05. That is, a single cluster with α = 0.05 does not easily map to one or more clusters in the α = 0.1 run, although many of the clusters appear to share characteristics based on the qualitative descriptions in the tables.

The timbre/chord change charts for each cluster appeared to at least somewhat corroborate the general characteristics I attached to each cluster. For example, the last timbre category is significantly pronounced for clusters 5 and 18, and especially so for 18. Cluster 18 was vocal-free, ethereal space-synth sounds, so it would make sense that cluster 5, which was mainly calm New Age, also contained vocal-free, ethereal, and space-y sounds. It was also interesting to note that certain clusters, like 28, contained one timbre category that completely dominated all the others. Many songs in this cluster were marked by strong and repetitive beats reminiscent of the Roland TR synth drum machines, which matches the graph. Likewise, clusters 3, 7, 9, and 20, which appear to contain the same peak timbre category, were noted for containing strong and repetitive beats. For this value of α, I added the following artists and their contributions to the general list of novel artists:

• Bill Nelson: minimalist house music
• Vangelis: orchestral compositions with electronic notes
• Rick Wakeman: rock compositions with spacy-sounding synths
• Kraftwerk: synth-based pop music

Finally, we look at α = 0.2. With this parameter value, the Dirichlet Process resulted in 22 clusters formed; 3 of these clusters contained only one song each, and upon listening to each of these songs, I determined they were not particularly unique and discarded them, for a total of 19 remaining clusters. Unlike with the previous two values of α, where the clusters were relatively easy to subjectively differentiate, this run was quite difficult. Slightly more than half of the clusters, 10 out of 19, contained under 1000 songs, and 3 contained under 100. Some of the clusters were easily mapped to clusters from the other two α values, like cluster 17_{0.2}, which contains Roland TR drum machine sounds and is comparable to cluster 28_{0.1}. However, many of the other classifications seemed more dubious. Not only did the songs within each cluster often vary significantly, but the differences between many clusters appeared nearly indistinguishable. The chord change and timbre charts also support the difficulty in distinguishing different clusters. The y-axes for all of the charts are quite small, implying that many of the timbre values averaged out because the songs were quite different in each cluster. Essentially, the observations are quite noisy and do not have features that stand out as saliently as cluster 28_{0.1}'s, for example. The only exceptions to these numbers were clusters 30 and 34, but there were so few songs in each of these clusters that they represent only a small amount of the dataset. Therefore, I concluded that the Dirichlet Process with α = 0.2 did an insufficient job of adequately clustering the songs. Overall, the clusters formed when α = 0.1 were the most meaningful in terms of picking up nuanced moods and instruments without splitting hairs and resulting in clusters a minimally trained ear could not differentiate. From this analysis, the most appropriate genre classifications of the electronic music from the MSD are the clusters described in the table where α = 0.1, and the most novel artists, along with their contributions, are summarized in the findings where α = 0.05 and α = 0.1.

Chapter 4

Conclusion

In this chapter, I first address weaknesses in my experiment and strategies to address those weaknesses; then I offer potential paths for researchers to build upon my experiment and offer closing words regarding this thesis.

4.1 Design Flaws in Experiment

While I made every effort possible to ensure the integrity of this experiment, there were various factors working against it, some beyond my control and others within my control but unrealistic to address given the time and resources I had. The largest issue was the dataset I was working with. While the MSD contained roughly 23,000 electronic music songs according to my classifications, these songs did not come close to all of the electronic music that was available. From looking through the tracks, I did see many important artists, meaning that there was some credibility to the dataset. However, there were several other artists I was surprised to see missing, and the artists included contained only a limited number of popular songs. Some traditionally defined genres, like dubstep, were missing entirely from the dataset, and the most recent songs came from the year 2010, which meant that the past 5 years of rapid expansion in EM were not accounted for. Building a sufficient corpus of EM data is very difficult, arguably more so than for other genres, because songs may be remixed by multiple artists, further blurring the line between original content and modifications. For this reason, I considered my thesis to be a proof of concept. Although the data I used may not be ideal, I was able to show that the Dirichlet Process could be used with some amount of success to cluster songs based on their metadata.

With respect to how I implemented the Dirichlet Process and constructed the features, my methodology could have been more extensive with additional time and resources. Interpreting the sounds in each song and establishing common threads is a difficult task, and unlike Pandora, which used trained music theory experts to analyze each song, I relied on my own ears and anecdotal knowledge of EM. Given the lack of formal literature quantitatively analyzing EM and the resources I had, this was my best realistic option, but it was also not ideal. The second notable weakness, which was more controllable, was determining what exactly constitutes an EM song. My criteria involved iterating through every song and selecting those whose artist contained a tag that fell inside a list of predetermined EM genres. However, this strategy is not always effective, since some artists contain only a small selection of EM songs and have produced much more music involving rock or other non-EM genres. To prevent these songs from appearing in the dataset, I would need to load another dataset from a group called Last.fm, which contains user-generated tags at the song level. Another, more addressable, weakness in my experiment was graphically analyzing the timbre categories. While the average chord changes were easy to interpret on the graphs for each cluster and had easy semantic interpretations, the timbre categories were never formally defined. That is, while I knew the Bayes Information Criterion was lowest when there were 46 categories, I did not associate each timbre category with a sound. Mauch's study addressed this issue by randomly selecting songs with sounds that fell in each timbre category and asking users to listen to the sounds and classify what they heard. Implementing this system would be an additional way of ensuring that the clusters formed for each song were nontrivial: I could not only eyeball the measurements on each graph for timbre, like I did in this thesis, but also use them to confirm the sounds I observed for each cluster. Finally, while my feature selection contained careful preprocessing based on other studies that normalized measurements between all songs, there are additional ways I could have improved the feature set. For example, one study looks at more advanced ways to isolate specific timbre segments in a song, identify repeating patterns, and compare songs to each other in terms of the similarity of their timbres [15]. More advanced methods like these would allow me to more quantitatively analyze how successful the Dirichlet Process is at effectively clustering songs into distinct categories.

4.2 Future Work

Future work in this area, quantitatively analyzing EM metadata to determine what constitutes different genres and novel artists, would involve tighter definitions and procedures, evaluations of whether clustering was effective, and closer musical scrutiny. All of the weaknesses mentioned in the previous section, barring perhaps the songs available in the Million Song Dataset, can be addressed with extensions and modifications to the code base I created. Addressing the greater issue of building an effective corpus of music data for the MSD and constantly updating it might be accomplished by soliciting such data from an organization like Spotify, but such an endeavor is very ambitious and beyond the scope of any individual or small group research project without extensive funding and influence. Once these problems are resolved, and the dataset, the methods for accessing songs from it, and the methods for comparing songs to each other are in place, the next steps would be to further analyze the results. How do the most unique artists for their time compare to the most popular artists? Is there considerable overlap? How long does it take for a style to grow in popularity, if it even does? And lastly, how can these findings be used to compose new genres of music and envision who and what will become popular in the future? All of these questions may require supplementary information sources, with respect to the popularity of songs and artists for example, and many of these additional pieces of information can be found on the website of the MSD.

4.3 Closing Remarks

While this thesis is an ambitious endeavor and can be improved in many respects, it does show that the methods implemented yield nontrivial results and could serve as a foundation for future quantitative analysis of electronic music. As data analytics grows even more, and groups such as Spotify amass greater amounts of information and deeper insights on that information, this relatively new field of study will hopefully grow. EM is a dynamic, energizing, and incredibly expressive type of music, and understanding it from a quantitative perspective pays respect to what has up until now been mostly analyzed from a curious outsider's perspective: qualitatively described, but not examined as thoroughly from a mathematical angle.

Appendix A

Code

A.1 Pulling Data from the Million Song Dataset

from __future__ import division
import os
import sys
import time
import glob
import re              # added: used by re.sub below
import numpy as np     # added: used by np.set_printoptions below
import hdf5_getters    # not on adroit

# prevents output from showing ellipses when printed
np.set_printoptions(threshold=np.nan)

'''This code computes the frequency of chord changes in each electronic song
and runs the dirichlet process on it'''

basedir = '/scratch/network/mssilver/mssilver/msd_data_full/data/' + str(sys.argv[1])
ext = '.h5'

target_genres = ['house', 'techno', 'drum and bass', 'drum n bass',
                 "drum'n'bass", 'drumnbass', "drum 'n' bass", 'jungle', 'breakbeat',
                 'trance', 'dubstep', 'trap', 'downtempo', 'industrial', 'synthpop',
                 'idm', 'idm - intelligent dance music', '8-bit', 'ambient',
                 'dance and electronica', 'electronic']

# relevant metadata for all EM songs found in the MSD
all_song_data = {}
pitch_segs_data = []
count = 0
start_time = time.time()

for root, dirs, files in os.walk(basedir):
    files = glob.glob(os.path.join(root, '*' + ext))
    for f in files:
        h5 = hdf5_getters.open_h5_file_read(f)
        # if year unknown, throw out sample
        if hdf5_getters.get_year(h5) == 0:
            h5.close()
            continue
        if any(tag in str(hdf5_getters.get_artist_mbtags(h5)) for tag in target_genres):
            print 'found electronic music song at {0} seconds'.format(time.time() - start_time)
            count += 1
            print ('song count {0}'.format(count + 1))
            h5_subdict = dict()
            h5_subdict['title'] = hdf5_getters.get_title(h5).item()
            h5_subdict['artist_name'] = hdf5_getters.get_artist_name(h5).item()
            h5_subdict['year'] = hdf5_getters.get_year(h5).item()
            h5_subdict['duration'] = hdf5_getters.get_duration(h5).item()
            h5_subdict['timbre'] = hdf5_getters.get_segments_timbre(h5).tolist()
            h5_subdict['pitches'] = hdf5_getters.get_segments_pitches(h5).tolist()
            track_id = hdf5_getters.get_track_id(h5).item()
            all_song_data[track_id] = h5_subdict
            print('Song {0} finished processing. Total time elapsed: {1} seconds'.format(count, str(time.time() - start_time)))
        h5.close()

all_song_data_sorted = dict(sorted(all_song_data.items(), key=lambda k: k[1]['year']))
sortedpitchdata = '/scratch/network/mssilver/mssilver/msd_data/raw_' + re.sub('/', '', sys.argv[1]) + '.txt'
with open(sortedpitchdata, 'w') as text_file:
    text_file.write(str(all_song_data_sorted))

A.2 Calculating Most Likely Chords and Timbre Categories

from __future__ import division
import os
import sys
import re
import time
import json
import glob
import hdf5_getters  # not on adroit
import sklearn.mixture
import msd_utils     # not on adroit
import math
import numpy as np
import collections
import ast

# prevents output from showing ellipses when printed
np.set_printoptions(threshold=np.nan)

# column-wise mean of list of lists
def mean(a):
    return sum(a) / len(a)

'''This code computes the frequency of chord changes in each electronic song
and runs the dirichlet process on it'''

basedir = '/scratch/network/mssilver/mssilver/'
input_file = basedir + 'msd_data/raw_' + str(sys.argv[1]) + '.txt'
output_file = basedir + 'msd_data/preprocessed_' + str(sys.argv[1]) + '.txt'

json_contents = open(input_file, 'r').read()

all_song_data = []
time_start = time.time()
count = 0
# pattern reconstructed: matches one song's dict literal in the raw text file
for json_object_str in re.finditer(r"\{'title'.*?\}", json_contents):
    json_object_str = str(json_object_str.group(0))
    json_object = ast.literal_eval(json_object_str)
    json_object_new = {}

    json_object_new['title'] = json_object['title']
    json_object_new['artist_name'] = json_object['artist_name']
    json_object_new['year'] = json_object['year']
    json_object_new['duration'] = json_object['duration']

    segments_pitches_old = json_object['pitches']
    segments_timbre_old = json_object['timbre']
    segments_pitches_old_smoothed = []
    segments_timbre_old_smoothed = []
    chord_changes = [0 for i in range(0, 192)]
    smoothing_factor = 5
    for i in range(0, int(math.floor(len(segments_pitches_old) / smoothing_factor))):
        segments = segments_pitches_old[(smoothing_factor*i):(smoothing_factor*i + smoothing_factor)]
        # calculate mean frequency of each note over a block of 5 time segments
        segments_mean = map(mean, zip(*segments))
        segments_pitches_old_smoothed.append(segments_mean)
    most_likely_chords = [msd_utils.find_most_likely_chord(seg) for seg in segments_pitches_old_smoothed]
    print 'found most likely chords at {0} seconds'.format(time.time() - time_start)
    # calculate chord changes
    for i in range(0, len(most_likely_chords) - 1):
        c1 = most_likely_chords[i]
        c2 = most_likely_chords[i + 1]
        if (c1[1] == c2[1]):
            note_shift = 0
        elif (c1[1] < c2[1]):
            note_shift = c2[1] - c1[1]
        else:
            note_shift = 12 - c1[1] + c2[1]
        key_shift = 4*(c1[0] - 1) + c2[0]
        # convert note_shift (0 through 11) and key_shift (1 to 16)
        # to one of 192 categories for a chord shift
        chord_shift = 12*(key_shift - 1) + note_shift
        chord_changes[chord_shift] += 1
    json_object_new['chord_changes'] = [c / json_object['duration'] for c in chord_changes]
    print 'calculated chord changes at {0} seconds'.format(time.time() - time_start)

    for i in range(0, int(math.floor(len(segments_timbre_old) / smoothing_factor))):
        segments = segments_timbre_old[(smoothing_factor*i):(smoothing_factor*i + smoothing_factor)]
        # calculate mean value of each timbre dimension over a block of 5 time segments
        segments_mean = map(mean, zip(*segments))
        segments_timbre_old_smoothed.append(segments_mean)
    print 'found most likely timbre categories at {0} seconds'.format(time.time() - time_start)
    timbre_cats = [msd_utils.find_most_likely_timbre_category(seg) for seg in segments_timbre_old_smoothed]
    timbre_cat_counts = [timbre_cats.count(i) for i in xrange(0, 46)]  # one bin per timbre cluster
    json_object_new['timbre_cat_counts'] = [t / json_object['duration'] for t in timbre_cat_counts]
    all_song_data.append(json_object_new)
    count += 1

print 'preprocessing finished, writing results to file at time {0}'.format(time.time() - time_start)
with open(output_file, 'w') as f:
    f.write(str(all_song_data))

print 'file merging complete at time {0}'.format(time.time() - time_start)

A.3 Code to Compute Timbre Categories

from __future__ import division
import os
import sys
import re
import time
import json
import glob
import hdf5_getters  # not on adroit
import sklearn.mixture
import msd_utils  # not on adroit
import math
import numpy as np
import collections
from string import ascii_uppercase
import ast
import matplotlib.pyplot as plt
import operator
from collections import defaultdict
import random

timbre_all = []
N = 20  # number of samples to get from each year
year_counts = dict({1956: 2, 1965: 4, 1968: 3, 1969: 5, 1970: 23, 1971: 25,
    1972: 26, 1973: 37, 1974: 35, 1975: 29, 1976: 28, 1977: 64, 1978: 77,
    1979: 111, 1980: 131, 1981: 171, 1982: 199, 1983: 272, 1984: 190,
    1985: 189, 1986: 200, 1987: 224, 1988: 205, 1989: 272, 1990: 358,
    1991: 348, 1992: 538, 1993: 610, 1994: 658, 1995: 764, 1996: 809,
    1997: 930, 1998: 872, 1999: 983, 2000: 1031, 2001: 1230, 2002: 1323,
    2003: 1563, 2004: 1508, 2005: 1995, 2006: 1892, 2007: 2175, 2008: 1950,
    2009: 1782, 2010: 742})

time_start = time.time()
year_count = defaultdict(int)
orig_dir = '/scratch/network/mssilver/mssilver/'
orig_dir = ''  # override for local runs
# regex reconstructed from context: each song is stored as a dict literal
# beginning {'title'
json_pattern = re.compile(r"\{'title'.*?\}", re.DOTALL)
N = 20  # number of songs to sample from each year
k = 20  # number of frames to select from each song
for l1 in ascii_uppercase:
    for l2 in ascii_uppercase:
        edm_textfile = orig_dir + 'msd_data/raw_' + l1 + l2 + '.txt'
        json_contents = open(edm_textfile, 'r').read()
        for json_object_str in re.findall(json_pattern, json_contents):
            json_object = ast.literal_eval(json_object_str)
            year = int(json_object['year'])
            prob = 1.0 if 1.0*N/year_counts[year] > 1.0 else 1.0*N/year_counts[year]
            if random.random() < prob:
                print 'getting timbre frames for song in directory {0}, {1} seconds after start of program'.format(edm_textfile, time.time() - time_start)
                duration = float(json_object['duration'])
                timbre = [[t/duration for t in l] for l in json_object['timbre']]
                try:
                    indices = random.sample(xrange(0, len(timbre)), k)
                except:
                    # fewer than k frames in the song: keep them all
                    indices = xrange(0, len(timbre))
                timbre_frames = [timbre[i] for i in indices]
                appended_timbre = [timbre_all.append(l) for l in timbre_frames]
        print 'finished file {0}, {1} seconds after start of program'.format(edm_textfile, time.time() - time_start)

with open('timbre_frames_all.txt', 'w') as f:
    f.write(str(timbre_all))

A.4 Helper Methods for Calculations

import os
import re
import json
import glob
import hdf5_getters
import time
import numpy as np

''' some static data used in conjunction with the helper methods '''

# each 12-element vector corresponds to the 12 pitches, starting with C
# natural and going up to B natural
CHORD_TEMPLATE_MAJOR = [[1,0,0,0,1,0,0,1,0,0,0,0],[0,1,0,0,0,1,0,0,1,0,0,0],
                        [0,0,1,0,0,0,1,0,0,1,0,0],[0,0,0,1,0,0,0,1,0,0,1,0],
                        [0,0,0,0,1,0,0,0,1,0,0,1],[1,0,0,0,0,1,0,0,0,1,0,0],
                        [0,1,0,0,0,0,1,0,0,0,1,0],[0,0,1,0,0,0,0,1,0,0,0,1],
                        [1,0,0,1,0,0,0,0,1,0,0,0],[0,1,0,0,1,0,0,0,0,1,0,0],
                        [0,0,1,0,0,1,0,0,0,0,1,0],[0,0,0,1,0,0,1,0,0,0,0,1]]
CHORD_TEMPLATE_MINOR = [[1,0,0,1,0,0,0,1,0,0,0,0],[0,1,0,0,1,0,0,0,1,0,0,0],
                        [0,0,1,0,0,1,0,0,0,1,0,0],[0,0,0,1,0,0,1,0,0,0,1,0],
                        [0,0,0,0,1,0,0,1,0,0,0,1],[1,0,0,0,0,1,0,0,1,0,0,0],
                        [0,1,0,0,0,0,1,0,0,1,0,0],[0,0,1,0,0,0,0,1,0,0,1,0],
                        [0,0,0,1,0,0,0,0,1,0,0,1],[1,0,0,0,1,0,0,0,0,1,0,0],
                        [0,1,0,0,0,1,0,0,0,0,1,0],[0,0,1,0,0,0,1,0,0,0,0,1]]
CHORD_TEMPLATE_DOM7 = [[1,0,0,0,1,0,0,1,0,0,1,0],[0,1,0,0,0,1,0,0,1,0,0,1],
                       [1,0,1,0,0,0,1,0,0,1,0,0],[0,1,0,1,0,0,0,1,0,0,1,0],
                       [0,0,1,0,1,0,0,0,1,0,0,1],[1,0,0,1,0,1,0,0,0,1,0,0],
                       [0,1,0,0,1,0,1,0,0,0,1,0],[0,0,1,0,0,1,0,1,0,0,0,1],
                       [1,0,0,1,0,0,1,0,1,0,0,0],[0,1,0,0,1,0,0,1,0,1,0,0],
                       [0,0,1,0,0,1,0,0,1,0,1,0],[0,0,0,1,0,0,1,0,0,1,0,1]]
CHORD_TEMPLATE_MIN7 = [[1,0,0,1,0,0,0,1,0,0,1,0],[0,1,0,0,1,0,0,0,1,0,0,1],
                       [1,0,1,0,0,1,0,0,0,1,0,0],[0,1,0,1,0,0,1,0,0,0,1,0],
                       [0,0,1,0,1,0,0,1,0,0,0,1],[1,0,0,1,0,1,0,0,1,0,0,0],
                       [0,1,0,0,1,0,1,0,0,1,0,0],[0,0,1,0,0,1,0,1,0,0,1,0],
                       [0,0,0,1,0,0,1,0,1,0,0,1],[1,0,0,0,1,0,0,1,0,1,0,0],
                       [0,1,0,0,0,1,0,0,1,0,1,0],[0,0,1,0,0,0,1,0,0,1,0,1]]

CHORD_TEMPLATE_MAJOR_means = [np.mean(chord) for chord in CHORD_TEMPLATE_MAJOR]
CHORD_TEMPLATE_MINOR_means = [np.mean(chord) for chord in CHORD_TEMPLATE_MINOR]
CHORD_TEMPLATE_DOM7_means = [np.mean(chord) for chord in CHORD_TEMPLATE_DOM7]
CHORD_TEMPLATE_MIN7_means = [np.mean(chord) for chord in CHORD_TEMPLATE_MIN7]

CHORD_TEMPLATE_MAJOR_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_MAJOR]
CHORD_TEMPLATE_MINOR_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_MINOR]
CHORD_TEMPLATE_DOM7_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_DOM7]
CHORD_TEMPLATE_MIN7_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_MIN7]

# mean timbre vectors of the 46 GMM clusters (one cluster per row)
TIMBRE_CLUSTERS = [
[1.38679881e-01, 3.95702571e-02, 2.65410235e-02, 7.38301998e-03, -1.75014636e-02, -5.51147732e-02, 8.71851698e-03, -1.17595855e-02, 1.07227900e-02, 8.75951680e-03, 5.40391877e-03, 6.17638908e-03],
[3.14344510e+00, 1.17405599e-01, 4.08053561e+00, -1.77934450e+00, 2.93367968e+00, -1.35597928e+00, -1.55129489e+00, 7.75743158e-01, 6.42796685e-01, 1.40794256e-01, 3.37716831e-01, -3.27103815e-01],
[3.56548165e-01, 2.73288705e+00, 1.94355982e+00, 1.06892477e+00, 9.89739475e-01, -8.97330631e-02, 8.73234495e-01, -2.00747009e-03, 3.44488367e-01, 9.93117800e-02, -2.43471766e-01, -1.90521726e-01],
[4.22442037e-01, 4.14115783e-01, 1.43926557e-01, -1.16143322e-01, -5.95186216e-02, -2.36927188e-01, -6.83151409e-02, 9.86816882e-02, 2.43219098e-02, 6.93558977e-02, 6.80121418e-03, 3.97485360e-02],
[1.94727799e-01, -1.39027782e+00, -2.39875671e-01, -2.84583677e-01, 1.92334219e-01, -2.83421048e-01, 2.15787541e-01, 1.14840341e-01, -2.15631833e-01, -4.09496877e-02, -6.90838017e-03, -7.24394810e-03],
[1.96565167e-01, 4.98702717e-02, -3.43697282e-01, 2.54170701e-01, 1.12441266e-02, 1.54740401e-01, -4.70447408e-02, 8.10868802e-02, 3.03736697e-03, 1.43974944e-03, -2.75044913e-02, 1.48634678e-02],
[2.21364497e-01, -2.96205105e-01, 1.57754028e-01, -5.57641279e-02, -9.25625566e-02, -6.15316168e-02, -1.38139882e-01, -5.54936599e-02, 1.66886836e-01, 6.46238260e-02, 1.24093863e-02, -2.09274345e-02],
[2.12823455e-01, -9.32652720e-02, -4.39611467e-01, -2.02814479e-01, 4.98638770e-02, -1.26572488e-01, -1.11181799e-01, 3.25075635e-02, 2.01416694e-02, -5.69216463e-02, 2.61922912e-02, 8.30817468e-02],
[1.62304042e-01, -7.34813956e-03, -2.02552550e-01, 1.80106705e-01, -5.72110826e-02, -9.17148244e-02, -6.20429191e-03, -6.08892354e-02, 1.02883628e-02, 3.84878478e-02, -8.72920419e-03, 2.37291230e-02],
[1.69023095e-01, 6.81311168e-02, -3.71039856e-02, -2.13139780e-02, -4.18752028e-03, 1.36407740e-01, 2.58515825e-02, -4.10328777e-04, 2.93149920e-02, -1.97874734e-02, 2.01177066e-02, 4.29260690e-03],
[4.16829358e-01, -1.28384095e+00, 8.86081556e-01, 9.13717416e-02, -3.19420208e-01, -1.82003637e-01, -3.19865507e-02, -1.71517045e-02, 3.47472066e-02, -3.53047665e-02, 5.58354602e-02, -5.06222122e-02],
[3.83948137e-01, 1.06020034e-01, 4.01191058e-01, 1.49470482e-01, -9.58422411e-02, -4.94473336e-02, 2.27589858e-02, -5.67352733e-02, 3.84666644e-02, -2.15828055e-02, -1.67817151e-02, 1.15426241e-01],
[9.07946444e-01, 3.26120397e+00, 2.98472002e+00, -1.42615404e-01, 1.29886103e+00, -4.53380431e-01, 1.54008478e-01, -3.55297093e-02, -2.95809181e-01, 1.57037690e-01, -7.29692046e-02, 1.15180285e-01],
[1.60870896e+00, -2.32038235e+00, -7.96211044e-01, 1.55058968e+00, -2.19377663e+00, 5.01030526e-01, -1.71767279e+00, -1.36642470e+00, -2.42837527e-01, -4.14275615e-01, -7.33148530e-01, -4.56676578e-01],
[6.42870687e-01, 1.34486839e+00, 2.16026845e-01, -2.13180345e-01, 3.10866747e-01, -3.97754955e-01, -3.54439151e-01, -5.95938041e-04, 4.95054274e-03, 4.67013422e-02, -1.80823854e-02, 1.25808320e-01],
[1.16780496e+00, 2.28141229e+00, -3.29418720e+00, -1.54239912e+00, 2.12372153e-01, 2.51116768e+00, 1.84273560e+00, -4.06183916e-01, 1.19175125e+00, -9.24407446e-01, 6.85444429e-01, -6.38729005e-01],
[2.39097414e-01, -1.13382447e-02, 3.06327342e-01, 4.68182987e-03, -1.03107607e-01, -3.17661969e-02, 3.46533705e-02, 1.46440386e-02, 6.88291154e-02, 1.72580481e-02, -6.23970238e-03, -6.52822380e-03],
[1.74850329e-01, -1.86077411e-01, 2.69285838e-01, 5.22452803e-02, -3.71708289e-02, -6.42874319e-02, -5.01920042e-03, -1.14565540e-02, -2.61300268e-03, -6.94872458e-03, 1.20157063e-02, 2.01341977e-02],
[1.93220674e-01, 1.62738332e-01, 1.72794061e-02, 7.89933755e-02, 1.58494767e-01, 9.04541006e-04, -3.33177052e-02, -1.42411500e-01, -1.90471155e-02, -2.41622739e-02, -2.57382438e-02, 2.84895062e-02],
[3.31179197e+00, -1.56765268e-01, 4.42446188e+00, 2.05496297e+00, 5.07031622e+00, -3.52663849e-02, -5.68337901e+00, -1.17825301e+00, 5.41756637e-01, -3.15541339e-02, -1.58404846e+00, 7.37887234e-01],
[2.36033237e-01, -5.01380019e-01, -7.01568834e-02, -2.14474169e-01, 5.58739133e-01, -3.45340886e-01, 2.36469930e-01, -2.51770230e-02, -4.41670143e-01, -1.73364633e-01, 9.92353986e-03, 1.01775476e-01],
[3.13672832e+00, 1.55128891e+00, 4.60139512e+00, 9.82477544e-01, -3.87108002e-01, -1.34239667e+00, -3.00065797e+00, -4.41556909e-01, -7.77546208e-01, -6.59017029e-01, -1.42596356e-01, -9.78935498e-01],
[8.50714148e-01, 2.28658856e-01, -3.65260753e+00, 2.70626948e+00, -1.90441544e-01, 5.66625676e+00, 1.77531510e+00, 2.39978921e+00, 1.10965660e+00, 1.58484130e+00, -1.51579214e-02, 8.64324026e-01],
[1.14302559e+00, 1.18602811e+00, -3.88130412e+00, 8.69833825e-01, -8.23003310e-01, -4.23867795e-01, 8.56022598e-01, -1.08015106e+00, 1.74840192e-01, -1.35493558e-02, -1.17012561e+00, 1.68572940e-01],
[3.54117814e+00, 6.12714769e-01, 7.67585243e+00, 2.50391333e+00, 1.81374399e+00, -1.46363231e+00, -1.74027236e+00, -5.72924078e-01, -1.20787368e+00, -4.13954661e-01, -4.62561948e-01, 6.78297871e-01],
[8.31843044e-01, 4.41635485e-01, 7.00724425e-02, -4.72159900e-02, 3.08326493e-01, -4.47009822e-01, 3.27806057e-01, 6.52370380e-01, 3.28490360e-01, 1.28628172e-01, -7.78065861e-02, 6.91343399e-02],
[4.90082031e-01, -9.53180204e-01, 1.76970476e-01, 1.57256960e-01, -5.26196238e-02, -3.19264458e-01, 3.91808304e-01, 2.19368239e-01, -2.06483291e-01, -6.25044005e-02, -1.05547224e-01, 3.18934196e-01],
[1.49899454e+00, -4.30708817e-01, 2.43770498e+00, 7.03149621e-01, -2.28827845e+00, 2.70195855e+00, -4.71484280e+00, -1.18700075e+00, -1.77431396e+00, -2.23190236e+00, 8.20855264e-01, -2.35859902e-01],
[1.20322544e-01, -3.66300816e-01, -1.25699953e-01, -1.21914056e-01, 6.93277338e-02, -1.31034684e-01, -1.54955924e-03, 2.48094288e-02, -3.09576314e-02, -1.66369415e-03, 1.48904987e-04, -1.42151992e-02],
[6.52394765e-01, -6.81024464e-01, 6.36868117e-01, 3.04950208e-01, 2.62178992e-01, -3.20457080e-01, -1.98576098e-01, -3.02173163e-01, 2.04399765e-01, 4.44513847e-02, -9.50111498e-03, -1.14198739e-02],
[2.06762180e-01, -2.08101829e-01, 2.61977630e-01, -1.71672300e-01, 5.61794250e-02, 2.13660185e-01, 3.90259585e-02, 4.78176392e-02, 1.72812607e-02, 3.44052067e-02, 6.26899067e-03, 2.48544728e-02],
[7.39717363e-01, 4.37786285e+00, 2.54995502e+00, 1.13151212e+00, -3.58509503e-01, 2.20806129e-01, -2.20500355e-01, -7.22409824e-02, -2.70534083e-01, 1.07942098e-03, 2.70174668e-01, 1.87279353e-01],
[1.25593809e+00, 6.71054880e-02, 8.70352571e-01, -4.32607959e+00, 2.30652217e+00, 5.47476105e+00, -6.11052479e-01, 1.07955720e+00, -2.16225471e+00, -7.95770149e-01, -7.31804973e-01, 9.68935954e-01],
[1.17233757e-01, -1.23897829e-01, -4.88625265e-01, 1.42036530e-01, -7.23286756e-02, -6.99808763e-02, -1.17525019e-02, 5.70221674e-03, -7.67796123e-03, 4.17505873e-02, -2.33375716e-02, 1.94121001e-02],
[1.67511025e+00, -2.75436700e+00, 1.45345593e+00, 1.32408871e+00, -1.66172505e+00, 1.00560074e+00, -8.82308160e-01, -5.95708043e-01, -7.27283590e-01, -1.03975499e+00, -1.86653334e-02, 1.39449745e+00],
[3.20587677e+00, -2.84451104e+00, 8.54849957e+00, -4.44001235e-01, 1.04202144e+00, 7.35333682e-01, -2.48763292e+00, 7.38931361e-01, -1.74185596e+00, -1.07581842e+00, 2.05759299e-01, -8.20483513e-01],
[3.31279737e+00, -5.08655734e-01, 6.61530870e+00, 1.16518280e+00, 4.74499155e+00, -2.31536191e+00, -1.34016130e+00, -7.15381712e-01, 2.78890594e+00, 2.04189275e+00, -3.80003033e-01, 1.16034914e+00],
[1.79522019e+00, -8.13534697e-02, 4.37167420e-01, 2.26517020e+00, 8.85377295e-01, 1.07481514e+00, -7.25322296e-01, -2.19309506e+00, -7.59468916e-01, -1.37191387e+00, 2.60097913e-01, 9.34596450e-01],
[3.50400906e-01, 8.17891485e-01, -8.63487084e-01, -7.31760701e-01, 9.70320805e-02, -3.60023996e-01, -2.91753495e-01, -8.03073817e-02, 6.65930095e-02, 1.60093340e-01, -1.29158086e-01, -5.18806100e-02],
[2.25922929e-01, 2.78461593e-01, 5.39661393e-02, -2.37662670e-02, -2.70343295e-02, -1.23485570e-01, 2.31027499e-03, 5.87465112e-05, 1.86127188e-02, 2.83074747e-02, -1.87198676e-04, 1.24761782e-02],
[4.53615634e-01, 3.18976020e+00, -8.35029351e-01, 7.84124578e+00, -4.43906795e-01, -1.78945492e+00, -1.14521031e+00, 1.00044304e+00, -4.04084981e-01, -4.86030348e-01, 1.05412721e-01, 5.63666445e-02],
[3.93714086e-01, -3.07226477e-01, -4.87366619e-01, -4.57481697e-01, -2.91133171e-04, -2.39881719e-01, -2.15591352e-01, -1.21332941e-01, 1.42245002e-01, 5.02984582e-02, -8.05878851e-03, 1.95534173e-01],
[1.86913010e-01, -1.61000977e-01, 5.95612425e-01, 1.87804293e-01, 2.22064227e-01, -1.09008289e-01, 7.83845058e-02, 5.15228647e-02, -8.18113578e-02, -2.37860551e-02, 3.41013800e-03, 3.64680417e-02],
[3.32919314e+00, -2.14341251e+00, 7.20913997e+00, 1.76143734e+00, 1.64091808e+00, -2.66887649e+00, -9.26748006e-01, -2.78599285e-01, -7.39434005e-01, -3.87363085e-01, 8.00557250e-01, 1.15628886e+00],
[4.76496444e-01, -1.19334793e-01, 3.09037235e-01, -3.45545294e-01, 1.30114716e-01, 5.06895559e-01, 2.12176840e-01, -4.14296750e-03, 4.52439064e-02, -1.62163990e-02, 6.93683152e-03, -5.77607592e-03],
[3.00019324e-01, 5.43432074e-02, -7.72732930e-01, 1.47263806e+00, -2.79012581e-02, -2.47864869e-01, -2.10011388e-01, 2.78202425e-01, 6.16957205e-02, -1.66924986e-01, -1.80102286e-01, -3.78872162e-03]]

TIMBRE_MEANS = [np.mean(t) for t in TIMBRE_CLUSTERS]
TIMBRE_STDEVS = [np.std(t) for t in TIMBRE_CLUSTERS]

'''helper methods to process raw msd data'''

def normalize_pitches(h5):
    key = int(hdf5_getters.get_key(h5))
    segments_pitches = hdf5_getters.get_segments_pitches(h5)
    segments_pitches_new = [transpose_by_key(pitch_seg, key) for pitch_seg in segments_pitches]
    return segments_pitches_new

def transpose_by_key(pitch_seg, key):
    pitch_seg_new = []
    for i in range(0, 12):
        idx = (i + key) % 12
        pitch_seg_new.append(pitch_seg[idx])
    return pitch_seg_new

''' given a time segment with distributions of the 12 pitches, find the most
likely chord played '''
def find_most_likely_chord(pitch_vector):
    rho_max = 0.0
    # (chord family, template index); families: 1 = major, 2 = minor,
    # 3 = dominant 7th, 4 = minor 7th
    most_likely_chord = (1, 1)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_MAJOR,
            CHORD_TEMPLATE_MAJOR_means, CHORD_TEMPLATE_MAJOR_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean)*(pitch_vector[i] - np.mean(pitch_vector))/((stdev+0.01)*(np.std(pitch_vector)+0.01))
        if (abs(rho) > abs(rho_max)):
            rho_max = rho
            most_likely_chord = (1, idx)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_MINOR,
            CHORD_TEMPLATE_MINOR_means, CHORD_TEMPLATE_MINOR_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean)*(pitch_vector[i] - np.mean(pitch_vector))/((stdev+0.01)*(np.std(pitch_vector)+0.01))
        if (abs(rho) > abs(rho_max)):
            rho_max = rho
            most_likely_chord = (2, idx)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_DOM7,
            CHORD_TEMPLATE_DOM7_means, CHORD_TEMPLATE_DOM7_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean)*(pitch_vector[i] - np.mean(pitch_vector))/((stdev+0.01)*(np.std(pitch_vector)+0.01))
        if (abs(rho) > abs(rho_max)):
            rho_max = rho
            most_likely_chord = (3, idx)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_MIN7,
            CHORD_TEMPLATE_MIN7_means, CHORD_TEMPLATE_MIN7_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean)*(pitch_vector[i] - np.mean(pitch_vector))/((stdev+0.01)*(np.std(pitch_vector)+0.01))
        if (abs(rho) > abs(rho_max)):
            rho_max = rho
            most_likely_chord = (4, idx)
    return most_likely_chord

def find_most_likely_timbre_category(timbre_vector):
    most_likely_timbre_cat = 0
    rho_max = 0.0
    for idx, (seg, mean, stdev) in enumerate(zip(TIMBRE_CLUSTERS, TIMBRE_MEANS,
            TIMBRE_STDEVS)):
        rho = 0.0
        for i in range(0, 12):
            rho += (seg[i] - mean)*(timbre_vector[i] - np.mean(seg))/((stdev+0.01)*(np.std(timbre_vector)+0.01))
        if (abs(rho) > abs(rho_max)):
            rho_max = rho
            most_likely_timbre_cat = idx
    return most_likely_timbre_cat


Bibliography

[1] Marck Bailey. Mark J. Butler publishes scholarly work on dance music. http://www.music.northwestern.edu/about/news/2012/mark-j-butler-publishes-scholarly-work-on-dance-music.html, March 2012.

[2] Kenneth Taylor. Ishkur's guide to EDM. http://techno.org/electronic-music-guide/.

[3] Deal further strengthens Spotify's music discovery expertise. http://the.echonest.com/pressreleases/spotify-acquires-echo-nest/, March 2014.

[4] Josh Constine. Inside the Spotify–Echo Nest skunkworks. http://techcrunch.com/2014/10/19/the-sonic-mad-scientists/, October 2014.

[5] The future of music genres is here. http://blog.echonest.com/post/73516217273/the-future-of-music-genres-is-here, January 2014.

[6] About the Music Genome Project. http://www.pandora.com/about/mgp.

[7] Joan Serrà, Álvaro Corral, Marián Boguñá, Martín Haro, and Josep Ll. Arcos. Measuring the evolution of contemporary western popular music. Sci. Rep., 2, July 2012.

[8] Matthias Mauch, Robert M. MacCallum, Mark Levy, and Armand M. Leroi. The evolution of popular music: USA 1960–2010. Royal Society Open Science, 2(5), 2015.

[9] Thierry Bertin-Mahieux, Daniel P.W. Ellis, Brian Whitman, and Paul Lamere. The Million Song Dataset. In Proceedings of the 12th International Conference on Music Information Retrieval (ISMIR 2011), 2011.

[10] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay. Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12:2825–2830, 2011.

[11] Lars Buitinck, Gilles Louppe, Mathieu Blondel, Fabian Pedregosa, Andreas Mueller, Olivier Grisel, Vlad Niculae, Peter Prettenhofer, Alexandre Gramfort, Jaques Grobler, Robert Layton, Jake VanderPlas, Arnaud Joly, Brian Holt, and Gaël Varoquaux. API design for machine learning software: experiences from the scikit-learn project. In ECML PKDD Workshop: Languages for Data Mining and Machine Learning, pages 108–122, 2013.

[12] Edwin Chen. Infinite mixture models with nonparametric Bayes and the Dirichlet process. http://blog.echen.me/2012/03/20/infinite-mixture-models-with-nonparametric-bayes-and-the-dirichlet-process/, March 2012.

[13] Graham Massey. Roland TR-808: The drum machine that changed music. http://www.bbc.com/news/entertainment-arts-26682781, March 2014.

[14] Jean-Michel Jarre and Agencja Artystyczna MTJ. Jean Michel Jarre. Disques Dreyfus, 1999.

[15] Francois Pachet, Jean-Julien Aucouturier, and Mark Sandler. The way it sounds: Timbre models for analysis and retrieval of music signals. IEEE Transactions on Multimedia, 7(6):1028–1035, December 2005.


as disco or pop, I erred on the side of caution and did not include them in the list of electronic genres. In these cases, false positives, such as primarily rock songs that happen to have the disco label attached to the artist, could inadvertently be included in the dataset. The final list of genres is as follows:

target_genres = ['house','techno','drum and bass','drum n bass',
    "drum'n'bass",'drumnbass',"drum 'n' bass",'jungle','breakbeat',
    'trance','dubstep','trap','downtempo','industrial','synthpop',
    'idm','idm - intelligent dance music','8-bit','ambient',
    'dance and electronica','electronic']
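Selection then reduces to a membership test against each artist's tag string; a minimal sketch (the full version, which reads the tags with hdf5_getters, appears in Appendix A.1):

def is_electronic(artist_mbtags, target_genres):
    # True if any target genre appears among the artist's MusicBrainz tags
    return any(genre in str(artist_mbtags) for genre in target_genres)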

2.3.2 Pitch Preprocessing

A study conducted by music researcher Matthias Mauch [8] analyzes pitch in a musically informed manner. The study first takes the raw sound data and converts it into a distribution over each pitch, where 0 is no detection of the pitch and 1 the strongest amount. Then it computes the most likely chord by comparing the 4 most common types of chords in popular music (major, minor, dominant 7, and minor 7) to the observed chord. The most common chords are represented as "template chords" and contain 0's and 1's, where the 1's represent the notes played in the chord. For example, using the note C as the first index, the C major chord is represented as

CT_CM = (1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0)

For a given chroma frame c observed in the song, the Spearman's rho coefficient is computed over every template chord:

ρ_{CT,c} = Σ_{i=1}^{12} (CT_i − mean(CT)) (c_i − mean(c)) / (σ_CT σ_c)

where mean(CT) is the mean of the values in the template chord, σ_CT is the standard deviation of the values in the chord, and the operations on c are analogous. Note that the summation is over each individual pitch in the 12 pitch classes. The chord template with the highest value of ρ is selected as the chord for the time frame.
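As a minimal illustration of this scoring (the chroma values below are made up; the full matcher over all four template families is in Appendix A.4, which also adds small constants to the denominators to avoid division by zero):

import numpy as np

CT = np.array([1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0], dtype=float)  # C major template
c = np.array([0.9, 0.1, 0.2, 0.1, 0.7, 0.2, 0.1, 0.8, 0.2, 0.1, 0.1, 0.2])  # example chroma frame

# correlation-style score between the template and the observed frame
rho = np.sum((CT - CT.mean()) * (c - c.mean())) / ((CT.std() + 0.01) * (c.std() + 0.01))
# the template with the largest |rho| across all 48 templates is chosen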

After this is performed for each time frame, the values are smoothed and then the change between adjacent chords is observed. The reasoning behind this step is that by measuring the relative distance between chords, rather than the chords themselves, all songs can be compared in the same manner even though they may have different key signatures. Finally, the study takes the types of chord changes and classifies them under 8 possible categories called "H-topics". These topics are more abstracted versions of the chord changes that make more sense to a human, such as "changes involving dominant 7th chords".

In my preliminary implementation of this method on an electronic dance music corpus, I made a few modifications to Mauch's study. First, I smoothed out time frames before computing the most probable chords, rather than smoothing the most probable chords. I did this to save time and to reduce volatility in the chord measurements. Using Rick Astley's "Never Gonna Give You Up" as a reference, which contains 935 time frames and lasts 212 seconds, 5 time frames is slightly under 1 second and, for preliminary testing, appeared to be a good interval for each time block. Second, as mentioned in the literature section, I did not abstract the chord changes into H-topics. This decision also stemmed from time constraints, since deriving semantic chord meaning from EDM songs would require careful research into the types of harmonies and sounds common in that genre of music. Below I include a high-level visualization of the pitch metadata found in a sample song, "Firestarter" by The Prodigy, and how I converted the metadata into a chord change vector that I could then feed into the Dirichlet Process algorithm.

[Figure: the pitch-to-chord-change pipeline, illustrated on "Firestarter" by The Prodigy. The steps shown are: (1) start with the raw pitch data, an N x 12 matrix, where N is the number of time frames in the song and 12 the number of pitch classes (the figure shows the first 5 time frames); (2) average the distribution of pitches over every block of 5 time frames; (3) calculate the most likely chord for each block using Spearman's rho (here F major, template (0,1,0,0,0,0,1,0,0,0,1,0)); (4) for two adjacent chords, calculate the change between them (here F major to G major, a major-to-major change of step size 2, chord shift code 6, so chord_changes[6] += 1) and increment its count in a table of chord change frequencies (192 possible chord changes). The final result is a 192-element vector where chord_changes[i] is the number of times the chord change with code i occurred in the song.]
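The shift encoding is compact enough to restate as code. The following minimal sketch mirrors the logic in Appendix A.2, with each chord represented as a (type, root) pair, type running from 1 (major) to 4 (minor 7th) and root from 0 to 11; note that the exact category a given change maps to depends on this indexing:

def chord_shift_code(c1, c2):
    # root movement in semitones, wrapped around the octave (0-11)
    note_shift = (c2[1] - c1[1]) % 12
    # which of the 4 x 4 = 16 chord-type transitions occurred (1-16)
    key_shift = 4 * (c1[0] - 1) + c2[0]
    # combined code in 0-191
    return 12 * (key_shift - 1) + note_shift

chord_changes = [0] * 192
chord_changes[chord_shift_code((1, 5), (1, 7))] += 1  # tally one major-to-major change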

A final step I took to normalize the chord change data was to divide the counts by the length of the song, so that each song's number of chord changes was measured per second.
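In code this is a one-line normalization (duration in seconds, as stored in the MSD metadata):

# per-second rates make songs of different lengths comparable
chord_changes_per_second = [c / duration for c in chord_changes]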

2.3.3 Timbre Preprocessing

For timbre, I also used Mauch's model to find a meaningful way to compare timbre uniformly across all songs [8]. After collecting all song metadata, I took a random sample of 20 songs from each year starting at 1970. The reason I forced the sampling to 20 randomly sampled songs from each year, and did not take a random sample of songs from all years at once, was to prevent bias towards any type of sounds. As seen in Figure 2.2, there are significantly more songs from 2000-2011 than before 2000. The mean year is x̄ = 2001.052, the median year is 2003, and the standard deviation of the years is σ = 7.060. A "random sample" over all songs would almost certainly include a disproportionate amount of more recent songs. In order to not miss out on sounds that may be more prevalent in older songs, I required a set number of songs from each year. Next, from each randomly selected song, I selected 20 random timbre frames in order to prevent any biases in data collection within each song. In total, there were 42 · 20 · 20 = 16,800 timbre frames collected. Next, I clustered the timbre frames using a Gaussian Mixture Model (GMM), varying the number of clusters from 10 to 100 and selecting the number of clusters with the lowest Bayes Information Criterion (BIC), a statistical measure commonly used to select the best fitting model. The BIC was minimized at 46 timbre clusters. I then re-ran the GMM with 46 clusters and saved the mean values of each of the 12 timbre segments for each cluster formed.
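A minimal sketch of this model-selection loop, written against the current scikit-learn API (GaussianMixture); here frames stands for the (16800, 12) array of sampled, duration-normalized timbre frames assembled by the code in Appendix A.3:

import numpy as np
from sklearn.mixture import GaussianMixture

best_bic, best_gmm = np.inf, None
for n in range(10, 101):
    gmm = GaussianMixture(n_components=n).fit(frames)
    bic = gmm.bic(frames)          # lower BIC indicates a better-fitting model
    if bic < best_bic:
        best_bic, best_gmm = bic, gmm

timbre_cluster_means = best_gmm.means_  # one 12-element mean vector per cluster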

In the same way that every song had the same 192 chord changes whose frequencies could be compared between songs, each song now had the same 46 timbre clusters but different frequencies in each song. When reading in the metadata from each song, I calculated the most likely timbre cluster each timbre frame belonged to and kept a frequency count of all of the possible timbre clusters observed in the song. Finally, as with the pitch data, I divided all observed counts by the duration of the song in order to normalize each song's timbre counts.

Figure 2.2: Number of Electronic Music Songs in Million Song Dataset from Each Year

Chapter 3

Results

3.1 Methodology

After the pitch and timbre data was processed, I ran the Dirichlet Process on the data. For each song, I concatenated the 192-element chord change frequency list and the 46-element timbre category frequency list, giving each song a total of 238 features. However, there is a problem with this setup: the pitch data will inherently dominate the clustering process, since it contains almost 3 times as many features as timbre. While there is no built-in function in scikit-learn's DPGMM process to give different weights to each feature, I considered another possibility to remedy this discrepancy: duplicating the timbre vector a certain number of times and concatenating that to the feature set of each song. While this strategy runs the risk of corrupting the feature set and turning it into something that does not accurately represent each song, it is important to keep in mind that even without duplicating the timbre vector, the feature set consists of two separate feature sets concatenated to each other. Therefore, timbre duplication appears to be a reasonable strategy to weigh pitch and timbre more evenly.
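A minimal sketch of this feature assembly; chord_changes and timbre_counts are the per-second frequency lists from Sections 2.3.2 and 2.3.3, while the duplication count of 3 and the songs iterable are illustrative assumptions (3 is chosen so the repeated 46-element timbre block roughly balances the 192 pitch features):

import numpy as np

def build_feature_vector(chord_changes, timbre_counts, timbre_copies=3):
    # concatenate pitch features with several copies of the timbre features
    # so that timbre carries comparable weight in the clustering
    return np.concatenate([chord_changes] + [timbre_counts] * timbre_copies)

X = np.array([build_feature_vector(c, t) for c, t in songs])  # one row per song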


After this modification, I tweaked a few more parameters before obtaining my final results. Dividing the pitch and timbre frequencies by the duration of the song normalized every song to frequency per second, but it also had the undesired effect of making the data too small. Timbre and pitch frequencies per second were almost always less than 1.0 and many times hovered as low as 0.002 for nonzero values. Because all of the values were very close to each other, using common values of α in the range of 0.1 to 1000–2000 was insufficient to push the songs into different clusters. As a result, every song fell into the same cluster. Increasing the value of α by several orders of magnitude, to well over 10 million, fixed the problem, but this solution presented two problems. First, tuning α to experiment with different ways to cluster the music would be problematic, since I would have to work with an enormous range of possible values for α. Second, pushing α to such high values is not appropriate for the Dirichlet Process. Extremely high values of α indicate a Dirichlet Process that will try to disperse the data into different clusters, but a value of α that high is in principle always assigning each new song to a new cluster. On the other hand, varying α between 0.1 and 1000, for example, presents a much wider range of flexibility when assigning clusters. While this may be possible by varying the values of α an extreme amount with the data as it currently is, we would be using the Dirichlet Process in a way it should mathematically not be used. Therefore, multiplying all of the data by a constant value, so that we can work in the appropriate range of α, is the ideal approach. After some experimentation, I found that k = 10 was an appropriate scaling factor.

After initial runs of the Dirichlet Process, I found that there was a slight issue with some of the earlier songs. Since I had only artist genre tags, not specific song tags, I chose songs based on whether any of the tags associated with the artist fell under any electronic music genre, including the generic term 'electronic'. There were some bands, mostly older ones from the 1960s and 1970s like Electric Light Orchestra, which had some electronic music but mostly featured rock, funk, disco, or another genre. Given that these artists featured mostly non-electronic songs, I decided to exclude them from my study and generated a blacklist of these music artists. While it was infeasible to look through every single song and determine whether it was electronic or not, I was able to look over the earliest songs in each cluster. These songs were the most important to verify as electronic, because early non-electronic songs could end up forming new clusters and inadvertently create clusters with non-electronic sounds that I was not looking for.

The goal of this thesis is to identify different groups in which EM songs are clustered and to identify the most unique artists and genres. While the second task is very simple, because it requires looking at the earliest songs in each cluster, the first is difficult to gauge the effectiveness of. While I can look at the average chord change and timbre category frequencies in each category, as well as other metadata, putting more semantic interpretations on what the music actually sounds like and determining whether the music is clustered properly is a very subjective process. For this reason, I ran the Dirichlet Process on the feature set with values of α = (0.05, 0.1, 0.2) and compared the clustering in each case, examining similarities and differences in the clusters formed in each scenario in the Discussion section. For each value of α, I set the upper limit of components, or clusters, allowed to 50. The values of α I used resulted in 9, 14, and 19 clusters formed.
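A minimal sketch of this clustering step: the thesis-era scikit-learn exposed the Dirichlet Process mixture as sklearn.mixture.DPGMM with a concentration parameter alpha; in current releases the equivalent is BayesianGaussianMixture with a Dirichlet-process prior, shown here, with X standing for the scaled song-by-feature matrix from the previous sketch:

from sklearn.mixture import BayesianGaussianMixture

for alpha in (0.05, 0.1, 0.2):
    dpgmm = BayesianGaussianMixture(
        n_components=50,  # upper bound on the number of clusters
        weight_concentration_prior_type='dirichlet_process',
        weight_concentration_prior=alpha,  # the concentration parameter being varied
        max_iter=500,
    )
    labels = dpgmm.fit_predict(X)  # cluster index assigned to each song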

3.2 Findings

3.2.1 α = 0.05

When I set α to 0.05, the Dirichlet Process split the songs into 9 clusters. Below are the distributions of years of the songs in each cluster (note that the Dirichlet Process does not number the clusters exactly sequentially, so cluster numbers 5, 7, and 10 are skipped).

Figure 3.1: Song year distributions for α = 0.05

For each value of α, I also calculated the average frequency of each chord change category and timbre category for each cluster and plotted the results. The green lines correspond to timbre and the blue lines to pitch.

Figure 3.2: Timbre and pitch distributions for α = 0.05

A table of each cluster formed, the number of songs in that cluster, and descriptions of the pitch, timbre, and rhythmic qualities characteristic of songs in that cluster is shown below.

Cluster   Song Count   Characteristic Sounds
0         6481         Minimalist, industrial, space sounds, dissonant chords
1         5482         Soft, New Age, ethereal
2         2405         Defined sounds; electronic and non-electronic instruments played in standard rock rhythms
3         360          Very dense and complex synths, slightly darker tone
4         4550         Heavily distorted rock and synthesizer
6         2854         Faster paced, 80s synth rock, acid house
8         798          Aggressive beats, dense house music
9         1464         Ambient house, trancelike, strong beats, mysterious tone
11        1597         Melancholy tones; new wave rock in the 80s, then, starting in the 90s, downtempo, trip-hop, nu-metal

Table 3.1: Song cluster descriptions for α = 0.05

3.2.2 α = 0.1

A total of 14 clusters were formed. (16 were formed, but 2 clusters contained only one song each; I listened to both of these songs, and they did not sound unique, so I discarded them from the clusters.) Again, the song distributions, timbre and pitch distributions, and cluster descriptions are shown below.

Figure 3.3: Song year distributions for α = 0.1

Figure 3.4: Timbre and pitch distributions for α = 0.1

Cluster   Song Count   Characteristic Sounds
0         1339         Instrumental and disco with 80s synth
1         2109         Simultaneous quarter-note and sixteenth-note rhythms
2         4048         Upbeat, chill; simultaneous quarter-note and eighth-note rhythms
3         1353         Strong repetitive beats, ambient
4         2446         Strong simultaneous beat and synths; synths defined but echo
5         2672         Calm, New Age
6         542          Hi-hat cymbals, dissonant chord progressions
7         2725         Aggressive punk and alternative rock
9         1647         Latin, rhythmic emphasis on first and third beats
11        835          Standard medium-fast rock instruments/chords
16        1152         Orchestral, especially violins
18        40           "Martian alien" sounds, no vocals
20        1590         Alternating strong kick and strong high-pitched clap
28        528          Roland TR-like beats; kick and clap stand out but fuzzy

Table 3.2: Song cluster descriptions for α = 0.1

3.2.3 α = 0.2

With α set to 0.2, there were a total of 22 clusters formed. 3 of the clusters consisted of 1 song each, none of which were particularly unique-sounding, so I discarded them, for a total of 19 significant clusters. Again, the song distributions, timbre and pitch distributions, and cluster descriptions are shown below.

Figure 3.5: Song year distributions for α = 0.2

Figure 3.6: Timbre and pitch distributions for α = 0.2

Cluster   Song Count   Characteristic Sounds
0         4075         Nostalgic and sad-sounding synths and string instruments
1         2068         Intense, sad, cavernous (mix of industrial metal and ambient)
2         1546         Jazz/funk tones
3         1691         Orchestral with heavy 80s synths, atmospheric
4         343          Arpeggios
5         304          Electro, ambient
6         2405         Alien synths, eerie
7         1264         Punchy kicks and claps, 80s/90s tilt
8         1561         Medium tempo, 4/4 time signature, synths with intense guitar
9         1796         Disco rhythms and instruments
10        2158         Standard rock with few (if any) synths added on
12        791          Cavernous, minimalist, ambient (non-electronic instruments)
14        765          Downtempo, classic guitar riffs, fewer synths
16        865          Classic acid house sounds and beats
17        682          Heavy Roland TR sounds
22        14           Fast, ambient, classic orchestral
23        578          Acid house with funk tones
30        31           Very repetitive rhythms, one or two tones
34        88           Very dense sound (strong vocals and synths)

Table 3.3: Song cluster descriptions for α = 0.2

3.3 Analysis

Each of the different values of α revealed different insights about the Dirichlet Process applied to the Million Song Dataset, mainly which artists and songs were unique and how traditional groupings of EM genres could be thought of differently. Not surprisingly, the distributions of the years of songs in most of the clusters were skewed to the left, because the distribution of all of the EM songs in the Million Song Dataset is left-skewed (see Figure 2.2). However, some of the distributions vary significantly for individual clusters, and these differences provide important insights into which types of music styles were popular at certain points in time and how unique the earliest artists and songs in the clusters were. For example, for α = 0.1, cluster 28's musical style (with sounds characteristic of the Roland TR-808 and TR-909, two programmable drum machines and synthesizers that became extremely popular in 1980s and 90s dance tracks [13]) coincides with when the instruments were first manufactured in 1980. Not surprisingly, this cluster contained mostly songs from the 80s and 90s and declined slightly in the 2000s. However, there were a few songs in that cluster that came out before 1980. While these songs did not clearly use the Roland TR machines, they may have contained similar sounds that predated the machines and were truly novel.

First, looking at α = 0.05, we see that all of the clusters contain a significant number of songs, although clusters 3 and 8 are notably smaller. Cluster 3 contains a heavier left tail, indicating a larger number of songs from the 70s, 80s, and 90s. Inside the cluster, the genres of music varied significantly from a traditional music lens. That is, the cluster contained some songs with nearly all traditional rock instruments, others with purely synths, and others somewhere in between, all of which would normally be classified as different EM genres. However, under the Dirichlet Process, these songs were lumped together with the common theme of dense, melodic sounds (as opposed to minimalistic, repetitive, or dissonant sounds). The most prominent artists from the earlier songs are Ashra and John Hassell, who composed several melodic songs combining traditional instruments with synthesizers for a modern feel. The other small cluster, number 8, contains a more normal year distribution relative to the entire MSD distribution and also consists of denser beats. Another artist, Cabaret Voltaire, leads this cluster. Cluster 9 sticks out significantly because it contains virtually no songs before 1990 but increases rapidly in popularity. This cluster contains songs with hypnotically repetitive rhythm, strong and ethereal synths, and an equally strong drum-like beat. Given the emergence of trance in the 1990s, and the fact that house music in the 1980s contained more minimalistic synths than house music in the 1990s, this distribution of years makes sense. Looking at the earliest artists in this cluster, one that accurately predates the later music in the cluster is Jean-Michel Jarre. A French composer pioneering in ambient and electronic music [14], one of his songs, "Les Chants Magnétiques IV," contains very sharp and modulated synths along with a repetitive hi-hat cymbal rhythm and more drawn-out, ethereal synths. While the song sounds ambient at its normal speed, playing the song at 1.5 times its normal speed resulted in a thumping, fast-paced 16th note rhythm that, combined with the ethereal synths containing certain chord progressions, sounded very similar to trance music. In fact, I found that, stylistically, trance music was comparable to house and ambient music increased in speed. Trance music was a term not used extensively until the early 1990s, but ambient and house music were already mainstream by the 1980s, so it would make sense that trance evolved in this manner. However, this insight could serve as an argument that trance is not an innovative genre in and of itself, but is rather a clever combination of two older genres. Lastly, we look at the timbre category and chord change distributions for each cluster. In theory, these clusters should have significantly different peaks of chord changes and timbre categories, reflecting different pitch arrangements and instruments in each cluster. The type 0 chord change corresponds to major → major with no note change, type 60 to minor → minor with no note change, type 120 to dominant 7th major → dominant 7th major with no note change, and type 180 to dominant 7th minor → dominant 7th minor with no note change. It makes sense that type 0, 60, 120, and 180 chord changes are frequently observed, because it implies that chords occurring next to each other in a song remain in the same key for the majority of the song.
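A small sketch that inverts the encoding is useful for reading such peaks off the plots (assuming the (type, root) numbering used in Appendix A.2):

CHORD_TYPES = ['major', 'minor', 'dominant 7th major', 'dominant 7th minor']

def decode_chord_shift(code):
    # recover the chord-type transition and the root movement in semitones
    q, note_shift = divmod(code, 12)   # q = key_shift - 1
    return CHORD_TYPES[q // 4], CHORD_TYPES[q % 4], note_shift

decode_chord_shift(60)   # ('minor', 'minor', 0)
decode_chord_shift(180)  # ('dominant 7th minor', 'dominant 7th minor', 0)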

The timbre categories, on the other hand, are more difficult to intuitively interpret. Mauch's study addresses this issue by sampling songs and sounds that are the closest to each timbre category, playing the sounds, and attaching user-based interpretations based on several listeners [8]. While this strategy worked in Mauch's study, given the time and resources at my disposal, it was not practical in mine. I ended up comparing my subjective summaries of each cluster and comparing the charts to see whether certain peaks in the timbre categories corresponded to specific tones. Strangely, for α = 0.05, the timbre and chord change data is very similar for each cluster. This problem does not occur when α = 0.1 or 0.2, where the graphs vary significantly and correspond to some of the observed differences in the music. In summary, below are the most influential artists I found in the clusters formed and the types of music they created that were novel for their time:

• Jean-Michel Jarre: ambient and house music, complicated synthesizer arrangements

• Cabaret Voltaire: orchestral electronic music

• Paul Horn: new age

• Brian Eno: ambient music

• Manuel Göttsching (Ashra): synth-heavy ambient music

• Killing Joke: industrial metal

• John Foxx: minimalist and dark electronic music

• Fad Gadget: house and industrial music

While these conclusions were formed mainly from the data used in the MSD and the resulting clusters, I checked outside sources and biographies of these artists to see whether they were groundbreaking contributors to electronic music. Some research revealed that these artists were indeed groundbreaking for their time, so my findings are consistent with existing literature. The difference, however, between existing accounts and mine is that, from a quantitatively computed perspective, I found some new connections (like observing that one of Jarre's works, when sped up, sounded very similar to trance music).

For larger values of α, it is not only worth looking at interesting phenomena in the clusters formed for that specific value, but also comparing the clusters formed to those at other values of α. Since we are increasing the value of α, more clusters will be formed and the distinctions between each cluster will be more nuanced. With α = 0.1, the Dirichlet Process formed 16 clusters. 2 of these clusters consisted of only one song each, and upon listening, neither of these songs sounded particularly unique, so I threw those two clusters out and analyzed the remaining 14. Comparing these clusters to the ones formed with α = 0.05, I found that some of the clusters mapped over nicely while others were more difficult to interpret. For example, cluster 3_{0.1} (cluster 3 when α = 0.1) contained a similar number of songs and a similar distribution of release years to cluster 9_{0.05}. Both contain virtually no songs before the 1990s and then steadily rise in popularity through the 2000s. Both clusters also contain similar types of music: house beats and ethereal synths reminiscent of ambient or trance music. However, when I looked at the earliest artists in cluster 3_{0.1}, they were different from the earliest artists in cluster 9_{0.05}. One particular artist, Bill Nelson, stood out for having a particularly novel song, "Birds of Tin," for the year it was released (1980). This song features a sharp and twangy synth beat that, when sped up, sounded like minimalist acid house music. While the α = 0.05 run differentiated mostly on general moods and classes of instruments (like rock vs. non-electronic vs. electronic), the α = 0.1 run picked up more nuanced instrumentation and mood differences. For example, cluster 16_{0.1} contained songs that featured orchestral string instruments, especially violin. The songs themselves varied significantly according to traditional genres, from Brian Eno arrangements with classical orchestra to a remix of a song by Linkin Park, a nu-metal band, which contained violin interludes. This clustering raises an interesting point: music that sounds very different based on traditional genres could be grouped together on certain instruments or sounds. Another cluster, 28_{0.1}, features 90s sounds characteristic of the Roland TR-909 drum machine (which would explain why the cluster's songs increase drastically starting in the 1990s and steadily decline through the 2000s). Yet another cluster, 6_{0.1}, contains a particularly heavy left tail, indicating a style more popular in the 1980s, and its characteristic sound, hi-hat cymbals, is also a specialized instrument. This specialization does not match up particularly strongly with the clusters formed when α = 0.05. That is, a single cluster with α = 0.05 does not easily map to one or more clusters in the α = 0.1 run, although many of the clusters appear to share characteristics based on the qualitative descriptions in the tables. The timbre/chord change charts for each cluster appeared to at least somewhat corroborate the general characteristics I attached to each cluster. For example, the last timbre category is significantly pronounced for clusters 5 and 18, and especially so for 18. Cluster 18 was vocal-free, ethereal, space-synth sounds, so it would make sense that cluster 5, which was mainly calm New Age, also contained vocal-free, ethereal, and space-y sounds. It was also interesting to note that certain clusters, like 28, contained one timbre category that completely dominated all the others. Many songs in this cluster were marked by strong and repetitive beats reminiscent of the Roland TR synth drum machines, which matches the graph. Likewise, clusters 3, 7, 9, and 20, which appear to contain the same peak timbre category, were noted for containing strong and repetitive beats. For this run, I added the following artists and their contributions to the general list of novel artists:

• Bill Nelson: minimalist house music

• Vangelis: orchestral compositions with electronic notes

• Rick Wakeman: rock compositions with spacy-sounding synths

• Kraftwerk: synth-based pop music

Finally, we look at α = 0.2. With this parameter value, the Dirichlet Process resulted in 22 clusters. 3 of these clusters contained only one song each, and upon listening to each of these songs, I determined they were not particularly unique and discarded them, for a total of 19 remaining clusters. Unlike the previous two values of α, where the clusters were relatively easy to subjectively differentiate, this one was quite difficult. Slightly more than half of the clusters, 10 out of 19, contained under 1000 songs, and 3 contained under 100. Some of the clusters were easily mapped to clusters in the other two α values, like cluster 17_{0.2}, which contains Roland TR drum machine sounds and is comparable to cluster 28_{0.1}. However, many of the other classifications seemed more dubious. Not only did the songs within each cluster often vary significantly, but the differences between many clusters appeared nearly indistinguishable. The chord change and timbre charts also support the difficulty in distinguishing different clusters. The y-axes for all of the charts are quite small, implying that many of the timbre values averaged out because the songs in each cluster were quite different. Essentially, the observations are quite noisy and do not have features that stand out as saliently as those of cluster 28_{0.1}, for example. The only exceptions to these numbers were clusters 30 and 34, but there were so few songs in each of these clusters that they represent only a small amount of the dataset. Therefore, I concluded that the Dirichlet Process with α = 0.2 performed an insufficient job of adequately clustering the songs. Overall, the clusters formed when α = 0.1 were the most meaningful in terms of picking up nuanced moods and instruments without splitting hairs and producing clusters a minimally trained ear could not differentiate. From this analysis, the most appropriate genre classifications of the electronic music from the MSD are the clusters described in the table where α = 0.1, and the most novel artists, along with their contributions, are summarized in the findings where α = 0.05 and α = 0.1.


Chapter 4

Conclusion

In this chapter, I first address weaknesses in my experiment and strategies to address those weaknesses; I then offer potential paths for researchers to build upon my experiment and close with some final words regarding this thesis.

4.1 Design Flaws in Experiment

While I made every effort possible to ensure the integrity of this experiment, there were various factors, some beyond my control and others within my control but unrealistic to address given the time and resources I had. The largest issue was the dataset I was working with. While the MSD contained roughly 23,000 electronic music songs according to my classifications, these songs did not come close to all of the electronic music that was available. From looking through the tracks, I did see many important artists, meaning that there was some credibility to the dataset. However, there were several other artists I was surprised to see missing, and the artists included contained only a limited number of popular songs. Some traditionally defined genres, like dubstep, were missing entirely from the dataset, and the most recent songs came from the year 2010, which meant that the past 5 years of rapid expansion in EM were not accounted for. Building a sufficient corpus of EM data is very difficult, arguably more so than for other genres, because songs may be remixed by multiple artists, further blurring the line between original content and modifications. For this reason, I considered my thesis to be a proof of concept. Although the data I used may not be ideal, I was able to show that the Dirichlet Process could be used with some amount of success to cluster songs based on their metadata.

With respect to how I implemented the Dirichlet Process and constructed the features, my methodology could have been more extensive with additional time and resources. Interpreting the sounds in each song and establishing common threads is a difficult task, and unlike Pandora, which used trained music theory experts to analyze each song, I relied on my own ears and anecdotal knowledge of EM. Given the lack of formal literature quantitatively analyzing EM and the resources I had, this was my best realistic option, but it was also not ideal. The second notable weakness, which was more controllable, was determining what exactly constitutes an EM song. My criteria involved iterating through every song and selecting those whose artist contained a tag that fell inside a list of predetermined EM genres. However, this strategy is not always effective, since some artists contain only a small selection of EM songs and have produced much more music involving rock or other non-EM genres. To prevent these songs from appearing in the dataset, I would need to load another dataset, from a group called Last.fm, which contains user-generated tags at the song level. Another, more addressable, weakness in my experiment was graphically analyzing the timbre categories. While the average chord changes were easy to interpret on the graphs for each cluster and had easy semantic interpretations, the timbre categories were never formally defined. That is, while I knew the Bayes Information Criterion was lowest when there were 46 categories, I did not associate each timbre category with a sound. Mauch's study addressed this issue by randomly selecting songs with sounds that fell in each timbre category and asking users to listen to the sounds and classify what they heard. Implementing this system would be an additional way of ensuring that the clusters formed for each song were nontrivial: I could not only eyeball the measurements on each graph for timbre, like I did in this thesis, but also use them to confirm the sounds I observed for each cluster. Finally, while my feature selection contained careful preprocessing, based on other studies, that normalized measurements between all songs, there are additional ways I could have improved the feature set. For example, one study looks at more advanced ways to isolate specific timbre segments in a song, identify repeating patterns, and compare songs to each other in terms of the similarity of their timbres [15]. More advanced methods like these would allow me to more quantitatively analyze how successful the Dirichlet Process is at effectively clustering songs into distinct categories.

4.2 Future Work

Future work in this area, quantitatively analyzing EM metadata to determine what constitutes different genres and novel artists, would involve tighter definitions and procedures, evaluations of whether clustering was effective, and closer musical scrutiny. All of the weaknesses mentioned in the previous section, barring perhaps the songs available in the Million Song Dataset, can be addressed with extensions and modifications to the code base I created. Addressing the greater issue of building an effective corpus of music data for the MSD, and constantly updating it, might be accomplished by soliciting such data from an organization like Spotify, but such an endeavor is very ambitious and beyond the scope of any individual or small-group research project without extensive funding and influence. Once these problems are resolved, and the dataset, the songs accessed from the dataset, and the methods for comparing songs to each other are settled, the next steps would be to further analyze the results. How do the most unique artists for their time compare to the most popular artists? Is there considerable overlap? How long does it take for a style to grow in popularity, if it even does? And lastly, how can these findings be used to compose new genres of music and envision who and what will become popular in the future? All of these questions may require supplementary information sources, with respect to the popularity of songs and artists for example, and many of these additional pieces of information can be found on the website of the MSD.

4.3 Closing Remarks

While this thesis is an ambitious endeavor and can be improved in many respects, it does show that the methods implemented yield nontrivial results and could serve as a foundation for future quantitative analysis of electronic music. As data analytics grows even more, and groups such as Spotify amass greater amounts of information and deeper insights on that information, this relatively new field of study will hopefully grow. EM is a dynamic, energizing, and incredibly expressive type of music, and understanding it from a quantitative perspective pays respect to what has, up until now, been mostly analyzed from a curious outsider's perspective: qualitatively described, but not examined as thoroughly from a mathematical angle.


Appendix A

Code

A.1 Pulling Data from the Million Song Dataset

from __future__ import division
import os
import sys
import time
import glob
import re
import numpy as np
import hdf5_getters  # not on adroit
from collections import OrderedDict

# prevents output from showing ellipses when printed
np.set_printoptions(threshold=np.nan)

'''This code computes the frequency of chord changes in each electronic song
and runs the dirichlet process on it.'''

basedir = '/scratch/network/mssilver/mssilver/msd_data_full/data/' + str(sys.argv[1])
ext = '.h5'

target_genres = ['house', 'techno', 'drum and bass', 'drum n bass', "drum'n'bass",
                 'drumnbass', "drum 'n' bass", 'jungle', 'breakbeat', 'trance',
                 'dubstep', 'trap', 'downtempo', 'industrial', 'synthpop', 'idm',
                 'idm - intelligent dance music', '8-bit', 'ambient',
                 'dance and electronica', 'electronic']

# relevant metadata for all EM songs found in the MSD
all_song_data = {}
pitch_segs_data = []
count = 0
start_time = time.time()

for root, dirs, files in os.walk(basedir):
    files = glob.glob(os.path.join(root, '*' + ext))
    for f in files:
        h5 = hdf5_getters.open_h5_file_read(f)
        # if year unknown, throw out sample
        if hdf5_getters.get_year(h5) == 0:
            h5.close()
            continue
        if any(tag in str(hdf5_getters.get_artist_mbtags(h5)) for tag in target_genres):
            print('found electronic music song at {0} seconds'.format(time.time() - start_time))
            count += 1
            print('song count: {0}'.format(count))
            h5_subdict = dict()
            h5_subdict['title'] = hdf5_getters.get_title(h5).item()
            h5_subdict['artist_name'] = hdf5_getters.get_artist_name(h5).item()
            h5_subdict['year'] = hdf5_getters.get_year(h5).item()
            h5_subdict['duration'] = hdf5_getters.get_duration(h5).item()
            h5_subdict['timbre'] = hdf5_getters.get_segments_timbre(h5).tolist()
            h5_subdict['pitches'] = hdf5_getters.get_segments_pitches(h5).tolist()
            track_id = hdf5_getters.get_track_id(h5).item()
            all_song_data[track_id] = h5_subdict
            print('Song {0} finished processing. Total time elapsed: {1} seconds'.format(count, str(time.time() - start_time)))
        h5.close()

# OrderedDict preserves the chronological ordering (a plain dict would not)
all_song_data_sorted = OrderedDict(sorted(all_song_data.items(), key=lambda k: k[1]['year']))
sortedpitchdata = '/scratch/network/mssilver/mssilver/msd_data/raw_' + re.sub('/', '', sys.argv[1]) + '.txt'
with open(sortedpitchdata, 'w') as text_file:
    text_file.write(str(all_song_data_sorted))

A.2 Calculating Most Likely Chords and Timbre Categories

from __future__ import division
import os
import sys
import re
import time
import json
import glob
import hdf5_getters  # not on adroit
import sklearn.mixture
import msd_utils  # not on adroit
import math
import numpy as np
import collections
import ast

# prevents output from showing ellipses when printed
np.set_printoptions(threshold=np.nan)

# column-wise mean of a list of lists (used with zip(*...) below)
def mean(a):
    return sum(a) / len(a)

'''This code computes the frequency of chord changes in each electronic song
and runs the dirichlet process on it.'''

basedir = '/scratch/network/mssilver/mssilver/'
input_file = basedir + 'msd_data/raw_' + str(sys.argv[1]) + '.txt'
output_file = basedir + 'msd_data/preprocessed_' + str(sys.argv[1]) + '.txt'

json_contents = open(input_file, 'r').read()

all_song_data = []
time_start = time.time()
count = 0
# pattern reconstructed from the garbled listing: matches one song's metadata sub-dict
for json_object_str in re.finditer(r"\{'title'.*?\}", json_contents):
    json_object_str = str(json_object_str.group(0))
    json_object = ast.literal_eval(json_object_str)
    json_object_new = {}

    json_object_new['title'] = json_object['title']
    json_object_new['artist_name'] = json_object['artist_name']
    json_object_new['year'] = json_object['year']
    json_object_new['duration'] = json_object['duration']

    segments_pitches_old = json_object['pitches']
    segments_timbre_old = json_object['timbre']
    segments_pitches_old_smoothed = []
    segments_timbre_old_smoothed = []
    chord_changes = [0 for i in range(0, 192)]
    smoothing_factor = 5
    for i in range(0, int(math.floor(len(segments_pitches_old) / smoothing_factor))):
        segments = segments_pitches_old[(smoothing_factor*i):(smoothing_factor*i + smoothing_factor)]
        # calculate mean frequency of each note over a block of 5 time segments
        segments_mean = map(mean, zip(*segments))
        segments_pitches_old_smoothed.append(segments_mean)
    most_likely_chords = [msd_utils.find_most_likely_chord(seg) for seg in segments_pitches_old_smoothed]
    print('found most likely chords at {0} seconds'.format(time.time() - time_start))
    # calculate chord changes
    for i in range(0, len(most_likely_chords) - 1):
        c1 = most_likely_chords[i]
        c2 = most_likely_chords[i+1]
        if (c1[1] == c2[1]):
            note_shift = 0
        elif (c1[1] < c2[1]):
            note_shift = c2[1] - c1[1]
        else:
            note_shift = 12 - c1[1] + c2[1]
        key_shift = 4*(c1[0] - 1) + c2[0]
        # convert note_shift (0 through 11) and key_shift (1 to 16)
        # to one of 192 categories for a chord shift
        chord_shift = 12*(key_shift - 1) + note_shift
        chord_changes[chord_shift] += 1
    json_object_new['chord_changes'] = [c / json_object['duration'] for c in chord_changes]
    print('calculated chord changes at {0} seconds'.format(time.time() - time_start))

    for i in range(0, int(math.floor(len(segments_timbre_old) / smoothing_factor))):
        segments = segments_timbre_old[(smoothing_factor*i):(smoothing_factor*i + smoothing_factor)]
        # mean of each timbre dimension over a block of 5 time segments
        segments_mean = map(mean, zip(*segments))
        segments_timbre_old_smoothed.append(segments_mean)
    print('found most likely timbre categories at {0} seconds'.format(time.time() - time_start))
    timbre_cats = [msd_utils.find_most_likely_timbre_category(seg) for seg in segments_timbre_old_smoothed]
    # one count per timbre cluster (46 clusters; see Section 2.3.3)
    timbre_cat_counts = [timbre_cats.count(i) for i in xrange(0, 46)]
    json_object_new['timbre_cat_counts'] = [t / json_object['duration'] for t in timbre_cat_counts]
    all_song_data.append(json_object_new)
    count += 1

print('preprocessing finished, writing results to file at time {0}'.format(time.time() - time_start))
with open(output_file, 'w') as f:
    f.write(str(all_song_data))

print('file merging complete at time {0}'.format(time.time() - time_start))
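To make the 192-category encoding in the listing above concrete, here is a small
worked example (an illustration, not additional pipeline code). Chords are the
(family, root) tuples returned by find_most_likely_chord in Appendix A.4, so an
F major to G major change is (1, 5) to (1, 7):

# F major -> G major, with family 1 = major and roots indexed C=0 .. B=11
c1 = (1, 5)
c2 = (1, 7)

note_shift = c2[1] - c1[1]            # 2 semitones up (the c1[1] < c2[1] case)
key_shift = 4*(c1[0] - 1) + c2[0]     # 1: the major-to-major family transition
chord_shift = 12*(key_shift - 1) + note_shift
print(chord_shift)                    # 2, so chord_changes[2] is incremented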

A.3 Code to Compute Timbre Categories

from __future__ import division
import os
import sys
import re
import time
import json
import glob
import hdf5_getters  # not on adroit
import sklearn.mixture
import msd_utils  # not on adroit
import math
import numpy as np
import collections
from string import ascii_uppercase
import ast
import matplotlib.pyplot as plt
import operator
from collections import defaultdict
import random

timbre_all = []
N = 20  # number of samples to get from each year
year_counts = {1956: 2, 1965: 4, 1968: 3, 1969: 5, 1970: 23, 1971: 25,
               1972: 26, 1973: 37, 1974: 35, 1975: 29, 1976: 28, 1977: 64,
               1978: 77, 1979: 111, 1980: 131, 1981: 171, 1982: 199, 1983: 272,
               1984: 190, 1985: 189, 1986: 200, 1987: 224, 1988: 205, 1989: 272,
               1990: 358, 1991: 348, 1992: 538, 1993: 610, 1994: 658, 1995: 764,
               1996: 809, 1997: 930, 1998: 872, 1999: 983, 2000: 1031, 2001: 1230,
               2002: 1323, 2003: 1563, 2004: 1508, 2005: 1995, 2006: 1892,
               2007: 2175, 2008: 1950, 2009: 1782, 2010: 742}

time_start = time.time()
year_count = defaultdict(int)
orig_dir = '/scratch/network/mssilver/mssilver/'
orig_dir = ''
json_pattern = re.compile(r"\{'title'.*?\}", re.DOTALL)
N = 20  # number of songs to sample from each year
k = 20  # number of frames to select from each song
for l1 in ascii_uppercase:
    for l2 in ascii_uppercase:
        edm_textfile = orig_dir + 'msd_data/raw_' + l1 + l2 + '.txt'
        json_contents = open(edm_textfile, 'r').read()
        for json_object_str in re.findall(json_pattern, json_contents):
            json_object = ast.literal_eval(json_object_str)
            year = int(json_object['year'])
            prob = 1.0 if 1.0*N/year_counts[year] > 1.0 else 1.0*N/year_counts[year]
            if random.random() < prob:
                print('getting timbre frames for song in directory {0}, {1} seconds after start of program'.format(edm_textfile, time.time() - time_start))
                duration = float(json_object['duration'])
                timbre = [[t/duration for t in l] for l in json_object['timbre']]
                try:
                    indices = random.sample(xrange(0, len(timbre)), k)
                except:
                    indices = xrange(0, len(timbre))
                timbre_frames = [timbre[i] for i in indices]
                appended_timbre = [timbre_all.append(l) for l in timbre_frames]
        print('finished file {0}, {1} seconds after start of program'.format(edm_textfile, time.time() - time_start))

with open('timbre_frames_all.txt', 'w') as f:
    f.write(str(timbre_all))
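The listing above only collects the 16,800 sampled frames; the GMM fit and the
BIC-based selection of 46 clusters described in Section 2.3.3 are not reproduced in
the appendix. A minimal sketch of that missing step, assuming the frames were written
to timbre_frames_all.txt as above (and using scikit-learn's current GaussianMixture
name for what older versions called GMM):

import ast
import numpy as np
from sklearn.mixture import GaussianMixture

# the frames written out by the listing above
frames = np.array(ast.literal_eval(open('timbre_frames_all.txt').read()))

# fit GMMs over a range of cluster counts and keep the BIC-minimizing model
best_model, best_bic = None, np.inf
for k in range(10, 101):
    gmm = GaussianMixture(n_components=k).fit(frames)
    bic = gmm.bic(frames)
    if bic < best_bic:
        best_model, best_bic = gmm, bic

# best_model.means_ plays the role of TIMBRE_CLUSTERS in Appendix A.4
print('BIC-optimal cluster count: {0}'.format(best_model.n_components))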

A.4 Helper Methods for Calculations

import os
import re
import json
import glob
import hdf5_getters
import time
import numpy as np

''' some static data used in conjunction with the helper methods '''

# each 12-element vector corresponds to the 12 pitches, starting with C natural
# and going up to B natural

CHORD_TEMPLATE_MAJOR = [[1,0,0,0,1,0,0,1,0,0,0,0], [0,1,0,0,0,1,0,0,1,0,0,0],
                        [0,0,1,0,0,0,1,0,0,1,0,0], [0,0,0,1,0,0,0,1,0,0,1,0],
                        [0,0,0,0,1,0,0,0,1,0,0,1], [1,0,0,0,0,1,0,0,0,1,0,0],
                        [0,1,0,0,0,0,1,0,0,0,1,0], [0,0,1,0,0,0,0,1,0,0,0,1],
                        [1,0,0,1,0,0,0,0,1,0,0,0], [0,1,0,0,1,0,0,0,0,1,0,0],
                        [0,0,1,0,0,1,0,0,0,0,1,0], [0,0,0,1,0,0,1,0,0,0,0,1]]
CHORD_TEMPLATE_MINOR = [[1,0,0,1,0,0,0,1,0,0,0,0], [0,1,0,0,1,0,0,0,1,0,0,0],
                        [0,0,1,0,0,1,0,0,0,1,0,0], [0,0,0,1,0,0,1,0,0,0,1,0],
                        [0,0,0,0,1,0,0,1,0,0,0,1], [1,0,0,0,0,1,0,0,1,0,0,0],
                        [0,1,0,0,0,0,1,0,0,1,0,0], [0,0,1,0,0,0,0,1,0,0,1,0],
                        [0,0,0,1,0,0,0,0,1,0,0,1], [1,0,0,0,1,0,0,0,0,1,0,0],
                        [0,1,0,0,0,1,0,0,0,0,1,0], [0,0,1,0,0,0,1,0,0,0,0,1]]
CHORD_TEMPLATE_DOM7 = [[1,0,0,0,1,0,0,1,0,0,1,0], [0,1,0,0,0,1,0,0,1,0,0,1],
                       [1,0,1,0,0,0,1,0,0,1,0,0], [0,1,0,1,0,0,0,1,0,0,1,0],
                       [0,0,1,0,1,0,0,0,1,0,0,1], [1,0,0,1,0,1,0,0,0,1,0,0],
                       [0,1,0,0,1,0,1,0,0,0,1,0], [0,0,1,0,0,1,0,1,0,0,0,1],
                       [1,0,0,1,0,0,1,0,1,0,0,0], [0,1,0,0,1,0,0,1,0,1,0,0],
                       [0,0,1,0,0,1,0,0,1,0,1,0], [0,0,0,1,0,0,1,0,0,1,0,1]]
CHORD_TEMPLATE_MIN7 = [[1,0,0,1,0,0,0,1,0,0,1,0], [0,1,0,0,1,0,0,0,1,0,0,1],
                       [1,0,1,0,0,1,0,0,0,1,0,0], [0,1,0,1,0,0,1,0,0,0,1,0],
                       [0,0,1,0,1,0,0,1,0,0,0,1], [1,0,0,1,0,1,0,0,1,0,0,0],
                       [0,1,0,0,1,0,1,0,0,1,0,0], [0,0,1,0,0,1,0,1,0,0,1,0],
                       [0,0,0,1,0,0,1,0,1,0,0,1], [1,0,0,0,1,0,0,1,0,1,0,0],
                       [0,1,0,0,0,1,0,0,1,0,1,0], [0,0,1,0,0,0,1,0,0,1,0,1]]

CHORD_TEMPLATE_MAJOR_means = [np.mean(chord) for chord in CHORD_TEMPLATE_MAJOR]
CHORD_TEMPLATE_MINOR_means = [np.mean(chord) for chord in CHORD_TEMPLATE_MINOR]
CHORD_TEMPLATE_DOM7_means = [np.mean(chord) for chord in CHORD_TEMPLATE_DOM7]
CHORD_TEMPLATE_MIN7_means = [np.mean(chord) for chord in CHORD_TEMPLATE_MIN7]

CHORD_TEMPLATE_MAJOR_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_MAJOR]
CHORD_TEMPLATE_MINOR_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_MINOR]
CHORD_TEMPLATE_DOM7_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_DOM7]
CHORD_TEMPLATE_MIN7_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_MIN7]

# mean timbre vectors of the 46 clusters found by the GMM (Appendix A.3)
TIMBRE_CLUSTERS = [
    [1.38679881e-01, 3.95702571e-02, 2.65410235e-02, 7.38301998e-03, -1.75014636e-02, -5.51147732e-02,
     8.71851698e-03, -1.17595855e-02, 1.07227900e-02, 8.75951680e-03, 5.40391877e-03, 6.17638908e-03],
    [3.14344510e+00, 1.17405599e-01, 4.08053561e+00, -1.77934450e+00, 2.93367968e+00, -1.35597928e+00,
     -1.55129489e+00, 7.75743158e-01, 6.42796685e-01, 1.40794256e-01, 3.37716831e-01, -3.27103815e-01],
    [3.56548165e-01, 2.73288705e+00, 1.94355982e+00, 1.06892477e+00, 9.89739475e-01, -8.97330631e-02,
     8.73234495e-01, -2.00747009e-03, 3.44488367e-01, 9.93117800e-02, -2.43471766e-01, -1.90521726e-01],
    [4.22442037e-01, 4.14115783e-01, 1.43926557e-01, -1.16143322e-01, -5.95186216e-02, -2.36927188e-01,
     -6.83151409e-02, 9.86816882e-02, 2.43219098e-02, 6.93558977e-02, 6.80121418e-03, 3.97485360e-02],
    [1.94727799e-01, -1.39027782e+00, -2.39875671e-01, -2.84583677e-01, 1.92334219e-01, -2.83421048e-01,
     2.15787541e-01, 1.14840341e-01, -2.15631833e-01, -4.09496877e-02, -6.90838017e-03, -7.24394810e-03],
    [1.96565167e-01, 4.98702717e-02, -3.43697282e-01, 2.54170701e-01, 1.12441266e-02, 1.54740401e-01,
     -4.70447408e-02, 8.10868802e-02, 3.03736697e-03, 1.43974944e-03, -2.75044913e-02, 1.48634678e-02],
    [2.21364497e-01, -2.96205105e-01, 1.57754028e-01, -5.57641279e-02, -9.25625566e-02, -6.15316168e-02,
     -1.38139882e-01, -5.54936599e-02, 1.66886836e-01, 6.46238260e-02, 1.24093863e-02, -2.09274345e-02],
    [2.12823455e-01, -9.32652720e-02, -4.39611467e-01, -2.02814479e-01, 4.98638770e-02, -1.26572488e-01,
     -1.11181799e-01, 3.25075635e-02, 2.01416694e-02, -5.69216463e-02, 2.61922912e-02, 8.30817468e-02],
    [1.62304042e-01, -7.34813956e-03, -2.02552550e-01, 1.80106705e-01, -5.72110826e-02, -9.17148244e-02,
     -6.20429191e-03, -6.08892354e-02, 1.02883628e-02, 3.84878478e-02, -8.72920419e-03, 2.37291230e-02],
    [1.69023095e-01, 6.81311168e-02, -3.71039856e-02, -2.13139780e-02, -4.18752028e-03, 1.36407740e-01,
     2.58515825e-02, -4.10328777e-04, 2.93149920e-02, -1.97874734e-02, 2.01177066e-02, 4.29260690e-03],
    [4.16829358e-01, -1.28384095e+00, 8.86081556e-01, 9.13717416e-02, -3.19420208e-01, -1.82003637e-01,
     -3.19865507e-02, -1.71517045e-02, 3.47472066e-02, -3.53047665e-02, 5.58354602e-02, -5.06222122e-02],
    [3.83948137e-01, 1.06020034e-01, 4.01191058e-01, 1.49470482e-01, -9.58422411e-02, -4.94473336e-02,
     2.27589858e-02, -5.67352733e-02, 3.84666644e-02, -2.15828055e-02, -1.67817151e-02, 1.15426241e-01],
    [9.07946444e-01, 3.26120397e+00, 2.98472002e+00, -1.42615404e-01, 1.29886103e+00, -4.53380431e-01,
     1.54008478e-01, -3.55297093e-02, -2.95809181e-01, 1.57037690e-01, -7.29692046e-02, 1.15180285e-01],
    [1.60870896e+00, -2.32038235e+00, -7.96211044e-01, 1.55058968e+00, -2.19377663e+00, 5.01030526e-01,
     -1.71767279e+00, -1.36642470e+00, -2.42837527e-01, -4.14275615e-01, -7.33148530e-01, -4.56676578e-01],
    [6.42870687e-01, 1.34486839e+00, 2.16026845e-01, -2.13180345e-01, 3.10866747e-01, -3.97754955e-01,
     -3.54439151e-01, -5.95938041e-04, 4.95054274e-03, 4.67013422e-02, -1.80823854e-02, 1.25808320e-01],
    [1.16780496e+00, 2.28141229e+00, -3.29418720e+00, -1.54239912e+00, 2.12372153e-01, 2.51116768e+00,
     1.84273560e+00, -4.06183916e-01, 1.19175125e+00, -9.24407446e-01, 6.85444429e-01, -6.38729005e-01],
    [2.39097414e-01, -1.13382447e-02, 3.06327342e-01, 4.68182987e-03, -1.03107607e-01, -3.17661969e-02,
     3.46533705e-02, 1.46440386e-02, 6.88291154e-02, 1.72580481e-02, -6.23970238e-03, -6.52822380e-03],
    [1.74850329e-01, -1.86077411e-01, 2.69285838e-01, 5.22452803e-02, -3.71708289e-02, -6.42874319e-02,
     -5.01920042e-03, -1.14565540e-02, -2.61300268e-03, -6.94872458e-03, 1.20157063e-02, 2.01341977e-02],
    [1.93220674e-01, 1.62738332e-01, 1.72794061e-02, 7.89933755e-02, 1.58494767e-01, 9.04541006e-04,
     -3.33177052e-02, -1.42411500e-01, -1.90471155e-02, -2.41622739e-02, -2.57382438e-02, 2.84895062e-02],
    [3.31179197e+00, -1.56765268e-01, 4.42446188e+00, 2.05496297e+00, 5.07031622e+00, -3.52663849e-02,
     -5.68337901e+00, -1.17825301e+00, 5.41756637e-01, -3.15541339e-02, -1.58404846e+00, 7.37887234e-01],
    [2.36033237e-01, -5.01380019e-01, -7.01568834e-02, -2.14474169e-01, 5.58739133e-01, -3.45340886e-01,
     2.36469930e-01, -2.51770230e-02, -4.41670143e-01, -1.73364633e-01, 9.92353986e-03, 1.01775476e-01],
    [3.13672832e+00, 1.55128891e+00, 4.60139512e+00, 9.82477544e-01, -3.87108002e-01, -1.34239667e+00,
     -3.00065797e+00, -4.41556909e-01, -7.77546208e-01, -6.59017029e-01, -1.42596356e-01, -9.78935498e-01],
    [8.50714148e-01, 2.28658856e-01, -3.65260753e+00, 2.70626948e+00, -1.90441544e-01, 5.66625676e+00,
     1.77531510e+00, 2.39978921e+00, 1.10965660e+00, 1.58484130e+00, -1.51579214e-02, 8.64324026e-01],
    [1.14302559e+00, 1.18602811e+00, -3.88130412e+00, 8.69833825e-01, -8.23003310e-01, -4.23867795e-01,
     8.56022598e-01, -1.08015106e+00, 1.74840192e-01, -1.35493558e-02, -1.17012561e+00, 1.68572940e-01],
    [3.54117814e+00, 6.12714769e-01, 7.67585243e+00, 2.50391333e+00, 1.81374399e+00, -1.46363231e+00,
     -1.74027236e+00, -5.72924078e-01, -1.20787368e+00, -4.13954661e-01, -4.62561948e-01, 6.78297871e-01],
    [8.31843044e-01, 4.41635485e-01, 7.00724425e-02, -4.72159900e-02, 3.08326493e-01, -4.47009822e-01,
     3.27806057e-01, 6.52370380e-01, 3.28490360e-01, 1.28628172e-01, -7.78065861e-02, 6.91343399e-02],
    [4.90082031e-01, -9.53180204e-01, 1.76970476e-01, 1.57256960e-01, -5.26196238e-02, -3.19264458e-01,
     3.91808304e-01, 2.19368239e-01, -2.06483291e-01, -6.25044005e-02, -1.05547224e-01, 3.18934196e-01],
    [1.49899454e+00, -4.30708817e-01, 2.43770498e+00, 7.03149621e-01, -2.28827845e+00, 2.70195855e+00,
     -4.71484280e+00, -1.18700075e+00, -1.77431396e+00, -2.23190236e+00, 8.20855264e-01, -2.35859902e-01],
    [1.20322544e-01, -3.66300816e-01, -1.25699953e-01, -1.21914056e-01, 6.93277338e-02, -1.31034684e-01,
     -1.54955924e-03, 2.48094288e-02, -3.09576314e-02, -1.66369415e-03, 1.48904987e-04, -1.42151992e-02],
    [6.52394765e-01, -6.81024464e-01, 6.36868117e-01, 3.04950208e-01, 2.62178992e-01, -3.20457080e-01,
     -1.98576098e-01, -3.02173163e-01, 2.04399765e-01, 4.44513847e-02, -9.50111498e-02, -1.14198739e-02],
    [2.06762180e-01, -2.08101829e-01, 2.61977630e-01, -1.71672300e-01, 5.61794250e-02, 2.13660185e-01,
     3.90259585e-02, 4.78176392e-02, 1.72812607e-02, 3.44052067e-02, 6.26899067e-03, 2.48544728e-02],
    [7.39717363e-01, 4.37786285e+00, 2.54995502e+00, 1.13151212e+00, -3.58509503e-01, 2.20806129e-01,
     -2.20500355e-01, -7.22409824e-02, -2.70534083e-01, 1.07942098e-03, 2.70174668e-01, 1.87279353e-01],
    [1.25593809e+00, 6.71054880e-02, 8.70352571e-01, -4.32607959e+00, 2.30652217e+00, 5.47476105e+00,
     -6.11052479e-01, 1.07955720e+00, -2.16225471e+00, -7.95770149e-01, -7.31804973e-01, 9.68935954e-01],
    [1.17233757e-01, -1.23897829e-01, -4.88625265e-01, 1.42036530e-01, -7.23286756e-03, -6.99808763e-02,
     -1.17525019e-02, 5.70221674e-02, -7.67796123e-03, 4.17505873e-02, -2.33375716e-02, 1.94121001e-02],
    [1.67511025e+00, -2.75436700e+00, 1.45345593e+00, 1.32408871e+00, -1.66172505e+00, 1.00560074e+00,
     -8.82308160e-01, -5.95708043e-01, -7.27283590e-01, -1.03975499e+00, -1.86653334e-02, 1.39449745e+00],
    [3.20587677e+00, -2.84451104e+00, 8.54849957e+00, -4.44001235e-01, 1.04202144e+00, 7.35333682e-01,
     -2.48763292e+00, 7.38931361e-01, -1.74185596e+00, -1.07581842e+00, 2.05759299e-01, -8.20483513e-01],
    [3.31279737e+00, -5.08655734e-01, 6.61530870e+00, 1.16518280e+00, 4.74499155e+00, -2.31536191e+00,
     -1.34016130e+00, -7.15381712e-01, 2.78890594e+00, 2.04189275e+00, -3.80003033e-01, 1.16034914e+00],
    [1.79522019e+00, -8.13534697e-02, 4.37167420e-01, 2.26517020e+00, 8.85377295e-01, 1.07481514e+00,
     -7.25322296e-01, -2.19309506e+00, -7.59468916e-01, -1.37191387e+00, 2.60097913e-01, 9.34596450e-01],
    [3.50400906e-01, 8.17891485e-01, -8.63487084e-01, -7.31760701e-01, 9.70320805e-02, -3.60023996e-01,
     -2.91753495e-01, -8.03073817e-02, 6.65930095e-02, 1.60093340e-01, -1.29158086e-01, -5.18806100e-02],
    [2.25922929e-01, 2.78461593e-01, 5.39661393e-02, -2.37662670e-02, -2.70343295e-02, -1.23485570e-01,
     2.31027499e-03, 5.87465112e-05, 1.86127188e-02, 2.83074747e-02, -1.87198676e-04, 1.24761782e-02],
    [4.53615634e-01, 3.18976020e+00, -8.35029351e-01, 7.84124578e+00, -4.43906795e-01, -1.78945492e+00,
     -1.14521031e+00, 1.00044304e+00, -4.04084981e-01, -4.86030348e-01, 1.05412721e-01, 5.63666445e-02],
    [3.93714086e-01, -3.07226477e-01, -4.87366619e-01, -4.57481697e-01, -2.91133171e-04, -2.39881719e-01,
     -2.15591352e-01, -1.21332941e-01, 1.42245002e-01, 5.02984582e-02, -8.05878851e-03, 1.95534173e-01],
    [1.86913010e-01, -1.61000977e-01, 5.95612425e-01, 1.87804293e-01, 2.22064227e-01, -1.09008289e-01,
     7.83845058e-02, 5.15228647e-02, -8.18113578e-02, -2.37860551e-02, 3.41013800e-03, 3.64680417e-02],
    [3.32919314e+00, -2.14341251e+00, 7.20913997e+00, 1.76143734e+00, 1.64091808e+00, -2.66887649e+00,
     -9.26748006e-01, -2.78599285e-01, -7.39434005e-01, -3.87363085e-01, 8.00557250e-01, 1.15628886e+00],
    [4.76496444e-01, -1.19334793e-01, 3.09037235e-01, -3.45545294e-01, 1.30114716e-01, 5.06895559e-01,
     2.12176840e-01, -4.14296750e-03, 4.52439064e-02, -1.62163990e-02, 6.93683152e-02, -5.77607592e-03],
    [3.00019324e-01, 5.43432074e-02, -7.72732930e-01, 1.47263806e+00, -2.79012581e-02, -2.47864869e-01,
     -2.10011388e-01, 2.78202425e-01, 6.16957205e-02, -1.66924986e-01, -1.80102286e-01, -3.78872162e-03]]

TIMBRE_MEANS = [np.mean(t) for t in TIMBRE_CLUSTERS]
TIMBRE_STDEVS = [np.std(t) for t in TIMBRE_CLUSTERS]

'''helper methods to process raw msd data'''

def normalize_pitches(h5):
    key = int(hdf5_getters.get_key(h5))
    segments_pitches = hdf5_getters.get_segments_pitches(h5)
    segments_pitches_new = [transpose_by_key(pitch_seg, key) for pitch_seg in segments_pitches]
    return segments_pitches_new

def transpose_by_key(pitch_seg, key):
    pitch_seg_new = []
    for i in range(0, 12):
        idx = (i + key) % 12
        pitch_seg_new.append(pitch_seg[idx])
    return pitch_seg_new

''' given a time segment with distributions of the 12 pitches, find the most
likely chord played '''
def find_most_likely_chord(pitch_vector):
    rho_max = 0.0
    # index each chord as (family, root): 1=major, 2=minor, 3=dominant 7th, 4=minor 7th
    most_likely_chord = (1, 1)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_MAJOR, CHORD_TEMPLATE_MAJOR_means, CHORD_TEMPLATE_MAJOR_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean)*(pitch_vector[i] - np.mean(pitch_vector))/((stdev + 0.01)*(np.std(pitch_vector) + 0.01))
        if (abs(rho) > abs(rho_max)):
            rho_max = rho
            most_likely_chord = (1, idx)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_MINOR, CHORD_TEMPLATE_MINOR_means, CHORD_TEMPLATE_MINOR_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean)*(pitch_vector[i] - np.mean(pitch_vector))/((stdev + 0.01)*(np.std(pitch_vector) + 0.01))
        if (abs(rho) > abs(rho_max)):
            rho_max = rho
            most_likely_chord = (2, idx)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_DOM7, CHORD_TEMPLATE_DOM7_means, CHORD_TEMPLATE_DOM7_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean)*(pitch_vector[i] - np.mean(pitch_vector))/((stdev + 0.01)*(np.std(pitch_vector) + 0.01))
        if (abs(rho) > abs(rho_max)):
            rho_max = rho
            most_likely_chord = (3, idx)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_MIN7, CHORD_TEMPLATE_MIN7_means, CHORD_TEMPLATE_MIN7_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean)*(pitch_vector[i] - np.mean(pitch_vector))/((stdev + 0.01)*(np.std(pitch_vector) + 0.01))
        if (abs(rho) > abs(rho_max)):
            rho_max = rho
            most_likely_chord = (4, idx)
    return most_likely_chord

def find_most_likely_timbre_category(timbre_vector):
    most_likely_timbre_cat = 0
    rho_max = 0.0
    for idx, (seg, mean, stdev) in enumerate(zip(TIMBRE_CLUSTERS, TIMBRE_MEANS, TIMBRE_STDEVS)):
        rho = 0.0
        for i in range(0, 12):
            rho += (seg[i] - mean)*(timbre_vector[i] - np.mean(seg))/((stdev + 0.01)*(np.std(timbre_vector) + 0.01))
        if (abs(rho) > abs(rho_max)):
            rho_max = rho
            most_likely_timbre_cat = idx
    return most_likely_timbre_cat
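A hedged usage sketch for the helpers above (not part of the original pipeline): a toy
chroma segment with its energy on C, E, and G should correlate best with the C-rooted
major template.

import msd_utils  # the module listed above, assuming it is saved under this name

# toy 12-element pitch vector: energy concentrated on C (0), E (4), G (7)
pitch_vector = [0.9, 0.1, 0.1, 0.1, 0.8, 0.1, 0.1, 0.85, 0.1, 0.1, 0.1, 0.1]

family, root = msd_utils.find_most_likely_chord(pitch_vector)
# family: 1=major, 2=minor, 3=dominant 7th, 4=minor 7th; root 0 means C
print('family {0}, root {1}'.format(family, root))  # expected: family 1, root 0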


Bibliography

[1] Marck Bailey. Mark J. Butler publishes scholarly work on dance music. http://www.music.northwestern.edu/about/news/2012/mark-j-butler-publishes-scholarly-work-on-dance-music.html, March 2012.

[2] Kenneth Taylor. Ishkur's guide to EDM. http://techno.org/electronic-music-guide/.

[3] Deal further strengthens Spotify's music discovery expertise. http://the.echonest.com/pressreleases/spotify-acquires-echo-nest/, March 2014.

[4] Josh Constine. Inside the Spotify-Echo Nest skunkworks. http://techcrunch.com/2014/10/19/the-sonic-mad-scientists/, October 2014.

[5] The future of music genres is here. http://blog.echonest.com/post/73516217273/the-future-of-music-genres-is-here, January 2014.

[6] About the Music Genome Project. http://www.pandora.com/about/mgp.

[7] Joan Serrà, Álvaro Corral, Marián Boguñá, Martín Haro, and Josep Ll. Arcos. Measuring the evolution of contemporary western popular music. Scientific Reports, 2, July 2012.

[8] Matthias Mauch, Robert M. MacCallum, Mark Levy, and Armand M. Leroi. The evolution of popular music: USA 1960-2010. Royal Society Open Science, 2(5), 2015.

[9] Thierry Bertin-Mahieux, Daniel P. W. Ellis, Brian Whitman, and Paul Lamere. The Million Song Dataset. In Proceedings of the 12th International Conference on Music Information Retrieval (ISMIR 2011), 2011.

[10] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay. Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12:2825-2830, 2011.

[11] Lars Buitinck, Gilles Louppe, Mathieu Blondel, Fabian Pedregosa, Andreas Mueller, Olivier Grisel, Vlad Niculae, Peter Prettenhofer, Alexandre Gramfort, Jaques Grobler, Robert Layton, Jake VanderPlas, Arnaud Joly, Brian Holt, and Gaël Varoquaux. API design for machine learning software: experiences from the scikit-learn project. In ECML PKDD Workshop: Languages for Data Mining and Machine Learning, pages 108-122, 2013.

[12] Edwin Chen. Infinite mixture models with nonparametric Bayes and the Dirichlet process. http://blog.echen.me/2012/03/20/infinite-mixture-models-with-nonparametric-bayes-and-the-dirichlet-process/, March 2012.

[13] Graham Massey. Roland TR-808: The drum machine that changed music. http://www.bbc.com/news/entertainment-arts-26682781, March 2014.

[14] Jean-Michel Jarre and Agencja Artystyczna MTJ. Jean Michel Jarre. Disques Dreyfus, 1999.

[15] Francois Pachet, Jean-Julien Aucouturier, and Mark Sandler. The way it sounds: Timbre models for analysis and retrieval of music signals. IEEE Transactions on Multimedia, 7(6):1028-1035, December 2005.

• Abstract
• Acknowledgements
• Contents
• List of Tables
• List of Figures
• 1 Introduction
  • 1.1 Background Information
  • 1.2 Literature Review
  • 1.3 The Dataset
• 2 Mathematical Modeling
  • 2.1 Determining Novelty of Songs
  • 2.2 Feature Selection
  • 2.3 Collecting Data and Preprocessing Selected Features
    • 2.3.1 Collecting the Data
    • 2.3.2 Pitch Preprocessing
    • 2.3.3 Timbre Preprocessing
• 3 Results
  • 3.1 Methodology
  • 3.2 Findings
    • 3.2.1 α = 0.05
    • 3.2.2 α = 0.1
    • 3.2.3 α = 0.2
  • 3.3 Analysis
• 4 Conclusion
  • 4.1 Design Flaws in Experiment
  • 4.2 Future Work
  • 4.3 Closing Remarks
• A Code
  • A.1 Pulling Data from the Million Song Dataset
  • A.2 Calculating Most Likely Chords and Timbre Categories
  • A.3 Code to Compute Timbre Categories
  • A.4 Helper Methods for Calculations
• Bibliography
Page 31: Silver,Matthew final thesis

where CT is the mean of the values in the template chord σCT is the standard

deviation of the values in the chord and the operations on c are analogous Note

that the summation is over each individual pitch in the 12 pitch classes The chord

template with the highest value of ρ is selected as the chord for the time frame

After this is performed for each time frame the values are smoothed and then the

change between adjacent chords is observed The reasoning behind this step is that

by measuring the relative distance between chords rather than the chords themselves

all songs can be compared in the same manner even though they may have different

key signatures Finally the study takes the types of chord changes and classifies

them under 8 possible categories called ldquoH-topicsrdquo These topics are more abstracted

versions of the chord changes that make more sense to a human such as ldquochanges

involving dominant 7th chordsrdquo

In my preliminary implementation of this method on an electronic dance music

corpus I made a few modifications to Mauchrsquos study First I smoothed out time

frames before computing the most probable chords rather than smoothing the most

probable chords I did this to save time and to reduce volatility in the chord

measurements Using Rick Astleyrsquos ldquoNever Gonna Give You Uprdquo as a reference

which contains 935 time frames and lasts 212 seconds 5 time frames is slightly

under 1 second and for preliminary testing appeared to be a good interval for each

time block Second as mentioned in the literature section I did not abstract the

chord changes into H-topics This decision also stemmed from time constraints since

deriving semantic chord meaning from EDM songs would require careful research

into the types of harmonies and sounds common in that genre of music Below I

included a high-level visualization of the pitch metadata found in a sample song

ldquoFirestarterrdquo by The Prodigy and how I converted the metadata into a chord change

vector that I could then feed into the Dirichlet Process algorithm

22

Note13 TF113 TF213 TF313 TF413 TF513 C13 1013 052513 081913 030313 038813

CD13 06113 1013 1013 064813 1013 D13 031913 059913 049313 020213 018513

DE13 022113 022913 02413 032213 024113 E13 028913 029813 026813 045213 032913 F13 040413 029813 029713 061313 043913

FG13 046513 039813 073313 1013 058913 G13 025413 036313 058813 063213 033713

GA13 012313 034313 067113 055513 02813 A13 031613 030813 043113 065913 02213

AB13 05213 02613 060313 085513 065413 B13 095113 026313 028613 027513 022513

13 13 13 13

060713 085213 036013 025113 032713 041013 063713 043513 039413 038713 057813 040013

13 13 13 13 13 13 13 13 13 13 13 13

13 13 13 13 F13 major13 (010000100010)13 13 13 13 13 13 13 13 13 13 13 13 13 13 13 13 13 13 13 13 13 13 13 13 13 13 13 13 13 13 13 13 13 13 13 13 13 13 13 13 13 13 13 13 13

Average13 distribution13 of13 pitches13 over13 every13 513 time13 frames13

Calculate13 most13 likely13 chord13 using13 Spearmanrsquos13 rho13

Calculate13 most13 likely13 chord13 over13 every13 other13 block13 of13 513 time13 frames13

Start13 with13 raw13 pitch13 data13 an13 Nx1213 vector13 where13 N13 is13 the13 number13 of13 time13 frames13 in13 the13 song13 and13 1213 the13 number13 of13 pitch13 classes13 Shown13 here13 are13 the13 first13 513 time13 frames13 of13 ldquoFirestarterrdquo13 by13 The13 Prodigy13 13

23

13 13 F13 13 major13 G13 major13 13 13 13 Chord13 shift13 code13 =13 613 chord_changes[6]13 +=13 113 13 13 13 13 13 13 13 13 13 13 13 13

13

13 13 13 chord_changes13 =13 [1413 013 313 013 113 013 113 013 013 013 113 013 1113 013 213 013 013 013 213 013 213 113 113 013 113 013 013 013 313 113 013 113 213 013 113 313 113 113 013 013 113 013 013 213 013 013 013 013 1213 113 413 113 213 013 013 013 013 113 113 113 1413 013 613 013 213 013 013 113 013 013 613 013 013 213 013 013 013 013 313 213 013 113 213 113 113 113 013 013 013 013 013 013 013 113 113 013 013 013 013 013 113 213 013 113 013 113 013 113 013 213 113 113 113 113 013 113 013 213 113 113 013 213 113 113 013 013 013 113 013 113 013 513 313 013 013 213 013 013 013 113 013 113 013 113 413 013 013 013 013 013 213 013 013 013 013 013 213 013 213 013 113 013 013 113 013 113 113 013 013 213 013 013 013 013 013 013 113 013 013 013 113 013 013 013 013 013 013 013 013 013 113 0]13 13 Final13 192-shy‐element13 vector13 where13 chord_changes[i]13 is13 the13 number13 of13 times13 the13 chord13 change13 with13 code13 i13 existed13 in13 the13 song13 13 13

Major13 to13 Major13 step13 size13 =13 213 For13 two13 adjacent13 chords13 calculate13 the13 change13 in13 between13 them13 and13 increment13 count13 in13 table13 of13 chord13 change13 frequencies13 (19213 possible13 chord13 changes)13

24

A final step I took to normalize the chord change data was to divide the numbers by

the length of the song so that each songrsquos number of chord changes was measured per

second

233 Timbre Preprocessing

For timbre I also used Mauchrsquos model to find a meaningful way to compare timbre

uniformly across all songs [8] After collecting all song metadata I took a random

sample of 20 songs from each year starting at 1970 The reason I forced the sampling

to 20 randomly sampled songs from each year and did not take a random sample of

songs from all years at once was to prevent bias towards any type of sounds As seen

in figure 22 there are significantly more songs from 2000-2011 than before 2000 The

mean year is x = 2001052 the median year is 2003 and the standard deviation of the

years is σ = 7060 A ldquorandom samplerdquo over all songs would almost definitely include

a disproportionate amount of more recent songs In order to not miss out on sounds

that may be more prevalent in older songs I required a set number of songs from each

year Next from each randomly selected song I selected 20 random timbre frames

in order to prevent any biases in data collection within each song In total there

were 422020 = 16800 timbre frames collected Next I clustered the timbre frames

using a Gaussian Mixture Model (GMM) varying the number of clusters from 10 to

100 and selecting the number of clusters with the lowest Bayes Information Criterion

(BIC) a statistical measure commonly used to calculate the best fitting model The

BIC was minimized at 46 timbre clusters I then re-ran the GMM with 46 clusters

and saved the mean values of each of the 12 timbre segments for each cluster formed

In the same way that every song had the same 192 cord changes whose frequencies

could be compared between songs each song now had the same 46 timbre clusters

but different frequencies in each song When reading in the metadata from each song

I calculated the most likely timbre cluster each timbre frame belonged to and kept

25

Figure 22 Number of Electronic Music Songs in Million Song Dataset from EachYear

a frequency count of all of the possible timbre clusters observed in a song Finally

as with the pitch data I divided all observed counts by the duration of the song in

order to normalize each songrsquos timbre counts

26

Chapter 3

Results

31 Methodology

After the pitch and timbre data was processed I ran the Dirichlet Process on the

data For each song I concatenated the 192-element chord change frequency list

and the 46-element timbre category frequency list giving each song a total of 238

features However there is a problem with this setup The pitch data will inherently

dominate the clustering process since it contains almost 3 times as many features

as timbre While there is no built-in function in scikit-learnrsquos DPGMM process to

give different weights to each feature I considered another possibility to remedy

this discrepancy duplicating the timbre vector a certain number of times and

concatenating that to the feature set of each song While this strategy runs the risk

of corrupting the feature set and turning it into something that does not accurately

represent each song it is important to keep in mind that even without duplicating

the timbre vector the feature set consists of two separate feature sets concatenated

to each other Therefore timbre duplication appears to be a reasonable strategy to

weigh pitch and timbre more evenly

27

After this modification I tweaked a few more parameters before obtaining my

final results Dividing the pitch and timbre frequencies by the duration of the song

normalized every song to frequency per second but it also had the undesired effect

of making the data too small Timbre and pitch frequencies per second were almost

always less than 10 and many times hovered as low as 0002 for nonzero values

Because all of the values were very close to each other using common values of α

in the range of 01 to 1000-2000 was insufficient to push the songs into different

clusters As a result every song fell into the same cluster Increasing the value

of α by several orders of magnitude to well over 10 million fixed the problem but

this solution presented two problems First tuning α to experiment with different

ways to cluster the music would be problematic since I would have to work with

an enormous range of possible values for α Second pushing α to such high values

is not appropriate for the Dirichlet Process Extremely high values of α indicate a

Dirichlet Process that will try to disperse the data into different clusters but a value

of α that high is in principle always assigning each new song to a new cluster On

the other hand varying α between 01 and 1000 for example presents a much wider

range of flexibility when assigning clusters While this may be possible by varying

the values of α an extreme amount with the data as it currently is we are using

the Dirichlet Process in a way it should mathematically not be used Therefore

multiplying all of the data by a constant value so that we can work in the appropriate

range of α is the ideal approach After some experimentation I found that k=10 was

an appropriate scaling factor After initial runs of the Dirichlet Process I found out

that there was a slight issue with some of the earlier songs Since I had only artist

genre tags not specific song tags for each song I chose songs based on whether any

of the tags associated with the artist fell under any electronic music genre including

the generic term rsquoelectronicrsquo There were some bands mostly older ones from the

1960s and 1970s like Electric Light Orchestra which had some electronic music but

28

mostly featured rock funk disco or another genre Given that these artists featured

mostly non-electronic songs I decided to exclude them from my study and generate

a blacklist indicating these music artists While it was infeasible to look through

every single song and determine whether it was electronic or not I was able to look

over the earliest songs in each cluster These songs were the most important to verify

as electronic because early non-electronic songs could end up forming new clusters

and inadvertently create clusters with non-electronic sounds that I was not looking for

The goal of this thesis is to identify different groups in which EM songs are

clustered and identify the most unique artists and genres While the second task is

very simple because it requires looking at the earliest songs in each cluster the first

is difficult to gauge the effectiveness of While I can look at the average chord change

and timbre category frequencies in each category as well as other metadata putting

more semantic interpretations to what the music actually sounds like and determining

whether the music is clustered properly is a very subjective process For this reason

I ran the Dirichlet Process on the feature set with values of α = (005 01 02) and

compared the clustering in each category examining similarities and differences in

the clusters formed in each scenario in the Discussion section For each value α I

set the upper limit of components or clusters allowed to 50 The ranges of α I used

resulted in 9 14 and 19 clusters formed

32 Findings

321 α=005

When I set α to 005 the Dirichlet Process split the songs into 9 clusters Below are

the distribution of years of the songs in each cluster (note that the Dirichlet Process

29

does not number the clusters exactly sequentially so cluster numbers 5 7 and 10 are

skipped)

30

Figure 31 Song year distributions for α = 005

For each value of α I also calculated the average frequency of each chord change

category and timbre category for each cluster and plotted the results The green

lines correspond to timbre and the blue lines to pitch

31

Figure 32 Timbre and pitch distributions for α = 005

A table of each cluster formed the number of songs in that cluster and descriptions

of pitch timbre and rhythmic qualities characteristic of songs in that cluster are

shown below

32

Cluster Song Count Characteristic Sounds

0 6481 Minimalist industrial space sounds dissonant chords

1 5482 Soft New Age ethereal

2 2405 Defined sounds electronic and non-electronic instru-

ments played in standard rock rhythms

3 360 Very dense and complex synths slightly darker tone

4 4550 Heavily distorted rock and synthesizer

6 2854 Faster paced 80s synth rock acid house

8 798 Aggressive beats dense house music

9 1464 Ambient house trancelike strong beats mysterious

tone

11 1597 Melancholy tones New wave rock in 80s then starting

in 90s downtempo trip-hop nu-metal

Table 31 Song cluster descriptions for α = 005

322 α=01

A total of 14 clusters were formed (16 were formed but 2 clusters contained only

one song each I listened to both of these songs and they did not sound unique so

I discarded them from the clusters) Again the song distributions timbre and pitch

distributions and cluster descriptions are shown below

33

34

Figure 33 Song year distributions for α = 01

35

36

Figure 34 Timbre and pitch distributions for α = 01

37

Cluster Song Count Characteristic Sounds

0 1339 Instrumental and disco with 80s synth

1 2109 Simultaneous quarter-note and sixteenth note rhythms

2 4048 Upbeat chill simultaneous quarter-note and eighth

note rhythms

3 1353 Strong repetitive beats ambient

4 2446 Strong simultaneous beat and synths synths defined but

echo

5 2672 Calm New Age

6 542 Hi-hat cymbals dissonant chord progressions

7 2725 Aggressive punk and alternative rock

9 1647 Latin rhythmic emphasis on first and third beats

11 835 Standard medium-fast rock instrumentschords

16 1152 Orchestral especially violins

18 40 ldquoMartian alienrdquo sounds no vocals

20 1590 Alternating strong kick and strong high-pitched clap

28 528 Roland TR-like beats kick and clap stand out but fuzzy

Table 32 Song cluster descriptions for α = 01

323 α=02

With α set to 02 there were a total of 22 clusters formed 3 of the clusters consisted

of 1 song each none of which were particularly unique-sounding so I discarded them

for a total of 19 significant clusters Again the song distributions timbre and pitch

distributions and cluster descriptions are shown below

38

39

40

Figure 35 Song year distributions for α = 02

41

42

43

Figure 36 Timbre and pitch distributions for α = 02

44

Cluster Song Count Characteristic Sounds

0 4075 Nostalgic and sad-sounding synths and string instru-

ments

1 2068 Intense sad cavernous (mix of industrial metal and am-

bient)

2 1546 Jazzfunk tones

3 1691 Orchestral with heavy 80s synths atmospheric

4 343 Arpeggios

5 304 Electro ambient

6 2405 Alien synths eery

7 1264 Punchy kicks and claps 80s90s tilt

8 1561 Medium tempo 44 time signature synths with intense

guitar

9 1796 Disco rhythms and instruments

10 2158 Standard rock with few (if any) synths added on

12 791 Cavernous minimalist ambient (non-electronic instru-

ments)

14 765 Downtempo classic guitar riffs fewer synths

16 865 Classic acid house sounds and beats

17 682 Heavy Roland TR sounds

22 14 Fast ambient classic orchestral

23 578 Acid house with funk tones

30 31 Very repetitive rhythms one or two tones

34 88 Very dense sound (strong vocals and synths)

Table 33 Song cluster descriptions for α = 02

45

33 Analysis

Each of the different values of α revealed different insights about the Dirichlet Process

applied to the Million Song Dataset mainly which artists and songs were unique

and how traditional groupings of EM genres could be thought of differently Not

surprisingly the distributions of the years of songs in most of the clusters were skewed

to the left because the distribution of all of the EM songs in the Million Song Dataset

is left-skewed (see Figure 22) However some of the distributions vary significantly

for individual clusters and these differences provide important insights into which

types of music styles were popular at certain points in time and how unique the

earliest artists and songs in the clusters were For example for α = 01 Cluster 28rsquos

musical style (with sounds characteristic of the Roland TR-808 and TR-909 a two

programmable drum machines and synthesizers that became extremely popular in

1980s and 90s dance tracks) [13] coincides with the when the instruments were first

manufactured in 1980 Not surprisingly this cluster contained mostly songs from

the 80s and 90s and declined slightly in the 2000s However there were a few songs

in that cluster that came out before 1980 While these songs did not clearly use the

Roland TR machines they may have contained similar sounds that predated the

machines and were truly novel

First looking at α = 005 we see that all of the clusters contain a significant

number of songs although cluster 3 and 8 are notably smaller Cluster 3 contains

a heavier left tail indicating a larger number of songs from the 70s 80s and 90s

Inside the cluster the genres of music varied significantly from a traditional music

lens That is the cluster contained some songs with nearly all traditional rock

instruments others with purely synths and others somewhere in between all which

would normally be classified as different EM genres However under the Dirichlet

Process these songs were lumped together with the common themes of dense melodic

46

melodies (as opposed to minimalistic repetitive or dissonant sounds) The most

prominent artists from the earlier songs are Ashra and John Hassell who composed

several melodic songs combining traditional instruments with synthesizers for a

modern feel The other small cluster number 8 contains a more normal year

distribution relative to the entire MSD distribution and also consists of denser beats

Another artist Cabaret Voltaire leads this cluster Cluster 9 sticks out significantly

because it contains virtually no songs before 1990 but increases rapidly in popularity

This cluster contains songs with hypnotically repetitive rhythm strong and ethereal

synths and an equally strong drum-like beat Given the emergence of trance in

the 1990s and the fact that house music in the 1980s contained more minimalistic

synths than house music in the 1990s this distribution of years makes sense Looking

at the earliest artists in this cluster one that accurately predates the later music

in the cluster is Jean-Michel Jarre A French composer pioneering in ambient and

electronic music [14] one of his songs Les Chants Magneacutetiques IV contains very

sharp and modulated synths along with a repetitive hi-hat cymbal rhythm and more

drawn-out and ethereal synths While the song sounds ambient at its normal speed

playing the song 15 times the normal speed resulted in a thumping fast-paced 16th

note rhythm that combined with the ethereal synths that contain certain chord

progressions sounded very similar to trance music In fact I found that stylistically

trance music was comparable to house and ambient music increased in speed Trance

music was a term not used extensively until the early 1990s but ambient and house

music were already mainstream by the 1980s so it would make sense that trance

evolved in this manner However this insight could serve as an argument that trance

is not an innovative genre in and of itself but is rather a clever combination of two

older genres Lastly we look at the timbre category and chord change distributions

for each cluster In theory these clusters should have significantly different peaks

of chord changes and timbre categories reflecting different pitch arrangements and

47

instruments in each cluster The type 0 chord change corresponds to major rarr

major with no note change type 60 minor rarr minor with no note change type

120 dominant 7th major rarr dominant 7th major with no note change and type 180

dominant 7th minor rarr dominant 7th minor with no note change It makes sense that

type 0 60 120 and 180 chord changes are frequently observed because it implies

that chords in the song occurring next to each other are remaining in the same key

for the majority of the song The timbre categories on the other hand are more

difficult to intuitively interpret Mauchrsquos study addresses this issue by sampling

songs and sounds that are the closest to each timbre category and then playing the

sounds and attaching user-based interpretations based on several listeners [8] While

this strategy worked in Mauchrsquos study given the time and resources at my disposal

this strategy was not practical in my study I ended up comparing my subjective

summaries of each cluster and comparing the charts to see whether certain peaks

in the timbre categories corresponded to specific tones Strangely for α = 005 the

timbre and chord change data is very similar for each cluster This problem does not

occur for when α = 01 or 02 where the graphs vary significantly and correspond

to some of the observed differences in the music In summary below are the most

influential artists I found in the clusters formed and the types of music they created

that were novel for their time

bull Jean-Michel Jarre ambient and house music complicated synthesizer arrange-

ments

bull Cabaret Voltaire orchestral electronic music

bull Paul Horn new age

bull Brian Eno ambient music

bull Manuel Goumlttsching (Ashra) synth-heavy ambient music

48

bull Killing Joke industrial metal

bull John Foxx minimalist and dark electronic music

bull Fad Gadget house and industrial music

While these conclusions were formed mainly from the data used in the MSD and the

resulting clusters I checked outside sources and biographies of these artists to see

whether they were groundbreaking contributors to electronic music Some research

revealed that these artists were indeed groundbreaking for their time so my findings

are consistent with existing literature The difference however between existing

accounts and mine is that from a quantitatively computed perspective I found some

new connections (like observing that one of Jarrersquos works when sped up sounded

very similar to trance music)

For larger values of α it is not only worth looking at interesting phenomena in

the clusters formed for that specific value but also comparing the clusters formed

to other values of α Since we are increasing the value of α more clusters will

be formed and the distinctions between each cluster will be more nuanced With

α = 01 the Dirichlet Process formed 16 clusters 2 of these clusters consisted of

only one song each and upon listening neither of these songs sounded particularly

unique so I threw those two clusters out and analyzed the remaining 14 Comparing

these clusters to the ones formed with α = 005 I found that some of the clusters

mapped over nicely while others were more difficult to interpret For example cluster

301 (cluster 3 when α = 01) contained a similar number of songs and a similar

distribution of the years the songs were released to cluster 9005 Both contain vitually

no songs before the 1990s and then steadily rise in popularity through the 2000s

Both clusters also contain similar types of music house beats and ethreal synths

reminiscent of ambient or trance music However when I looked at the earliest

49

artists in cluster 301 they were different from the earliest artists in cluster 9005

One particular artist Bill Nelson stood out for having a particularly novel song

ldquoBirds of Tinrdquo for the year it was released (1980) This song features a sharp and

twangy synth beat that when sped up sounded like minimalist acid house music

While the α = 005 group differentiated mostly on general moods and classes of

instruments (like rock vs non-electronic vs electronic) the α = 01 group picked

up more nuanced instrumentation and mood differences For example cluster 1601

contained songs that featured orchestral string instruments especially violin The

songs themselves varied significantly according to traditional genres from Brian Eno

arrangements with classical orchestra to a remix of a song by Linkin Park a nu-metal

band which contained violin interludes This clustering raises an interesting point

that music that sounds very different based on traditional genres could be grouped

together on certain instruments or sounds Another cluster 2801 features 90s sounds

characteristic of the Roland TR-909 drum machine (which would explain why the

clusterrsquos songs increase dratistically starting in the 1990s and steadily decline through

the 2000s) Yet another cluster 601 contains a particularly heavy left tail indicating

a style more popular in the 1980s and the characteristic sound high-hat cymbals

is also a specialized instrument This specialization does not match up particularly

strongly with the clusters when α = 005 That is a single cluster with α = 005

does not easily map to one or more clusters in the α = 01 run although many of

the clusters appear to share characteristics based on the qualitative descriptions in

the tables The timbrechord change charts for each cluster appeared to at least

somewhat corroborate the general characteristics I attached to each cluster For

example the last timbre category is significantly pronounced for clusters 5 and 18

and especially so for 18 Cluster 18 was vocal-free ethereal space-synth sounds

so it would make sense that cluster 5 which was mainly calm New World also

contained vocal-free ethereal and space-y sounds It was also interesting to note

50

that certain clusters like 28 contained one timbre category that completely dominated

all the others Many songs in this cluster were marked by strong and repetitive

beats reminiscent of the Roland TR synth drum machine which matches the graph

Likewise clusters 3 7 9 and 20 which appear to contain the same peak timbre

category were noted for containing strong and repetitive beats For this cluster I

added the following artists and their contributions to the general list of novel artists

bull Bill Nelson Minimalist house music

bull Vangelis Orchestral compositions with electronic notes

bull Rick Wakeman Rock compositions with spacy-sounding synths

bull Kraftwerk synth-based pop music

Finally we look at α = 02 With this parameter value the Dirichlet Process resulted

in 22 clusters formed 3 of these clusters contained only one song each and upon

listening to each of these songs I determined they were not particularly unique and

discarded them for a total of 19 remaining clusters Unlike the previous two values of

α where the clusters were relatively easy to subjectively differentiate this one was

quite difficult Slightly more than half of the clusters 10 out of 19 contained under

1000 songs and 3 contained under 100 Some of the clusters were easily mapped

to clusters in the other two α values like cluster 1702 which contains Roland TR

drum machine sounds and is comparable to cluster 2801 However many of the other

classifications seemed more dubious Not only did the songs within each cluster seem

to often vary significantly but the differences between many clusters appeared nearly

indistinguishable The chord change and timbre charts also support the difficulty

in distinguishing different clusters The y-axes for all of the years are quite small

implying that many of the timbre values averaged out because the songs were quite

different in each cluster Essentially the observations are quite noisy and do not have

51

features that stand out as saliently as cluster 2801 for example The only exceptions

to these numbers were clusters 30 and 34 but there were so few songs in each of

these clusters that they represent only a small amount of the dataset Therefore I

concluded that the Dirichlet Process with α = 02 performed an insufficient job of

adequately clustering the songs Overall the clusters formed when α = 01 were

the most meaningful in terms of picking up nuanced moods and instruments without

splitting hairs and resulting in clusters a minimally trained ear could not differentiate

From this analysis the most appropriate genre classifications of the electronic music

from the MSD are the clusters described in the table where α = 01 and the most

novel artists along with their contributions are summarized in the finidings where

α = 005 and α = 01

52

Chapter 4

Conclusion

In this chapter I first address weakness in my experiment and strategies to address

those weaknesses then I offer potential paths for researchers to build upon my ex-

periment and offer closing words regarding this thesis

41 Design Flaws in Experiment

While I made every effort to ensure the integrity of this experiment, various factors complicated it, some beyond my control and others within my control but unrealistic to address given the time and resources I had. The largest issue was the dataset I was working with. While the MSD contained roughly 23,000 electronic music songs according to my classifications, these songs did not come close to covering all of the electronic music available. From looking through the tracks, I did see many important artists, which lent the dataset some credibility. However, there were several other artists I was surprised to find missing, and the artists that were included contributed only a limited number of popular songs. Some traditionally defined genres, like dubstep, were missing entirely from the dataset, and the most recent songs came from 2010, which meant that the past five years of rapid expansion in EM were not accounted for. Building a sufficient corpus of EM data is very difficult, arguably more so than for other genres, because songs may be remixed by multiple artists, further blurring the line between original content and modifications. For this reason, I consider my thesis a proof of concept: although the data I used may not be ideal, I was able to show that the Dirichlet Process could be used with some success to cluster songs based on their metadata; the core of that clustering step is sketched below.
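The following is a minimal sketch of that step, using the thesis-era scikit-learn class sklearn.mixture.DPGMM (later releases replace it with BayesianGaussianMixture); the feature file name is a hypothetical placeholder for the concatenated chord-change and timbre frequencies described in Chapter 3.

import numpy as np
from sklearn import mixture

# X: one row per song -- the 192 chord-change frequencies concatenated
# with the timbre-category frequencies, normalized per second and scaled
X = np.loadtxt('em_features.txt')  # hypothetical feature file

# cap of 50 components; alpha controls how readily new clusters form
dpgmm = mixture.DPGMM(n_components=50, alpha=0.1, n_iter=100)
dpgmm.fit(X)
cluster_labels = dpgmm.predict(X)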

With respect to how I implemented the Dirichlet Process and constructed the features, my methodology could have been more extensive with additional time and resources. Interpreting the sounds in each song and establishing common threads is a difficult task, and unlike Pandora, which used trained music theory experts to analyze each song, I relied on my own ears and anecdotal knowledge of EM. Given the lack of formal literature quantitatively analyzing EM and the resources available to me, this was my best realistic option, but it was not ideal. The second notable weakness, which was more controllable, was determining what exactly constitutes an EM song. My criterion was to iterate through every song and select those whose artist carried a tag falling inside a list of predetermined EM genres. However, this strategy is not always effective, since some artists have only a small selection of EM songs and have produced far more rock or other non-EM music. To prevent these songs from appearing in the dataset, I would need to load another dataset, from a group called Last.fm, which contains user-generated tags at the song level; a sketch of such a filter follows.
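A minimal sketch of that filter, assuming the song-level tags have already been loaded into a dictionary mapping track IDs to tag lists; the load_track_tags helper and its file path are hypothetical, and all_song_data is the dictionary built in Appendix A.1.

TARGET_GENRES = set(['house', 'techno', 'trance', 'dubstep', 'jungle',
                     'breakbeat', 'downtempo', 'idm', 'ambient', 'electronic'])

def is_em_song(track_id, track_tags):
    # keep a track only if one of its OWN tags names an EM genre,
    # rather than relying on its artist's tags
    tags = track_tags.get(track_id, [])
    return any(tag.lower() in TARGET_GENRES for tag in tags)

track_tags = load_track_tags('lastfm_tags.db')  # hypothetical loader
em_songs = dict((tid, meta) for tid, meta in all_song_data.items()
                if is_em_song(tid, track_tags))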

Another, more addressable, weakness in my experiment was graphically analyzing the timbre categories. While the average chord changes were easy to interpret on the graphs for each cluster and had clear semantic interpretations, the timbre categories were never formally defined. That is, while I knew the Bayes Information Criterion was lowest when there were 46 categories, I did not associate each timbre category with a sound. Mauch's study addressed this issue by randomly selecting songs with sounds that fell in each timbre category and asking users to listen to the sounds and classify what they heard [8]. Implementing this system would be an additional way of ensuring that the clusters formed for each song were nontrivial: I could not only eyeball the timbre measurements on each graph, as I did in this thesis, but also use listener classifications to confirm the sounds I observed for each cluster; a sketch of how such representative sounds could be selected follows.
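As a rough sketch of that selection step (my own illustration, not Mauch's implementation), one could rank each song by the distance of its nearest timbre frame to a given cluster mean and audition the best matches; TIMBRE_CLUSTERS here is the list of cluster means from Appendix A.4.

import numpy as np

def closest_songs_to_cluster(songs, cluster_mean, n=5):
    # rank songs by how close their nearest timbre frame sits to the
    # cluster mean; the top n are good candidates for human listening
    scored = []
    for song in songs:
        frames = np.asarray(song['timbre'])
        dists = np.linalg.norm(frames - np.asarray(cluster_mean), axis=1)
        scored.append((dists.min(), song['title'], song['artist_name']))
    scored.sort()
    return scored[:n]

# e.g., candidates to audition for timbre category 28:
# closest_songs_to_cluster(all_song_data.values(), TIMBRE_CLUSTERS[27])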

Finally, while my feature selection involved careful preprocessing, based on other studies, that normalized measurements across all songs, there are additional ways I could have improved the feature set. For example, one study looks at more advanced ways to isolate specific timbre segments in a song, identify repeating patterns, and compare songs to each other in terms of the similarity of their timbres [15]. More advanced methods like these would let me analyze more quantitatively how successfully the Dirichlet Process clusters songs into distinct categories; a simple version of such a song-to-song comparison is sketched below.
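As a minimal stand-in for those methods (a plain cosine measure over the timbre-category frequency vectors built in Appendix A.2, not the model of [15]):

import numpy as np

def timbre_similarity(song_a, song_b):
    # cosine similarity of two songs' timbre-category frequency vectors;
    # 1.0 means identical timbre profiles, 0.0 means no overlap
    a = np.asarray(song_a['timbre_cat_counts'])
    b = np.asarray(song_b['timbre_cat_counts'])
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(np.dot(a, b) / denom) if denom > 0 else 0.0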

4.2 Future Work

Future work in this area, quantitatively analyzing EM metadata to determine what constitutes different genres and novel artists, would involve tighter definitions and procedures, evaluations of whether the clustering was effective, and closer musical scrutiny. All of the weaknesses mentioned in the previous section, barring perhaps the set of songs available in the Million Song Dataset, can be addressed with extensions and modifications to the code base I created. The larger issue of building an effective corpus of music data for the MSD and constantly updating it might be addressed by soliciting such data from an organization like Spotify, but such an endeavor is very ambitious and beyond the scope of any individual or small research group without extensive funding and influence. Once these problems are resolved, the dataset's songs are accessible, and methods for comparing songs to each other are in place, the next steps would be to analyze the results further. How do the most unique artists for their time compare to the most popular artists? Is there considerable overlap? How long does it take for a style to grow in popularity, if it ever does? And lastly, how can these findings be used to compose new genres of music and to envision who and what will become popular in the future? All of these questions may require supplementary information sources, regarding the popularity of songs and artists for example, and many of these additional pieces of information can be found on the website of the MSD; a small sketch of such a popularity comparison follows.
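For instance, the MSD already exposes a per-track popularity estimate through hdf5_getters.get_song_hotttnesss, so a first pass at the popularity-versus-novelty question could simply average that score per cluster. The sketch below assumes the Dirichlet Process cluster labels have been stored in a dictionary keyed by track ID.

import numpy as np
import hdf5_getters

def cluster_popularity(track_files, cluster_labels):
    # average MSD 'song hotttnesss' per Dirichlet Process cluster;
    # cluster_labels maps track ID -> cluster index (assumed precomputed)
    scores = {}
    for path in track_files:
        h5 = hdf5_getters.open_h5_file_read(path)
        tid = hdf5_getters.get_track_id(h5).item()
        hot = hdf5_getters.get_song_hotttnesss(h5)
        h5.close()
        if tid in cluster_labels and not np.isnan(hot):
            scores.setdefault(cluster_labels[tid], []).append(hot)
    return dict((c, np.mean(v)) for c, v in scores.items())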

4.3 Closing Remarks

While this thesis is an ambitious endeavor and can be improved in many respects, it does show that the methods implemented yield nontrivial results and could serve as a foundation for future quantitative analysis of electronic music. As data analytics continues to grow, and groups such as Spotify amass greater amounts of information and deeper insights into it, this relatively new field of study will hopefully grow as well. EM is a dynamic, energizing, and incredibly expressive type of music, and understanding it from a quantitative perspective pays respect to what has, up until now, mostly been analyzed from a curious outsider's perspective: qualitatively described, but not examined as thoroughly from a mathematical angle.


Appendix A

Code

A.1 Pulling Data from the Million Song Dataset

from __future__ import division
import os
import sys
import re
import time
import glob
import numpy as np
import hdf5_getters  # not on adroit

# prevents output from showing ellipses when printed
np.set_printoptions(threshold=np.nan)

'''This code collects the metadata (title, artist, year, duration, pitch
and timbre segments) of every electronic music song found in the MSD, by
matching artist tags against a list of EM genres, and writes it out as
one dictionary sorted by year.'''

basedir = '/scratch/network/mssilver/mssilver/msd_data_full/data/' + str(sys.argv[1])
ext = '.h5'

target_genres = ['house', 'techno', 'drum and bass', 'drum n bass', "drum'n'bass",
                 'drumnbass', "drum 'n' bass", 'jungle', 'breakbeat', 'trance',
                 'dubstep', 'trap', 'downtempo', 'industrial', 'synthpop', 'idm',
                 'idm - intelligent dance music', '8-bit', 'ambient',
                 'dance and electronica', 'electronic']

# relevant metadata for all EM songs found in the MSD
all_song_data = {}
pitch_segs_data = []
count = 0
start_time = time.time()

for root, dirs, files in os.walk(basedir):
    files = glob.glob(os.path.join(root, '*' + ext))
    for f in files:
        h5 = hdf5_getters.open_h5_file_read(f)
        # if year unknown, throw out sample
        if hdf5_getters.get_year(h5) == 0:
            h5.close()
            continue
        if any(tag in str(hdf5_getters.get_artist_mbtags(h5)) for tag in target_genres):
            print 'found electronic music song at {0} seconds'.format(time.time() - start_time)
            count += 1
            print 'song count: {0}'.format(count)
            h5_subdict = dict()
            h5_subdict['title'] = hdf5_getters.get_title(h5).item()
            h5_subdict['artist_name'] = hdf5_getters.get_artist_name(h5).item()
            h5_subdict['year'] = hdf5_getters.get_year(h5).item()
            h5_subdict['duration'] = hdf5_getters.get_duration(h5).item()
            h5_subdict['timbre'] = hdf5_getters.get_segments_timbre(h5).tolist()
            h5_subdict['pitches'] = hdf5_getters.get_segments_pitches(h5).tolist()
            track_id = hdf5_getters.get_track_id(h5).item()
            all_song_data[track_id] = h5_subdict
            print 'Song {0} finished processing. Total time elapsed: {1} seconds'.format(count, str(time.time() - start_time))
        h5.close()

all_song_data_sorted = dict(sorted(all_song_data.items(), key=lambda k: k[1]['year']))
sortedpitchdata = '/scratch/network/mssilver/mssilver/msd_data/raw_' + re.sub('/', '', sys.argv[1]) + '.txt'
with open(sortedpitchdata, 'w') as text_file:
    text_file.write(str(all_song_data_sorted))

A.2 Calculating Most Likely Chords and Timbre Categories

from __future__ import division
import os
import sys
import re
import time
import json
import glob
import hdf5_getters  # not on adroit
import sklearn.mixture
import msd_utils  # not on adroit
import math
import numpy as np
import collections
import ast

# prevents output from showing ellipses when printed
np.set_printoptions(threshold=np.nan)

# column-wise mean of a list of lists (used with zip(*segments) below)
def mean(a):
    return sum(a) / len(a)

'''This code computes the frequency of chord changes and timbre categories
in each electronic song.'''

basedir = '/scratch/network/mssilver/mssilver/'
input_file = basedir + 'msd_data/raw_' + str(sys.argv[1]) + '.txt'
output_file = basedir + 'msd_data/preprocessed_' + str(sys.argv[1]) + '.txt'

json_contents = open(input_file, 'r').read()

all_song_data = []
time_start = time.time()
count = 0
# matches one song's metadata dict in the printed text (pattern reconstructed)
for json_object_str in re.finditer(r"{'title'.*?}", json_contents):
    json_object_str = str(json_object_str.group(0))
    json_object = ast.literal_eval(json_object_str)
    json_object_new = {}

    json_object_new['title'] = json_object['title']
    json_object_new['artist_name'] = json_object['artist_name']
    json_object_new['year'] = json_object['year']
    json_object_new['duration'] = json_object['duration']

    segments_pitches_old = json_object['pitches']
    segments_timbre_old = json_object['timbre']
    segments_pitches_old_smoothed = []
    segments_timbre_old_smoothed = []
    chord_changes = [0 for i in range(0, 192)]
    smoothing_factor = 5
    for i in range(0, int(math.floor(len(segments_pitches_old) / smoothing_factor))):
        segments = segments_pitches_old[(smoothing_factor*i):(smoothing_factor*i + smoothing_factor)]
        # calculate mean frequency of each note over a block of 5 time segments
        segments_mean = map(mean, zip(*segments))
        segments_pitches_old_smoothed.append(segments_mean)
    most_likely_chords = [msd_utils.find_most_likely_chord(seg) for seg in segments_pitches_old_smoothed]
    print 'found most likely chords at {0} seconds'.format(time.time() - time_start)
    # calculate chord changes
    for i in range(0, len(most_likely_chords) - 1):
        c1 = most_likely_chords[i]
        c2 = most_likely_chords[i+1]
        if c1[1] == c2[1]:
            note_shift = 0
        elif c1[1] < c2[1]:
            note_shift = c2[1] - c1[1]
        else:
            note_shift = 12 - c1[1] + c2[1]
        key_shift = 4*(c1[0] - 1) + c2[0]
        # convert note_shift (0 through 11) and key_shift (1 to 16)
        # to one of 192 categories for a chord shift
        chord_shift = 12*(key_shift - 1) + note_shift
        chord_changes[chord_shift] += 1
    json_object_new['chord_changes'] = [c / json_object['duration'] for c in chord_changes]
    print 'calculated chord changes at {0} seconds'.format(time.time() - time_start)

    for i in range(0, int(math.floor(len(segments_timbre_old) / smoothing_factor))):
        segments = segments_timbre_old[(smoothing_factor*i):(smoothing_factor*i + smoothing_factor)]
        # calculate mean timbre value over a block of 5 time segments
        segments_mean = map(mean, zip(*segments))
        segments_timbre_old_smoothed.append(segments_mean)
    print 'found most likely timbre categories at {0} seconds'.format(time.time() - time_start)
    timbre_cats = [msd_utils.find_most_likely_timbre_category(seg) for seg in segments_timbre_old_smoothed]
    # count how often each timbre category appears (note: the thesis settles
    # on 46 clusters, so this bound should match len(msd_utils.TIMBRE_CLUSTERS))
    timbre_cat_counts = [timbre_cats.count(i) for i in xrange(0, 30)]
    json_object_new['timbre_cat_counts'] = [t / json_object['duration'] for t in timbre_cat_counts]
    all_song_data.append(json_object_new)
    count += 1

print 'preprocessing finished, writing results to file at time {0}'.format(time.time() - time_start)
with open(output_file, 'w') as f:
    f.write(str(all_song_data))

print 'file merging complete at time {0}'.format(time.time() - time_start)

A.3 Code to Compute Timbre Categories

from __future__ import division
import os
import sys
import re
import time
import json
import glob
import hdf5_getters  # not on adroit
import sklearn.mixture
import msd_utils  # not on adroit
import math
import numpy as np
import collections
from string import ascii_uppercase
import ast
import matplotlib.pyplot as plt
import operator
from collections import defaultdict
import random

timbre_all = []
N = 20  # number of samples to get from each year
# number of EM songs in the dataset from each year
year_counts = {1956: 2, 1965: 4, 1968: 3, 1969: 5, 1970: 23, 1971: 25,
               1972: 26, 1973: 37, 1974: 35, 1975: 29, 1976: 28, 1977: 64,
               1978: 77, 1979: 111, 1980: 131, 1981: 171, 1982: 199, 1983: 272,
               1984: 190, 1985: 189, 1986: 200, 1987: 224, 1988: 205, 1989: 272,
               1990: 358, 1991: 348, 1992: 538, 1993: 610, 1994: 658, 1995: 764,
               1996: 809, 1997: 930, 1998: 872, 1999: 983, 2000: 1031, 2001: 1230,
               2002: 1323, 2003: 1563, 2004: 1508, 2005: 1995, 2006: 1892,
               2007: 2175, 2008: 1950, 2009: 1782, 2010: 742}

time_start = time.time()
year_count = defaultdict(int)
orig_dir = '/scratch/network/mssilver/mssilver/'
orig_dir = ''
json_pattern = re.compile(r"{'title'.*?}", re.DOTALL)  # pattern reconstructed
N = 20  # number of songs to sample from each year
k = 20  # number of frames to select from each song
for l1 in ascii_uppercase:
    for l2 in ascii_uppercase:
        edm_textfile = orig_dir + 'msd_data/raw_' + l1 + l2 + '.txt'
        json_contents = open(edm_textfile, 'r').read()
        for json_object_str in re.findall(json_pattern, json_contents):
            json_object = ast.literal_eval(json_object_str)
            year = int(json_object['year'])
            prob = 1.0 if 1.0*N/year_counts[year] > 1.0 else 1.0*N/year_counts[year]
            if random.random() < prob:
                print 'getting timbre frames for song in directory {0}, {1} seconds after start of program'.format(edm_textfile, time.time() - time_start)
                duration = float(json_object['duration'])
                timbre = [[t/duration for t in l] for l in json_object['timbre']]
                try:
                    indices = random.sample(xrange(0, len(timbre)), k)
                except:
                    # song has fewer than k frames; take them all
                    indices = xrange(0, len(timbre))
                timbre_frames = [timbre[i] for i in indices]
                appended_timbre = [timbre_all.append(l) for l in timbre_frames]
        print 'finished file {0}, {1} seconds after start of program'.format(edm_textfile, time.time() - time_start)

with open('timbre_frames_all.txt', 'w') as f:
    f.write(str(timbre_all))

A.4 Helper Methods for Calculations

import os
import re
import json
import glob
import hdf5_getters
import time
import numpy as np

''' some static data used in conjunction with the helper methods '''

# each 12-element vector corresponds to the 12 pitches, starting with
# C natural and going up to B natural

CHORD_TEMPLATE_MAJOR = [[1,0,0,0,1,0,0,1,0,0,0,0], [0,1,0,0,0,1,0,0,1,0,0,0],
                        [0,0,1,0,0,0,1,0,0,1,0,0], [0,0,0,1,0,0,0,1,0,0,1,0],
                        [0,0,0,0,1,0,0,0,1,0,0,1], [1,0,0,0,0,1,0,0,0,1,0,0],
                        [0,1,0,0,0,0,1,0,0,0,1,0], [0,0,1,0,0,0,0,1,0,0,0,1],
                        [1,0,0,1,0,0,0,0,1,0,0,0], [0,1,0,0,1,0,0,0,0,1,0,0],
                        [0,0,1,0,0,1,0,0,0,0,1,0], [0,0,0,1,0,0,1,0,0,0,0,1]]
CHORD_TEMPLATE_MINOR = [[1,0,0,1,0,0,0,1,0,0,0,0], [0,1,0,0,1,0,0,0,1,0,0,0],
                        [0,0,1,0,0,1,0,0,0,1,0,0], [0,0,0,1,0,0,1,0,0,0,1,0],
                        [0,0,0,0,1,0,0,1,0,0,0,1], [1,0,0,0,0,1,0,0,1,0,0,0],
                        [0,1,0,0,0,0,1,0,0,1,0,0], [0,0,1,0,0,0,0,1,0,0,1,0],
                        [0,0,0,1,0,0,0,0,1,0,0,1], [1,0,0,0,1,0,0,0,0,1,0,0],
                        [0,1,0,0,0,1,0,0,0,0,1,0], [0,0,1,0,0,0,1,0,0,0,0,1]]
CHORD_TEMPLATE_DOM7 = [[1,0,0,0,1,0,0,1,0,0,1,0], [0,1,0,0,0,1,0,0,1,0,0,1],
                       [1,0,1,0,0,0,1,0,0,1,0,0], [0,1,0,1,0,0,0,1,0,0,1,0],
                       [0,0,1,0,1,0,0,0,1,0,0,1], [1,0,0,1,0,1,0,0,0,1,0,0],
                       [0,1,0,0,1,0,1,0,0,0,1,0], [0,0,1,0,0,1,0,1,0,0,0,1],
                       [1,0,0,1,0,0,1,0,1,0,0,0], [0,1,0,0,1,0,0,1,0,1,0,0],
                       [0,0,1,0,0,1,0,0,1,0,1,0], [0,0,0,1,0,0,1,0,0,1,0,1]]
CHORD_TEMPLATE_MIN7 = [[1,0,0,1,0,0,0,1,0,0,1,0], [0,1,0,0,1,0,0,0,1,0,0,1],
                       [1,0,1,0,0,1,0,0,0,1,0,0], [0,1,0,1,0,0,1,0,0,0,1,0],
                       [0,0,1,0,1,0,0,1,0,0,0,1], [1,0,0,1,0,1,0,0,1,0,0,0],
                       [0,1,0,0,1,0,1,0,0,1,0,0], [0,0,1,0,0,1,0,1,0,0,1,0],
                       [0,0,0,1,0,0,1,0,1,0,0,1], [1,0,0,0,1,0,0,1,0,1,0,0],
                       [0,1,0,0,0,1,0,0,1,0,1,0], [0,0,1,0,0,0,1,0,0,1,0,1]]

CHORD_TEMPLATE_MAJOR_means = [np.mean(chord) for chord in CHORD_TEMPLATE_MAJOR]
CHORD_TEMPLATE_MINOR_means = [np.mean(chord) for chord in CHORD_TEMPLATE_MINOR]
CHORD_TEMPLATE_DOM7_means = [np.mean(chord) for chord in CHORD_TEMPLATE_DOM7]
CHORD_TEMPLATE_MIN7_means = [np.mean(chord) for chord in CHORD_TEMPLATE_MIN7]

CHORD_TEMPLATE_MAJOR_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_MAJOR]
CHORD_TEMPLATE_MINOR_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_MINOR]
CHORD_TEMPLATE_DOM7_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_DOM7]
CHORD_TEMPLATE_MIN7_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_MIN7]

# mean timbre vectors of the 46 clusters found by the GMM in Appendix A.3
TIMBRE_CLUSTERS = [
    [1.38679881e-01, 3.95702571e-02, 2.65410235e-02, 7.38301998e-03,
     -1.75014636e-02, -5.51147732e-02, 8.71851698e-03, -1.17595855e-02,
     1.07227900e-02, 8.75951680e-03, 5.40391877e-03, 6.17638908e-03],
    [3.14344510e+00, 1.17405599e-01, 4.08053561e+00, -1.77934450e+00,
     2.93367968e+00, -1.35597928e+00, -1.55129489e+00, 7.75743158e-01,
     6.42796685e-01, 1.40794256e-01, 3.37716831e-01, -3.27103815e-01],
    [3.56548165e-01, 2.73288705e+00, 1.94355982e+00, 1.06892477e+00,
     9.89739475e-01, -8.97330631e-02, 8.73234495e-01, -2.00747009e-03,
     3.44488367e-01, 9.93117800e-02, -2.43471766e-01, -1.90521726e-01],
    [4.22442037e-01, 4.14115783e-01, 1.43926557e-01, -1.16143322e-01,
     -5.95186216e-02, -2.36927188e-01, -6.83151409e-02, 9.86816882e-02,
     2.43219098e-02, 6.93558977e-02, 6.80121418e-03, 3.97485360e-02],
    [1.94727799e-01, -1.39027782e+00, -2.39875671e-01, -2.84583677e-01,
     1.92334219e-01, -2.83421048e-01, 2.15787541e-01, 1.14840341e-01,
     -2.15631833e-01, -4.09496877e-02, -6.90838017e-03, -7.24394810e-03],
    [1.96565167e-01, 4.98702717e-02, -3.43697282e-01, 2.54170701e-01,
     1.12441266e-02, 1.54740401e-01, -4.70447408e-02, 8.10868802e-02,
     3.03736697e-03, 1.43974944e-03, -2.75044913e-02, 1.48634678e-02],
    [2.21364497e-01, -2.96205105e-01, 1.57754028e-01, -5.57641279e-02,
     -9.25625566e-02, -6.15316168e-02, -1.38139882e-01, -5.54936599e-02,
     1.66886836e-01, 6.46238260e-02, 1.24093863e-02, -2.09274345e-02],
    [2.12823455e-01, -9.32652720e-02, -4.39611467e-01, -2.02814479e-01,
     4.98638770e-02, -1.26572488e-01, -1.11181799e-01, 3.25075635e-02,
     2.01416694e-02, -5.69216463e-02, 2.61922912e-02, 8.30817468e-02],
    [1.62304042e-01, -7.34813956e-03, -2.02552550e-01, 1.80106705e-01,
     -5.72110826e-02, -9.17148244e-02, -6.20429191e-03, -6.08892354e-02,
     1.02883628e-02, 3.84878478e-02, -8.72920419e-03, 2.37291230e-02],
    [1.69023095e-01, 6.81311168e-02, -3.71039856e-02, -2.13139780e-02,
     -4.18752028e-03, 1.36407740e-01, 2.58515825e-02, -4.10328777e-04,
     2.93149920e-02, -1.97874734e-02, 2.01177066e-02, 4.29260690e-03],
    [4.16829358e-01, -1.28384095e+00, 8.86081556e-01, 9.13717416e-02,
     -3.19420208e-01, -1.82003637e-01, -3.19865507e-02, -1.71517045e-02,
     3.47472066e-02, -3.53047665e-02, 5.58354602e-02, -5.06222122e-02],
    [3.83948137e-01, 1.06020034e-01, 4.01191058e-01, 1.49470482e-01,
     -9.58422411e-02, -4.94473336e-02, 2.27589858e-02, -5.67352733e-02,
     3.84666644e-02, -2.15828055e-02, -1.67817151e-02, 1.15426241e-01],
    [9.07946444e-01, 3.26120397e+00, 2.98472002e+00, -1.42615404e-01,
     1.29886103e+00, -4.53380431e-01, 1.54008478e-01, -3.55297093e-02,
     -2.95809181e-01, 1.57037690e-01, -7.29692046e-02, 1.15180285e-01],
    [1.60870896e+00, -2.32038235e+00, -7.96211044e-01, 1.55058968e+00,
     -2.19377663e+00, 5.01030526e-01, -1.71767279e+00, -1.36642470e+00,
     -2.42837527e-01, -4.14275615e-01, -7.33148530e-01, -4.56676578e-01],
    [6.42870687e-01, 1.34486839e+00, 2.16026845e-01, -2.13180345e-01,
     3.10866747e-01, -3.97754955e-01, -3.54439151e-01, -5.95938041e-04,
     4.95054274e-03, 4.67013422e-02, -1.80823854e-02, 1.25808320e-01],
    [1.16780496e+00, 2.28141229e+00, -3.29418720e+00, -1.54239912e+00,
     2.12372153e-01, 2.51116768e+00, 1.84273560e+00, -4.06183916e-01,
     1.19175125e+00, -9.24407446e-01, 6.85444429e-01, -6.38729005e-01],
    [2.39097414e-01, -1.13382447e-02, 3.06327342e-01, 4.68182987e-03,
     -1.03107607e-01, -3.17661969e-02, 3.46533705e-02, 1.46440386e-02,
     6.88291154e-02, 1.72580481e-02, -6.23970238e-03, -6.52822380e-03],
    [1.74850329e-01, -1.86077411e-01, 2.69285838e-01, 5.22452803e-02,
     -3.71708289e-02, -6.42874319e-02, -5.01920042e-03, -1.14565540e-02,
     -2.61300268e-03, -6.94872458e-03, 1.20157063e-02, 2.01341977e-02],
    [1.93220674e-01, 1.62738332e-01, 1.72794061e-02, 7.89933755e-02,
     1.58494767e-01, 9.04541006e-04, -3.33177052e-02, -1.42411500e-01,
     -1.90471155e-02, -2.41622739e-02, -2.57382438e-02, 2.84895062e-02],
    [3.31179197e+00, -1.56765268e-01, 4.42446188e+00, 2.05496297e+00,
     5.07031622e+00, -3.52663849e-02, -5.68337901e+00, -1.17825301e+00,
     5.41756637e-01, -3.15541339e-02, -1.58404846e+00, 7.37887234e-01],
    [2.36033237e-01, -5.01380019e-01, -7.01568834e-02, -2.14474169e-01,
     5.58739133e-01, -3.45340886e-01, 2.36469930e-01, -2.51770230e-02,
     -4.41670143e-01, -1.73364633e-01, 9.92353986e-03, 1.01775476e-01],
    [3.13672832e+00, 1.55128891e+00, 4.60139512e+00, 9.82477544e-01,
     -3.87108002e-01, -1.34239667e+00, -3.00065797e+00, -4.41556909e-01,
     -7.77546208e-01, -6.59017029e-01, -1.42596356e-01, -9.78935498e-01],
    [8.50714148e-01, 2.28658856e-01, -3.65260753e+00, 2.70626948e+00,
     -1.90441544e-01, 5.66625676e+00, 1.77531510e+00, 2.39978921e+00,
     1.10965660e+00, 1.58484130e+00, -1.51579214e-02, 8.64324026e-01],
    [1.14302559e+00, 1.18602811e+00, -3.88130412e+00, 8.69833825e-01,
     -8.23003310e-01, -4.23867795e-01, 8.56022598e-01, -1.08015106e+00,
     1.74840192e-01, -1.35493558e-02, -1.17012561e+00, 1.68572940e-01],
    [3.54117814e+00, 6.12714769e-01, 7.67585243e+00, 2.50391333e+00,
     1.81374399e+00, -1.46363231e+00, -1.74027236e+00, -5.72924078e-01,
     -1.20787368e+00, -4.13954661e-01, -4.62561948e-01, 6.78297871e-01],
    [8.31843044e-01, 4.41635485e-01, 7.00724425e-02, -4.72159900e-02,
     3.08326493e-01, -4.47009822e-01, 3.27806057e-01, 6.52370380e-01,
     3.28490360e-01, 1.28628172e-01, -7.78065861e-02, 6.91343399e-02],
    [4.90082031e-01, -9.53180204e-01, 1.76970476e-01, 1.57256960e-01,
     -5.26196238e-02, -3.19264458e-01, 3.91808304e-01, 2.19368239e-01,
     -2.06483291e-01, -6.25044005e-02, -1.05547224e-01, 3.18934196e-01],
    [1.49899454e+00, -4.30708817e-01, 2.43770498e+00, 7.03149621e-01,
     -2.28827845e+00, 2.70195855e+00, -4.71484280e+00, -1.18700075e+00,
     -1.77431396e+00, -2.23190236e+00, 8.20855264e-01, -2.35859902e-01],
    [1.20322544e-01, -3.66300816e-01, -1.25699953e-01, -1.21914056e-01,
     6.93277338e-02, -1.31034684e-01, -1.54955924e-03, 2.48094288e-02,
     -3.09576314e-02, -1.66369415e-03, 1.48904987e-04, -1.42151992e-02],
    [6.52394765e-01, -6.81024464e-01, 6.36868117e-01, 3.04950208e-01,
     2.62178992e-01, -3.20457080e-01, -1.98576098e-01, -3.02173163e-01,
     2.04399765e-01, 4.44513847e-02, -9.50111498e-03, -1.14198739e-02],
    [2.06762180e-01, -2.08101829e-01, 2.61977630e-01, -1.71672300e-01,
     5.61794250e-02, 2.13660185e-01, 3.90259585e-02, 4.78176392e-02,
     1.72812607e-02, 3.44052067e-02, 6.26899067e-03, 2.48544728e-02],
    [7.39717363e-01, 4.37786285e+00, 2.54995502e+00, 1.13151212e+00,
     -3.58509503e-01, 2.20806129e-01, -2.20500355e-01, -7.22409824e-03,
     -2.70534083e-01, 1.07942098e-03, 2.70174668e-01, 1.87279353e-01],
    [1.25593809e+00, 6.71054880e-02, 8.70352571e-01, -4.32607959e+00,
     2.30652217e+00, 5.47476105e+00, -6.11052479e-01, 1.07955720e+00,
     -2.16225471e+00, -7.95770149e-01, -7.31804973e-01, 9.68935954e-01],
    [1.17233757e-01, -1.23897829e-01, -4.88625265e-01, 1.42036530e-01,
     -7.23286756e-03, -6.99808763e-02, -1.17525019e-02, 5.70221674e-03,
     -7.67796123e-03, 4.17505873e-02, -2.33375716e-02, 1.94121001e-02],
    [1.67511025e+00, -2.75436700e+00, 1.45345593e+00, 1.32408871e+00,
     -1.66172505e+00, 1.00560074e+00, -8.82308160e-01, -5.95708043e-01,
     -7.27283590e-01, -1.03975499e+00, -1.86653334e-02, 1.39449745e+00],
    [3.20587677e+00, -2.84451104e+00, 8.54849957e+00, -4.44001235e-01,
     1.04202144e+00, 7.35333682e-01, -2.48763292e+00, 7.38931361e-01,
     -1.74185596e+00, -1.07581842e+00, 2.05759299e-01, -8.20483513e-01],
    [3.31279737e+00, -5.08655734e-01, 6.61530870e+00, 1.16518280e+00,
     4.74499155e+00, -2.31536191e+00, -1.34016130e+00, -7.15381712e-01,
     2.78890594e+00, 2.04189275e+00, -3.80003033e-01, 1.16034914e+00],
    [1.79522019e+00, -8.13534697e-03, 4.37167420e-01, 2.26517020e+00,
     8.85377295e-01, 1.07481514e+00, -7.25322296e-01, -2.19309506e+00,
     -7.59468916e-01, -1.37191387e+00, 2.60097913e-01, 9.34596450e-01],
    [3.50400906e-01, 8.17891485e-01, -8.63487084e-01, -7.31760701e-01,
     9.70320805e-03, -3.60023996e-01, -2.91753495e-01, -8.03073817e-03,
     6.65930095e-03, 1.60093340e-01, -1.29158086e-01, -5.18806100e-03],
    [2.25922929e-01, 2.78461593e-01, 5.39661393e-03, -2.37662670e-03,
     -2.70343295e-03, -1.23485570e-01, 2.31027499e-03, 5.87465112e-05,
     1.86127188e-02, 2.83074747e-02, -1.87198676e-04, 1.24761782e-02],
    [4.53615634e-01, 3.18976020e+00, -8.35029351e-01, 7.84124578e+00,
     -4.43906795e-01, -1.78945492e+00, -1.14521031e+00, 1.00044304e+00,
     -4.04084981e-01, -4.86030348e-01, 1.05412721e-01, 5.63666445e-02],
    [3.93714086e-01, -3.07226477e-01, -4.87366619e-01, -4.57481697e-01,
     -2.91133171e-04, -2.39881719e-01, -2.15591352e-01, -1.21332941e-01,
     1.42245002e-01, 5.02984582e-02, -8.05878851e-03, 1.95534173e-01],
    [1.86913010e-01, -1.61000977e-01, 5.95612425e-01, 1.87804293e-01,
     2.22064227e-01, -1.09008289e-01, 7.83845058e-02, 5.15228647e-02,
     -8.18113578e-02, -2.37860551e-02, 3.41013800e-03, 3.64680417e-02],
    [3.32919314e+00, -2.14341251e+00, 7.20913997e+00, 1.76143734e+00,
     1.64091808e+00, -2.66887649e+00, -9.26748006e-01, -2.78599285e-01,
     -7.39434005e-01, -3.87363085e-01, 8.00557250e-01, 1.15628886e+00],
    [4.76496444e-01, -1.19334793e-01, 3.09037235e-01, -3.45545294e-01,
     1.30114716e-01, 5.06895559e-01, 2.12176840e-01, -4.14296750e-03,
     4.52439064e-02, -1.62163990e-02, 6.93683152e-03, -5.77607592e-03],
    [3.00019324e-01, 5.43432074e-03, -7.72732930e-01, 1.47263806e+00,
     -2.79012581e-02, -2.47864869e-01, -2.10011388e-01, 2.78202425e-01,
     6.16957205e-02, -1.66924986e-01, -1.80102286e-01, -3.78872162e-03]]

TIMBRE_MEANS = [np.mean(t) for t in TIMBRE_CLUSTERS]
TIMBRE_STDEVS = [np.std(t) for t in TIMBRE_CLUSTERS]

'''helper methods to process raw msd data'''

def normalize_pitches(h5):
    key = int(hdf5_getters.get_key(h5))
    segments_pitches = hdf5_getters.get_segments_pitches(h5)
    segments_pitches_new = [transpose_by_key(pitch_seg, key) for pitch_seg in segments_pitches]
    return segments_pitches_new

def transpose_by_key(pitch_seg, key):
    # rotate the 12-element pitch vector so every song is expressed
    # relative to the same key
    pitch_seg_new = []
    for i in range(0, 12):
        idx = (i + key) % 12
        pitch_seg_new.append(pitch_seg[idx])
    return pitch_seg_new

''' given a time segment with distributions of the 12 pitches, find the
most likely chord played '''
def find_most_likely_chord(pitch_vector):
    rho_max = 0.0
    # chords are indexed as (family, root): family 1 = major, 2 = minor,
    # 3 = dominant 7th, 4 = minor 7th
    most_likely_chord = (1, 1)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_MAJOR,
            CHORD_TEMPLATE_MAJOR_means, CHORD_TEMPLATE_MAJOR_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean)*(pitch_vector[i] - np.mean(pitch_vector)) / \
                   ((stdev + 0.01)*(np.std(pitch_vector) + 0.01))
        if abs(rho) > abs(rho_max):
            rho_max = rho
            most_likely_chord = (1, idx)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_MINOR,
            CHORD_TEMPLATE_MINOR_means, CHORD_TEMPLATE_MINOR_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean)*(pitch_vector[i] - np.mean(pitch_vector)) / \
                   ((stdev + 0.01)*(np.std(pitch_vector) + 0.01))
        if abs(rho) > abs(rho_max):
            rho_max = rho
            most_likely_chord = (2, idx)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_DOM7,
            CHORD_TEMPLATE_DOM7_means, CHORD_TEMPLATE_DOM7_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean)*(pitch_vector[i] - np.mean(pitch_vector)) / \
                   ((stdev + 0.01)*(np.std(pitch_vector) + 0.01))
        if abs(rho) > abs(rho_max):
            rho_max = rho
            most_likely_chord = (3, idx)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_MIN7,
            CHORD_TEMPLATE_MIN7_means, CHORD_TEMPLATE_MIN7_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean)*(pitch_vector[i] - np.mean(pitch_vector)) / \
                   ((stdev + 0.01)*(np.std(pitch_vector) + 0.01))
        if abs(rho) > abs(rho_max):
            rho_max = rho
            most_likely_chord = (4, idx)
    return most_likely_chord

def find_most_likely_timbre_category(timbre_vector):
    most_likely_timbre_cat = 0
    rho_max = 0.0
    for idx, (seg, mean, stdev) in enumerate(zip(TIMBRE_CLUSTERS, TIMBRE_MEANS,
            TIMBRE_STDEVS)):
        rho = 0.0
        for i in range(0, 12):
            # note: centers the input by np.mean(seg) here, whereas the chord
            # version centers the input vector by its own mean
            rho += (seg[i] - mean)*(timbre_vector[i] - np.mean(seg)) / \
                   ((stdev + 0.01)*(np.std(timbre_vector) + 0.01))
        if abs(rho) > abs(rho_max):
            rho_max = rho
            most_likely_timbre_cat = idx
    return most_likely_timbre_cat
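As a quick sanity check of these helpers (an illustration, not part of the thesis pipeline), a pitch vector concentrated entirely on C, E, and G should match the first major template:

# C, E, and G active: should correlate best with CHORD_TEMPLATE_MAJOR[0]
print(find_most_likely_chord([1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0]))
# expected output: (1, 0), i.e., chord family 1 (major) at root index 0 (C)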


Bibliography

[1] Marck Bailey. Mark J. Butler publishes scholarly work on dance music. http://www.music.northwestern.edu/about/news/2012/mark-j-butler-publishes-scholarly-work-on-dance-music.html, March 2012.

[2] Kenneth Taylor. Ishkur's guide to EDM. http://techno.org/electronic-music-guide/.

[3] Deal further strengthens Spotify's music discovery expertise. http://the.echonest.com/pressreleases/spotify-acquires-echo-nest/, March 2014.

[4] Josh Constine. Inside the Spotify-Echo Nest skunkworks. http://techcrunch.com/2014/10/19/the-sonic-mad-scientists/, October 2014.

[5] The future of music genres is here. http://blog.echonest.com/post/73516217273/the-future-of-music-genres-is-here, January 2014.

[6] About the Music Genome Project. http://www.pandora.com/about/mgp.

[7] Joan Serrà, Álvaro Corral, Marián Boguñá, Martín Haro, and Josep Ll. Arcos. Measuring the evolution of contemporary western popular music. Scientific Reports, 2, July 2012.

[8] Matthias Mauch, Robert M. MacCallum, Mark Levy, and Armand M. Leroi. The evolution of popular music: USA 1960–2010. Royal Society Open Science, 2(5), 2015.

[9] Thierry Bertin-Mahieux, Daniel P. W. Ellis, Brian Whitman, and Paul Lamere. The Million Song Dataset. In Proceedings of the 12th International Conference on Music Information Retrieval (ISMIR 2011), 2011.

[10] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay. Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12:2825–2830, 2011.

[11] Lars Buitinck, Gilles Louppe, Mathieu Blondel, Fabian Pedregosa, Andreas Mueller, Olivier Grisel, Vlad Niculae, Peter Prettenhofer, Alexandre Gramfort, Jaques Grobler, Robert Layton, Jake VanderPlas, Arnaud Joly, Brian Holt, and Gaël Varoquaux. API design for machine learning software: experiences from the scikit-learn project. In ECML PKDD Workshop: Languages for Data Mining and Machine Learning, pages 108–122, 2013.

[12] Edwin Chen. Infinite mixture models with nonparametric Bayes and the Dirichlet Process. http://blog.echen.me/2012/03/20/infinite-mixture-models-with-nonparametric-bayes-and-the-dirichlet-process/, March 2012.

[13] Graham Massey. Roland TR-808: The drum machine that changed music. http://www.bbc.com/news/entertainment-arts-26682781, March 2014.

[14] Jean-Michel Jarre and Agencja Artystyczna MTJ. Jean Michel Jarre. Disques Dreyfus, 1999.

[15] Francois Pachet, Jean-Julien Aucouturier, and Mark Sandler. "The way it sounds": Timbre models for analysis and retrieval of music signals. IEEE Transactions on Multimedia, 7(6):1028–1035, December 2005.

Page 32: Silver,Matthew final thesis

Note13 TF113 TF213 TF313 TF413 TF513 C13 1013 052513 081913 030313 038813

CD13 06113 1013 1013 064813 1013 D13 031913 059913 049313 020213 018513

DE13 022113 022913 02413 032213 024113 E13 028913 029813 026813 045213 032913 F13 040413 029813 029713 061313 043913

FG13 046513 039813 073313 1013 058913 G13 025413 036313 058813 063213 033713

GA13 012313 034313 067113 055513 02813 A13 031613 030813 043113 065913 02213

AB13 05213 02613 060313 085513 065413 B13 095113 026313 028613 027513 022513

13 13 13 13

060713 085213 036013 025113 032713 041013 063713 043513 039413 038713 057813 040013

13 13 13 13 13 13 13 13 13 13 13 13

13 13 13 13 F13 major13 (010000100010)13 13 13 13 13 13 13 13 13 13 13 13 13 13 13 13 13 13 13 13 13 13 13 13 13 13 13 13 13 13 13 13 13 13 13 13 13 13 13 13 13 13 13 13 13

Average13 distribution13 of13 pitches13 over13 every13 513 time13 frames13

Calculate13 most13 likely13 chord13 using13 Spearmanrsquos13 rho13

Calculate13 most13 likely13 chord13 over13 every13 other13 block13 of13 513 time13 frames13

Start13 with13 raw13 pitch13 data13 an13 Nx1213 vector13 where13 N13 is13 the13 number13 of13 time13 frames13 in13 the13 song13 and13 1213 the13 number13 of13 pitch13 classes13 Shown13 here13 are13 the13 first13 513 time13 frames13 of13 ldquoFirestarterrdquo13 by13 The13 Prodigy13 13

23

13 13 F13 13 major13 G13 major13 13 13 13 Chord13 shift13 code13 =13 613 chord_changes[6]13 +=13 113 13 13 13 13 13 13 13 13 13 13 13 13

13

13 13 13 chord_changes13 =13 [1413 013 313 013 113 013 113 013 013 013 113 013 1113 013 213 013 013 013 213 013 213 113 113 013 113 013 013 013 313 113 013 113 213 013 113 313 113 113 013 013 113 013 013 213 013 013 013 013 1213 113 413 113 213 013 013 013 013 113 113 113 1413 013 613 013 213 013 013 113 013 013 613 013 013 213 013 013 013 013 313 213 013 113 213 113 113 113 013 013 013 013 013 013 013 113 113 013 013 013 013 013 113 213 013 113 013 113 013 113 013 213 113 113 113 113 013 113 013 213 113 113 013 213 113 113 013 013 013 113 013 113 013 513 313 013 013 213 013 013 013 113 013 113 013 113 413 013 013 013 013 013 213 013 013 013 013 013 213 013 213 013 113 013 013 113 013 113 113 013 013 213 013 013 013 013 013 013 113 013 013 013 113 013 013 013 013 013 013 013 013 013 113 0]13 13 Final13 192-shy‐element13 vector13 where13 chord_changes[i]13 is13 the13 number13 of13 times13 the13 chord13 change13 with13 code13 i13 existed13 in13 the13 song13 13 13

Major13 to13 Major13 step13 size13 =13 213 For13 two13 adjacent13 chords13 calculate13 the13 change13 in13 between13 them13 and13 increment13 count13 in13 table13 of13 chord13 change13 frequencies13 (19213 possible13 chord13 changes)13

24

A final step I took to normalize the chord change data was to divide the numbers by

the length of the song so that each songrsquos number of chord changes was measured per

second

233 Timbre Preprocessing

For timbre I also used Mauchrsquos model to find a meaningful way to compare timbre

uniformly across all songs [8] After collecting all song metadata I took a random

sample of 20 songs from each year starting at 1970 The reason I forced the sampling

to 20 randomly sampled songs from each year and did not take a random sample of

songs from all years at once was to prevent bias towards any type of sounds As seen

in figure 22 there are significantly more songs from 2000-2011 than before 2000 The

mean year is x = 2001052 the median year is 2003 and the standard deviation of the

years is σ = 7060 A ldquorandom samplerdquo over all songs would almost definitely include

a disproportionate amount of more recent songs In order to not miss out on sounds

that may be more prevalent in older songs I required a set number of songs from each

year Next from each randomly selected song I selected 20 random timbre frames

in order to prevent any biases in data collection within each song In total there

were 422020 = 16800 timbre frames collected Next I clustered the timbre frames

using a Gaussian Mixture Model (GMM) varying the number of clusters from 10 to

100 and selecting the number of clusters with the lowest Bayes Information Criterion

(BIC) a statistical measure commonly used to calculate the best fitting model The

BIC was minimized at 46 timbre clusters I then re-ran the GMM with 46 clusters

and saved the mean values of each of the 12 timbre segments for each cluster formed

In the same way that every song had the same 192 cord changes whose frequencies

could be compared between songs each song now had the same 46 timbre clusters

but different frequencies in each song When reading in the metadata from each song

I calculated the most likely timbre cluster each timbre frame belonged to and kept

25

Figure 22 Number of Electronic Music Songs in Million Song Dataset from EachYear

a frequency count of all of the possible timbre clusters observed in a song Finally

as with the pitch data I divided all observed counts by the duration of the song in

order to normalize each songrsquos timbre counts

26

Chapter 3

Results

31 Methodology

After the pitch and timbre data was processed I ran the Dirichlet Process on the

data For each song I concatenated the 192-element chord change frequency list

and the 46-element timbre category frequency list giving each song a total of 238

features However there is a problem with this setup The pitch data will inherently

dominate the clustering process since it contains almost 3 times as many features

as timbre While there is no built-in function in scikit-learnrsquos DPGMM process to

give different weights to each feature I considered another possibility to remedy

this discrepancy duplicating the timbre vector a certain number of times and

concatenating that to the feature set of each song While this strategy runs the risk

of corrupting the feature set and turning it into something that does not accurately

represent each song it is important to keep in mind that even without duplicating

the timbre vector the feature set consists of two separate feature sets concatenated

to each other Therefore timbre duplication appears to be a reasonable strategy to

weigh pitch and timbre more evenly

27

After this modification I tweaked a few more parameters before obtaining my

final results Dividing the pitch and timbre frequencies by the duration of the song

normalized every song to frequency per second but it also had the undesired effect

of making the data too small Timbre and pitch frequencies per second were almost

always less than 10 and many times hovered as low as 0002 for nonzero values

Because all of the values were very close to each other using common values of α

in the range of 01 to 1000-2000 was insufficient to push the songs into different

clusters As a result every song fell into the same cluster Increasing the value

of α by several orders of magnitude to well over 10 million fixed the problem but

this solution presented two problems First tuning α to experiment with different

ways to cluster the music would be problematic since I would have to work with

an enormous range of possible values for α Second pushing α to such high values

is not appropriate for the Dirichlet Process Extremely high values of α indicate a

Dirichlet Process that will try to disperse the data into different clusters but a value

of α that high is in principle always assigning each new song to a new cluster On

the other hand varying α between 01 and 1000 for example presents a much wider

range of flexibility when assigning clusters While this may be possible by varying

the values of α an extreme amount with the data as it currently is we are using

the Dirichlet Process in a way it should mathematically not be used Therefore

multiplying all of the data by a constant value so that we can work in the appropriate

range of α is the ideal approach After some experimentation I found that k=10 was

an appropriate scaling factor After initial runs of the Dirichlet Process I found out

that there was a slight issue with some of the earlier songs Since I had only artist

genre tags not specific song tags for each song I chose songs based on whether any

of the tags associated with the artist fell under any electronic music genre including

the generic term rsquoelectronicrsquo There were some bands mostly older ones from the

1960s and 1970s like Electric Light Orchestra which had some electronic music but

28

mostly featured rock funk disco or another genre Given that these artists featured

mostly non-electronic songs I decided to exclude them from my study and generate

a blacklist indicating these music artists While it was infeasible to look through

every single song and determine whether it was electronic or not I was able to look

over the earliest songs in each cluster These songs were the most important to verify

as electronic because early non-electronic songs could end up forming new clusters

and inadvertently create clusters with non-electronic sounds that I was not looking for

The goal of this thesis is to identify different groups in which EM songs are

clustered and identify the most unique artists and genres While the second task is

very simple because it requires looking at the earliest songs in each cluster the first

is difficult to gauge the effectiveness of While I can look at the average chord change

and timbre category frequencies in each category as well as other metadata putting

more semantic interpretations to what the music actually sounds like and determining

whether the music is clustered properly is a very subjective process For this reason

I ran the Dirichlet Process on the feature set with values of α = (005 01 02) and

compared the clustering in each category examining similarities and differences in

the clusters formed in each scenario in the Discussion section For each value α I

set the upper limit of components or clusters allowed to 50 The ranges of α I used

resulted in 9 14 and 19 clusters formed

32 Findings

321 α=005

When I set α to 005 the Dirichlet Process split the songs into 9 clusters Below are

the distribution of years of the songs in each cluster (note that the Dirichlet Process

29

does not number the clusters exactly sequentially so cluster numbers 5 7 and 10 are

skipped)

30

Figure 31 Song year distributions for α = 005

For each value of α I also calculated the average frequency of each chord change

category and timbre category for each cluster and plotted the results The green

lines correspond to timbre and the blue lines to pitch

31

Figure 32 Timbre and pitch distributions for α = 005

A table of each cluster formed the number of songs in that cluster and descriptions

of pitch timbre and rhythmic qualities characteristic of songs in that cluster are

shown below

32

Cluster Song Count Characteristic Sounds

0 6481 Minimalist industrial space sounds dissonant chords

1 5482 Soft New Age ethereal

2 2405 Defined sounds electronic and non-electronic instru-

ments played in standard rock rhythms

3 360 Very dense and complex synths slightly darker tone

4 4550 Heavily distorted rock and synthesizer

6 2854 Faster paced 80s synth rock acid house

8 798 Aggressive beats dense house music

9 1464 Ambient house trancelike strong beats mysterious

tone

11 1597 Melancholy tones New wave rock in 80s then starting

in 90s downtempo trip-hop nu-metal

Table 31 Song cluster descriptions for α = 005

322 α=01

A total of 14 clusters were formed (16 were formed but 2 clusters contained only

one song each I listened to both of these songs and they did not sound unique so

I discarded them from the clusters) Again the song distributions timbre and pitch

distributions and cluster descriptions are shown below

33

34

Figure 33 Song year distributions for α = 01

35

36

Figure 34 Timbre and pitch distributions for α = 01

37

Cluster Song Count Characteristic Sounds

0 1339 Instrumental and disco with 80s synth

1 2109 Simultaneous quarter-note and sixteenth note rhythms

2 4048 Upbeat chill simultaneous quarter-note and eighth

note rhythms

3 1353 Strong repetitive beats ambient

4 2446 Strong simultaneous beat and synths synths defined but

echo

5 2672 Calm New Age

6 542 Hi-hat cymbals dissonant chord progressions

7 2725 Aggressive punk and alternative rock

9 1647 Latin rhythmic emphasis on first and third beats

11 835 Standard medium-fast rock instrumentschords

16 1152 Orchestral especially violins

18 40 ldquoMartian alienrdquo sounds no vocals

20 1590 Alternating strong kick and strong high-pitched clap

28 528 Roland TR-like beats kick and clap stand out but fuzzy

Table 32 Song cluster descriptions for α = 01

323 α=02

With α set to 02 there were a total of 22 clusters formed 3 of the clusters consisted

of 1 song each none of which were particularly unique-sounding so I discarded them

for a total of 19 significant clusters Again the song distributions timbre and pitch

distributions and cluster descriptions are shown below

38

39

40

Figure 35 Song year distributions for α = 02

41

42

43

Figure 36 Timbre and pitch distributions for α = 02

44

Cluster Song Count Characteristic Sounds

0 4075 Nostalgic and sad-sounding synths and string instru-

ments

1 2068 Intense sad cavernous (mix of industrial metal and am-

bient)

2 1546 Jazzfunk tones

3 1691 Orchestral with heavy 80s synths atmospheric

4 343 Arpeggios

5 304 Electro ambient

6 2405 Alien synths eery

7 1264 Punchy kicks and claps 80s90s tilt

8 1561 Medium tempo 44 time signature synths with intense

guitar

9 1796 Disco rhythms and instruments

10 2158 Standard rock with few (if any) synths added on

12 791 Cavernous minimalist ambient (non-electronic instru-

ments)

14 765 Downtempo classic guitar riffs fewer synths

16 865 Classic acid house sounds and beats

17 682 Heavy Roland TR sounds

22 14 Fast ambient classic orchestral

23 578 Acid house with funk tones

30 31 Very repetitive rhythms one or two tones

34 88 Very dense sound (strong vocals and synths)

Table 33 Song cluster descriptions for α = 02

45

33 Analysis

Each of the different values of α revealed different insights about the Dirichlet Process

applied to the Million Song Dataset mainly which artists and songs were unique

and how traditional groupings of EM genres could be thought of differently Not

surprisingly the distributions of the years of songs in most of the clusters were skewed

to the left because the distribution of all of the EM songs in the Million Song Dataset

is left-skewed (see Figure 22) However some of the distributions vary significantly

for individual clusters and these differences provide important insights into which

types of music styles were popular at certain points in time and how unique the

earliest artists and songs in the clusters were For example for α = 01 Cluster 28rsquos

musical style (with sounds characteristic of the Roland TR-808 and TR-909 a two

programmable drum machines and synthesizers that became extremely popular in

1980s and 90s dance tracks) [13] coincides with the when the instruments were first

manufactured in 1980 Not surprisingly this cluster contained mostly songs from

the 80s and 90s and declined slightly in the 2000s However there were a few songs

in that cluster that came out before 1980 While these songs did not clearly use the

Roland TR machines they may have contained similar sounds that predated the

machines and were truly novel

First looking at α = 005 we see that all of the clusters contain a significant

number of songs although cluster 3 and 8 are notably smaller Cluster 3 contains

a heavier left tail indicating a larger number of songs from the 70s 80s and 90s

Inside the cluster the genres of music varied significantly from a traditional music

lens That is the cluster contained some songs with nearly all traditional rock

instruments others with purely synths and others somewhere in between all which

would normally be classified as different EM genres However under the Dirichlet

Process these songs were lumped together with the common themes of dense melodic

46

melodies (as opposed to minimalistic repetitive or dissonant sounds) The most

prominent artists from the earlier songs are Ashra and John Hassell who composed

several melodic songs combining traditional instruments with synthesizers for a

modern feel The other small cluster number 8 contains a more normal year

distribution relative to the entire MSD distribution and also consists of denser beats

Another artist Cabaret Voltaire leads this cluster Cluster 9 sticks out significantly

because it contains virtually no songs before 1990 but increases rapidly in popularity

This cluster contains songs with hypnotically repetitive rhythm strong and ethereal

synths and an equally strong drum-like beat Given the emergence of trance in

the 1990s and the fact that house music in the 1980s contained more minimalistic

synths than house music in the 1990s this distribution of years makes sense Looking

at the earliest artists in this cluster one that accurately predates the later music

in the cluster is Jean-Michel Jarre A French composer pioneering in ambient and

electronic music [14] one of his songs Les Chants Magneacutetiques IV contains very

sharp and modulated synths along with a repetitive hi-hat cymbal rhythm and more

drawn-out and ethereal synths While the song sounds ambient at its normal speed

playing the song 15 times the normal speed resulted in a thumping fast-paced 16th

note rhythm that combined with the ethereal synths that contain certain chord

progressions sounded very similar to trance music In fact I found that stylistically

trance music was comparable to house and ambient music increased in speed Trance

music was a term not used extensively until the early 1990s but ambient and house

music were already mainstream by the 1980s so it would make sense that trance

evolved in this manner However this insight could serve as an argument that trance

is not an innovative genre in and of itself but is rather a clever combination of two

older genres Lastly we look at the timbre category and chord change distributions

for each cluster In theory these clusters should have significantly different peaks

of chord changes and timbre categories reflecting different pitch arrangements and

47

instruments in each cluster The type 0 chord change corresponds to major rarr

major with no note change type 60 minor rarr minor with no note change type

120 dominant 7th major rarr dominant 7th major with no note change and type 180

dominant 7th minor rarr dominant 7th minor with no note change It makes sense that

type 0 60 120 and 180 chord changes are frequently observed because it implies

that chords in the song occurring next to each other are remaining in the same key

for the majority of the song The timbre categories on the other hand are more

difficult to intuitively interpret Mauchrsquos study addresses this issue by sampling

songs and sounds that are the closest to each timbre category and then playing the

sounds and attaching user-based interpretations based on several listeners [8] While

this strategy worked in Mauchrsquos study given the time and resources at my disposal

this strategy was not practical in my study I ended up comparing my subjective

summaries of each cluster and comparing the charts to see whether certain peaks

in the timbre categories corresponded to specific tones Strangely for α = 005 the

timbre and chord change data is very similar for each cluster This problem does not

occur for when α = 01 or 02 where the graphs vary significantly and correspond

to some of the observed differences in the music In summary below are the most

influential artists I found in the clusters formed and the types of music they created

that were novel for their time

bull Jean-Michel Jarre ambient and house music complicated synthesizer arrange-

ments

bull Cabaret Voltaire orchestral electronic music

bull Paul Horn new age

bull Brian Eno ambient music

bull Manuel Goumlttsching (Ashra) synth-heavy ambient music

48

bull Killing Joke industrial metal

bull John Foxx minimalist and dark electronic music

bull Fad Gadget house and industrial music

While these conclusions were formed mainly from the data used in the MSD and the

resulting clusters I checked outside sources and biographies of these artists to see

whether they were groundbreaking contributors to electronic music Some research

revealed that these artists were indeed groundbreaking for their time so my findings

are consistent with existing literature The difference however between existing

accounts and mine is that from a quantitatively computed perspective I found some

new connections (like observing that one of Jarrersquos works when sped up sounded

very similar to trance music)

For larger values of α it is not only worth looking at interesting phenomena in

the clusters formed for that specific value but also comparing the clusters formed

to other values of α Since we are increasing the value of α more clusters will

be formed and the distinctions between each cluster will be more nuanced With

α = 01 the Dirichlet Process formed 16 clusters 2 of these clusters consisted of

only one song each and upon listening neither of these songs sounded particularly

unique so I threw those two clusters out and analyzed the remaining 14 Comparing

these clusters to the ones formed with α = 005 I found that some of the clusters

mapped over nicely while others were more difficult to interpret For example cluster

301 (cluster 3 when α = 01) contained a similar number of songs and a similar

distribution of the years the songs were released to cluster 9005 Both contain vitually

no songs before the 1990s and then steadily rise in popularity through the 2000s

Both clusters also contain similar types of music house beats and ethreal synths

reminiscent of ambient or trance music However when I looked at the earliest

49

artists in cluster 301 they were different from the earliest artists in cluster 9005

One particular artist Bill Nelson stood out for having a particularly novel song

ldquoBirds of Tinrdquo for the year it was released (1980) This song features a sharp and

twangy synth beat that when sped up sounded like minimalist acid house music

While the α = 005 group differentiated mostly on general moods and classes of

instruments (like rock vs non-electronic vs electronic) the α = 01 group picked

up more nuanced instrumentation and mood differences For example cluster 1601

contained songs that featured orchestral string instruments especially violin The

songs themselves varied significantly according to traditional genres from Brian Eno

arrangements with classical orchestra to a remix of a song by Linkin Park a nu-metal

band which contained violin interludes This clustering raises an interesting point

that music that sounds very different based on traditional genres could be grouped

together on certain instruments or sounds Another cluster 2801 features 90s sounds

characteristic of the Roland TR-909 drum machine (which would explain why the

clusterrsquos songs increase dratistically starting in the 1990s and steadily decline through

the 2000s) Yet another cluster 601 contains a particularly heavy left tail indicating

a style more popular in the 1980s and the characteristic sound high-hat cymbals

is also a specialized instrument This specialization does not match up particularly

strongly with the clusters when α = 005 That is a single cluster with α = 005

does not easily map to one or more clusters in the α = 01 run although many of

the clusters appear to share characteristics based on the qualitative descriptions in

the tables The timbrechord change charts for each cluster appeared to at least

The timbre/chord-change charts for each cluster appeared to at least somewhat corroborate the general characteristics I attached to each cluster. For example, the last timbre category is significantly pronounced for clusters 5 and 18, and especially so for 18. Cluster 18 was vocal-free, ethereal space-synth sounds, so it would make sense that cluster 5, which was mainly calm New Age, also contained vocal-free, ethereal, space-y sounds. It was also interesting to note that certain clusters, like 28, contained one timbre category that completely dominated all the others. Many songs in this cluster were marked by strong and repetitive beats reminiscent of the Roland TR synth drum machines, which matches the graph. Likewise, clusters 3, 7, 9, and 20, which appear to contain the same peak timbre category, were noted for containing strong and repetitive beats. From this run, I added the following artists and their contributions to the general list of novel artists:

• Bill Nelson: minimalist house music

• Vangelis: orchestral compositions with electronic notes

• Rick Wakeman: rock compositions with spacy-sounding synths

• Kraftwerk: synth-based pop music

Finally, we look at α = 0.2. With this parameter value, the Dirichlet Process resulted in 22 clusters. Three of these clusters contained only one song each, and upon listening to each of these songs, I determined they were not particularly unique and discarded them, for a total of 19 remaining clusters. Unlike the previous two values of α, where the clusters were relatively easy to subjectively differentiate, this one was quite difficult. Slightly more than half of the clusters, 10 out of 19, contained under 1,000 songs, and 3 contained under 100. Some of the clusters were easily mapped to clusters in the other two α values, like cluster 17_{0.2}, which contains Roland TR drum machine sounds and is comparable to cluster 28_{0.1}. However, many of the other classifications seemed more dubious. Not only did the songs within each cluster often vary significantly, but the differences between many clusters appeared nearly indistinguishable. The chord change and timbre charts also reflect the difficulty in distinguishing different clusters. The y-axis values for all of these clusters are quite small, implying that many of the timbre values averaged out because the songs were quite different in each cluster. Essentially, the observations are quite noisy and do not have features that stand out as saliently as those of cluster 28_{0.1}, for example. The only exceptions to these numbers were clusters 30 and 34, but there were so few songs in each of these clusters that they represent only a small amount of the dataset. Therefore, I concluded that the Dirichlet Process with α = 0.2 did an inadequate job of clustering the songs. Overall, the clusters formed when α = 0.1 were the most meaningful in terms of picking up nuanced moods and instruments without splitting hairs and producing clusters a minimally trained ear could not differentiate. From this analysis, the most appropriate genre classifications of the electronic music from the MSD are the clusters described in the table where α = 0.1, and the most novel artists, along with their contributions, are summarized in the findings where α = 0.05 and α = 0.1.
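For reference, the three runs compared in this section can be driven by a few lines of scikit-learn [10, 11]. This is a minimal sketch, assuming the 0.17-era sklearn.mixture.DPGMM class used at the time (since replaced by BayesianGaussianMixture) and a hypothetical song_features.txt dump of the scaled feature vectors:

import numpy as np
from sklearn import mixture

# Each row: 192 chord-change frequencies + 46 timbre-category frequencies,
# normalized per second and scaled by k = 10 as described in Section 3.1.
songs = np.loadtxt('song_features.txt')  # hypothetical feature dump

for alpha in (0.05, 0.1, 0.2):
    dpgmm = mixture.DPGMM(n_components=50, alpha=alpha,
                          covariance_type='diag', n_iter=100)
    dpgmm.fit(songs)
    labels = dpgmm.predict(songs)
    print '{0} clusters formed with alpha = {1}'.format(len(set(labels)), alpha)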

Chapter 4

Conclusion

In this chapter, I first address weaknesses in my experiment and strategies to remedy them; I then offer potential paths for researchers to build upon my experiment and close with final words regarding this thesis.

4.1 Design Flaws in Experiment

While I made every effort possible to ensure the integrity of this experiment, there were various complicating factors, some beyond my control and others within my control but unrealistic to address given the time and resources I had. The largest issue was the dataset I was working with. While the MSD contained roughly 23,000 electronic music songs according to my classifications, these songs did not come close to covering all of the electronic music that was available. From looking through the tracks, I did see many important artists, meaning that there was some credibility to the dataset. However, there were several other artists I was surprised to see missing, and the artists included contained only a limited number of popular songs. Some traditionally defined genres, like dubstep, were missing entirely from the dataset, and the most recent songs came from the year 2010, which meant that the past five years of rapid expansion in EM were not accounted for. Building a sufficient corpus of EM data is very difficult, arguably more so than for other genres, because songs may be remixed by multiple artists, further blurring the line between original content and modifications. For this reason, I considered my thesis to be a proof of concept. Although the data I used may not be ideal, I was able to show that the Dirichlet Process could be used with some amount of success to cluster songs based on their metadata.

With respect to how I implemented the Dirichlet Process and constructed the features, my methodology could have been more extensive with additional time and resources. Interpreting the sounds in each song and establishing common threads is a difficult task, and unlike Pandora, which used trained music theory experts to analyze each song, I relied on my own ears and anecdotal knowledge of EM. Given the lack of formal literature quantitatively analyzing EM and the resources I had, this was my best realistic option, but it was also not ideal. The second notable weakness, which was more controllable, was determining what exactly constitutes an EM song. My criteria involved iterating through every song and selecting those whose artist carried a tag that fell inside a list of predetermined EM genres. However, this strategy is not always effective, since some artists have only a small selection of EM songs and have produced much more music involving rock or other non-EM genres. To prevent these songs from appearing in the dataset, I would need to load another dataset, from a group called Last.fm, which contains user-generated tags at the song level.
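A sketch of what that song-level filter could look like, assuming the per-track JSON layout of the Last.fm dataset in which each track file carries a 'tags' list of tag/weight pairs (the function and its inputs are illustrative, not code I ran):

import json

# Keep a track only if one of its own Last.fm tags (rather than an artist
# tag) falls in the predetermined EM genre list from Appendix A.1.
def is_em_track(json_path, target_genres):
    with open(json_path) as f:
        track = json.load(f)
    track_tags = [tag.lower() for tag, weight in track.get('tags', [])]
    return any(genre in track_tags for genre in target_genres)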

Another, more addressable weakness in my experiment was graphically analyzing the timbre categories. While the average chord changes were easy to interpret on the graphs for each cluster and had easy semantic interpretations, the timbre categories were never formally defined. That is, while I knew the Bayes Information Criterion was lowest when there were 46 categories, I did not associate each timbre category with a sound. Mauch's study addressed this issue by randomly selecting songs with sounds that fell in each timbre category and asking users to listen to the sounds and classify what they heard. Implementing this system would be an additional way of ensuring that the clusters formed for each song were nontrivial: I could not only eyeball the measurements on each graph for timbre, like I did in this thesis, but also use them to confirm the sounds I observed for each cluster.
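Such a Mauch-style listening test could be bootstrapped from the data already computed: for each timbre category, pull the sampled frames nearest that category's mean vector (TIMBRE_CLUSTERS in Appendix A.4) and audition the songs they came from. A minimal sketch, assuming frames is the (n x 12) array of sampled timbre frames from Appendix A.3:

import numpy as np

# For each of the 46 timbre categories, return the indices of the k frames
# closest (in Euclidean distance) to the category's mean timbre vector.
def representative_frames(frames, cluster_means, k=5):
    reps = []
    for mean in cluster_means:
        dists = np.linalg.norm(frames - np.asarray(mean), axis=1)
        reps.append(np.argsort(dists)[:k])  # indices of the k nearest frames
    return reps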

Finally, while my feature selection involved careful preprocessing, based on other studies, that normalized measurements across all songs, there are additional ways I could have improved the feature set. For example, one study looks at more advanced ways to isolate specific timbre segments in a song, identify repeating patterns, and compare songs to each other in terms of the similarity of their timbres [15]. More advanced methods like these would allow me to more quantitatively analyze how successfully the Dirichlet Process clusters songs into distinct categories.
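One generic quantitative check, not the timbre model of [15] but readily available in scikit-learn, is the silhouette score, which summarizes how well separated a clustering is. A sketch, reusing the songs and labels arrays from the driver sketch in Section 3.3:

from sklearn.metrics import silhouette_score

# Values near 1 indicate compact, well-separated clusters; values near 0
# indicate overlapping clusters like those observed at alpha = 0.2.
score = silhouette_score(songs, labels, metric='euclidean')
print 'silhouette score: {0:.3f}'.format(score)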

4.2 Future Work

Future work in this area, quantitatively analyzing EM metadata to determine what constitutes different genres and novel artists, would involve tighter definitions and procedures, evaluations of whether the clustering was effective, and closer musical scrutiny. All of the weaknesses mentioned in the previous section, barring perhaps the songs available in the Million Song Dataset, can be addressed with extensions and modifications to the code base I created. The greater issue of building an effective corpus of music data for the MSD and constantly updating it might be addressed by soliciting such data from an organization like Spotify, but such an endeavor is very ambitious and beyond the scope of any individual or small-group research project without extensive funding and influence. Once these problems are resolved, and methods for accessing songs from the dataset and comparing them to each other are in place, the next steps would be to further analyze the results. How do the most unique artists for their time compare to the most popular artists? Is there considerable overlap? How long does it take for a style to grow in popularity, if it ever does? And lastly, how can these findings be used to compose new genres of music and envision who and what will become popular in the future? All of these questions may require supplementary information sources, with respect to the popularity of songs and artists for example, and many of these additional pieces of information can be found on the website of the MSD.
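For popularity in particular, the MSD itself already stores proxies that future work could join against the clusters. A small sketch using getters that ship with the standard hdf5_getters module (the track file name here is only a placeholder):

import hdf5_getters

# The Echo Nest popularity proxies stored per track in the MSD.
h5 = hdf5_getters.open_h5_file_read('TRAXLZU12903D05F94.h5')  # hypothetical track file
song_hotttnesss = hdf5_getters.get_song_hotttnesss(h5)
artist_familiarity = hdf5_getters.get_artist_familiarity(h5)
h5.close()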

4.3 Closing Remarks

While this thesis is an ambitious endeavor and can be improved in many respects, it does show that the methods implemented yield nontrivial results and could serve as a foundation for future quantitative analysis of electronic music. As data analytics grows and groups such as Spotify amass greater amounts of information and deeper insights into that information, this relatively new field of study will hopefully grow. EM is a dynamic, energizing, and incredibly expressive type of music, and understanding it from a quantitative perspective pays respect to what has, up until now, mostly been analyzed from a curious outsider's perspective: qualitatively described, but not examined as thoroughly from a mathematical angle.


Appendix A

Code

A.1 Pulling Data from the Million Song Dataset

from __future__ import division
import os
import sys
import time
import glob
import re
import numpy as np
import hdf5_getters  # not on adroit

# prevents output from showing ellipses when printed
np.set_printoptions(threshold=np.nan)

'''This code pulls the metadata of every electronic music song out of the
raw MSD files.'''

basedir = '/scratch/network/mssilver/mssilver/msd_data_full/data/' + str(sys.argv[1])
ext = '.h5'

target_genres = ['house', 'techno', 'drum and bass', 'drum n bass', "drum'n'bass",
    'drumnbass', "drum 'n' bass", 'jungle', 'breakbeat', 'trance', 'dubstep', 'trap',
    'downtempo', 'industrial', 'synthpop', 'idm', 'idm - intelligent dance music',
    '8-bit', 'ambient', 'dance and electronica', 'electronic']

# relevant metadata for all EM songs found in the MSD
all_song_data = {}
pitch_segs_data = []
count = 0
start_time = time.time()

for root, dirs, files in os.walk(basedir):
    files = glob.glob(os.path.join(root, '*' + ext))
    for f in files:
        h5 = hdf5_getters.open_h5_file_read(f)
        # if year unknown, throw out sample
        if hdf5_getters.get_year(h5) == 0:
            h5.close()
            continue
        if any(tag in str(hdf5_getters.get_artist_mbtags(h5)) for tag in target_genres):
            print 'found electronic music song at {0} seconds'.format(time.time() - start_time)
            count += 1
            print 'song count: {0}'.format(count + 1)
            h5_subdict = dict()
            h5_subdict['title'] = hdf5_getters.get_title(h5).item()
            h5_subdict['artist_name'] = hdf5_getters.get_artist_name(h5).item()
            h5_subdict['year'] = hdf5_getters.get_year(h5).item()
            h5_subdict['duration'] = hdf5_getters.get_duration(h5).item()
            h5_subdict['timbre'] = hdf5_getters.get_segments_timbre(h5).tolist()
            h5_subdict['pitches'] = hdf5_getters.get_segments_pitches(h5).tolist()
            track_id = hdf5_getters.get_track_id(h5).item()
            all_song_data[track_id] = h5_subdict
            print 'Song {0} finished processing. Total time elapsed: {1} seconds'.format(count, str(time.time() - start_time))
        h5.close()

all_song_data_sorted = dict(sorted(all_song_data.items(), key=lambda k: k[1]['year']))
sortedpitchdata = '/scratch/network/mssilver/mssilver/msd_data/raw_' + re.sub('/', '', sys.argv[1]) + '.txt'
with open(sortedpitchdata, 'w') as text_file:
    text_file.write(str(all_song_data_sorted))

A.2 Calculating Most Likely Chords and Timbre Categories

from __future__ import division
import os
import sys
import re
import time
import json
import glob
import hdf5_getters  # not on adroit
import sklearn.mixture
import msd_utils  # not on adroit
import math
import numpy as np
import collections
import ast

# prevents output from showing ellipses when printed
np.set_printoptions(threshold=np.nan)

# column-wise mean of a list of lists
def mean(a):
    return sum(a) / len(a)

'''This code computes the frequency of chord changes and timbre categories
in each electronic song.'''

basedir = '/scratch/network/mssilver/mssilver/'
input_file = basedir + 'msd_data/raw_' + str(sys.argv[1]) + '.txt'
output_file = basedir + 'msd_data/preprocessed_' + str(sys.argv[1]) + '.txt'

json_contents = open(input_file, 'r').read()

all_song_data = []
time_start = time.time()
count = 0
for json_object_str in re.finditer(r"{'title'.*?}", json_contents):
    json_object_str = str(json_object_str.group(0))
    json_object = ast.literal_eval(json_object_str)
    json_object_new = {}

    json_object_new['title'] = json_object['title']
    json_object_new['artist_name'] = json_object['artist_name']
    json_object_new['year'] = json_object['year']
    json_object_new['duration'] = json_object['duration']

    segments_pitches_old = json_object['pitches']
    segments_timbre_old = json_object['timbre']
    segments_pitches_old_smoothed = []
    segments_timbre_old_smoothed = []
    chord_changes = [0 for i in range(0, 192)]
    smoothing_factor = 5
    for i in range(0, int(math.floor(len(segments_pitches_old) / smoothing_factor))):
        segments = segments_pitches_old[(smoothing_factor * i):(smoothing_factor * i + smoothing_factor)]
        # calculate mean frequency of each note over a block of 5 time segments
        segments_mean = map(mean, zip(*segments))
        segments_pitches_old_smoothed.append(segments_mean)
    most_likely_chords = [msd_utils.find_most_likely_chord(seg) for seg in segments_pitches_old_smoothed]
    print 'found most likely chords at {0} seconds'.format(time.time() - time_start)
    # calculate chord changes
    for i in range(0, len(most_likely_chords) - 1):
        c1 = most_likely_chords[i]
        c2 = most_likely_chords[i + 1]
        if (c1[1] == c2[1]):
            note_shift = 0
        elif (c1[1] < c2[1]):
            note_shift = c2[1] - c1[1]
        else:
            note_shift = 12 - c1[1] + c2[1]
        key_shift = 4 * (c1[0] - 1) + c2[0]
        # convert note_shift (0 through 11) and key_shift (1 to 16)
        # to one of 192 categories for a chord shift
        chord_shift = 12 * (key_shift - 1) + note_shift
        chord_changes[chord_shift] += 1
    json_object_new['chord_changes'] = [c / json_object['duration'] for c in chord_changes]
    print 'calculated chord changes at {0} seconds'.format(time.time() - time_start)

    for i in range(0, int(math.floor(len(segments_timbre_old) / smoothing_factor))):
        segments = segments_timbre_old[(smoothing_factor * i):(smoothing_factor * i + smoothing_factor)]
        # calculate mean value of each timbre dimension over a block of 5 time segments
        segments_mean = map(mean, zip(*segments))
        segments_timbre_old_smoothed.append(segments_mean)
    print 'found most likely timbre categories at {0} seconds'.format(time.time() - time_start)
    timbre_cats = [msd_utils.find_most_likely_timbre_category(seg) for seg in segments_timbre_old_smoothed]
    timbre_cat_counts = [timbre_cats.count(i) for i in xrange(0, 46)]
    json_object_new['timbre_cat_counts'] = [t / json_object['duration'] for t in timbre_cat_counts]
    all_song_data.append(json_object_new)
    count += 1

print 'preprocessing finished, writing results to file at time {0}'.format(time.time() - time_start)
with open(output_file, 'w') as f:
    f.write(str(all_song_data))

print 'file merging complete at time {0}'.format(time.time() - time_start)

A.3 Code to Compute Timbre Categories

from __future__ import division
import os
import sys
import re
import time
import json
import glob
import hdf5_getters  # not on adroit
import sklearn.mixture
import msd_utils  # not on adroit
import math
import numpy as np
import collections
from string import ascii_uppercase
import ast
import matplotlib.pyplot as plt
import operator
from collections import defaultdict
import random

timbre_all = []
N = 20  # number of samples to get from each year
year_counts = {1956: 2, 1965: 4, 1968: 3, 1969: 5, 1970: 23, 1971: 25,
    1972: 26, 1973: 37, 1974: 35, 1975: 29, 1976: 28, 1977: 64, 1978: 77,
    1979: 111, 1980: 131, 1981: 171, 1982: 199, 1983: 272, 1984: 190, 1985: 189,
    1986: 200, 1987: 224, 1988: 205, 1989: 272, 1990: 358, 1991: 348,
    1992: 538, 1993: 610, 1994: 658, 1995: 764, 1996: 809, 1997: 930, 1998: 872,
    1999: 983, 2000: 1031, 2001: 1230, 2002: 1323, 2003: 1563, 2004: 1508,
    2005: 1995, 2006: 1892, 2007: 2175, 2008: 1950, 2009: 1782, 2010: 742}

time_start = time.time()
year_count = defaultdict(int)
orig_dir = '/scratch/network/mssilver/mssilver/'
orig_dir = ''
json_pattern = re.compile(r"{'title'.*?}", re.DOTALL)
N = 20  # number of songs to sample from each year
k = 20  # number of frames to select from each song
for l1 in ascii_uppercase:
    for l2 in ascii_uppercase:
        edm_textfile = orig_dir + 'msd_data/raw_' + l1 + l2 + '.txt'
        json_contents = open(edm_textfile, 'r').read()
        for json_object_str in re.findall(json_pattern, json_contents):
            json_object = ast.literal_eval(json_object_str)
            year = int(json_object['year'])
            # sample each year uniformly so recent years do not dominate
            prob = 1.0 if 1.0 * N / year_counts[year] > 1.0 else 1.0 * N / year_counts[year]
            if random.random() < prob:
                print 'getting timbre frames for song in directory {0}, {1} seconds after start of program'.format(edm_textfile, time.time() - time_start)
                duration = float(json_object['duration'])
                timbre = [[t / duration for t in l] for l in json_object['timbre']]
                try:
                    indices = random.sample(xrange(0, len(timbre)), k)
                except:
                    indices = xrange(0, len(timbre))
                timbre_frames = [timbre[i] for i in indices]
                appended_timbre = [timbre_all.append(l) for l in timbre_frames]
        print 'finished file {0}, {1} seconds after start of program'.format(edm_textfile, time.time() - time_start)

with(open('timbre_frames_all.txt', 'w')) as f:
    f.write(str(timbre_all))

A.4 Helper Methods for Calculations

import os
import re
import json
import glob
import hdf5_getters
import time
import numpy as np

''' some static data used in conjunction with the helper methods '''

# each 12-element vector corresponds to the 12 pitches, starting with C
# natural and going up to B natural

CHORD_TEMPLATE_MAJOR = [[1,0,0,0,1,0,0,1,0,0,0,0], [0,1,0,0,0,1,0,0,1,0,0,0],
                        [0,0,1,0,0,0,1,0,0,1,0,0], [0,0,0,1,0,0,0,1,0,0,1,0],
                        [0,0,0,0,1,0,0,0,1,0,0,1], [1,0,0,0,0,1,0,0,0,1,0,0],
                        [0,1,0,0,0,0,1,0,0,0,1,0], [0,0,1,0,0,0,0,1,0,0,0,1],
                        [1,0,0,1,0,0,0,0,1,0,0,0], [0,1,0,0,1,0,0,0,0,1,0,0],
                        [0,0,1,0,0,1,0,0,0,0,1,0], [0,0,0,1,0,0,1,0,0,0,0,1]]
CHORD_TEMPLATE_MINOR = [[1,0,0,1,0,0,0,1,0,0,0,0], [0,1,0,0,1,0,0,0,1,0,0,0],
                        [0,0,1,0,0,1,0,0,0,1,0,0], [0,0,0,1,0,0,1,0,0,0,1,0],
                        [0,0,0,0,1,0,0,1,0,0,0,1], [1,0,0,0,0,1,0,0,1,0,0,0],
                        [0,1,0,0,0,0,1,0,0,1,0,0], [0,0,1,0,0,0,0,1,0,0,1,0],
                        [0,0,0,1,0,0,0,0,1,0,0,1], [1,0,0,0,1,0,0,0,0,1,0,0],
                        [0,1,0,0,0,1,0,0,0,0,1,0], [0,0,1,0,0,0,1,0,0,0,0,1]]
CHORD_TEMPLATE_DOM7 = [[1,0,0,0,1,0,0,1,0,0,1,0], [0,1,0,0,0,1,0,0,1,0,0,1],
                       [1,0,1,0,0,0,1,0,0,1,0,0], [0,1,0,1,0,0,0,1,0,0,1,0],
                       [0,0,1,0,1,0,0,0,1,0,0,1], [1,0,0,1,0,1,0,0,0,1,0,0],
                       [0,1,0,0,1,0,1,0,0,0,1,0], [0,0,1,0,0,1,0,1,0,0,0,1],
                       [1,0,0,1,0,0,1,0,1,0,0,0], [0,1,0,0,1,0,0,1,0,1,0,0],
                       [0,0,1,0,0,1,0,0,1,0,1,0], [0,0,0,1,0,0,1,0,0,1,0,1]]
CHORD_TEMPLATE_MIN7 = [[1,0,0,1,0,0,0,1,0,0,1,0], [0,1,0,0,1,0,0,0,1,0,0,1],
                       [1,0,1,0,0,1,0,0,0,1,0,0], [0,1,0,1,0,0,1,0,0,0,1,0],
                       [0,0,1,0,1,0,0,1,0,0,0,1], [1,0,0,1,0,1,0,0,1,0,0,0],
                       [0,1,0,0,1,0,1,0,0,1,0,0], [0,0,1,0,0,1,0,1,0,0,1,0],
                       [0,0,0,1,0,0,1,0,1,0,0,1], [1,0,0,0,1,0,0,1,0,1,0,0],
                       [0,1,0,0,0,1,0,0,1,0,1,0], [0,0,1,0,0,0,1,0,0,1,0,1]]

CHORD_TEMPLATE_MAJOR_means = [np.mean(chord) for chord in CHORD_TEMPLATE_MAJOR]
CHORD_TEMPLATE_MINOR_means = [np.mean(chord) for chord in CHORD_TEMPLATE_MINOR]
CHORD_TEMPLATE_DOM7_means = [np.mean(chord) for chord in CHORD_TEMPLATE_DOM7]
CHORD_TEMPLATE_MIN7_means = [np.mean(chord) for chord in CHORD_TEMPLATE_MIN7]

CHORD_TEMPLATE_MAJOR_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_MAJOR]
CHORD_TEMPLATE_MINOR_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_MINOR]
CHORD_TEMPLATE_DOM7_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_DOM7]
CHORD_TEMPLATE_MIN7_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_MIN7]

# mean timbre vectors of the 46 clusters found by the GMM (Section 2.3.3)
TIMBRE_CLUSTERS = [
    [1.38679881e-01, 3.95702571e-02, 2.65410235e-02, 7.38301998e-03, -1.75014636e-02, -5.51147732e-02,
     8.71851698e-03, -1.17595855e-02, 1.07227900e-02, 8.75951680e-03, 5.40391877e-03, 6.17638908e-03],
    [3.14344510e+00, 1.17405599e-01, 4.08053561e+00, -1.77934450e+00, 2.93367968e+00, -1.35597928e+00,
     -1.55129489e+00, 7.75743158e-01, 6.42796685e-01, 1.40794256e-01, 3.37716831e-01, -3.27103815e-01],
    [3.56548165e-01, 2.73288705e+00, 1.94355982e+00, 1.06892477e+00, 9.89739475e-01, -8.97330631e-02,
     8.73234495e-01, -2.00747009e-03, 3.44488367e-01, 9.93117800e-02, -2.43471766e-01, -1.90521726e-01],
    [4.22442037e-01, 4.14115783e-01, 1.43926557e-01, -1.16143322e-01, -5.95186216e-02, -2.36927188e-01,
     -6.83151409e-02, 9.86816882e-02, 2.43219098e-02, 6.93558977e-02, 6.80121418e-03, 3.97485360e-02],
    [1.94727799e-01, -1.39027782e+00, -2.39875671e-01, -2.84583677e-01, 1.92334219e-01, -2.83421048e-01,
     2.15787541e-01, 1.14840341e-01, -2.15631833e-01, -4.09496877e-02, -6.90838017e-03, -7.24394810e-03],
    [1.96565167e-01, 4.98702717e-02, -3.43697282e-01, 2.54170701e-01, 1.12441266e-02, 1.54740401e-01,
     -4.70447408e-02, 8.10868802e-02, 3.03736697e-03, 1.43974944e-03, -2.75044913e-02, 1.48634678e-02],
    [2.21364497e-01, -2.96205105e-01, 1.57754028e-01, -5.57641279e-02, -9.25625566e-02, -6.15316168e-02,
     -1.38139882e-01, -5.54936599e-02, 1.66886836e-01, 6.46238260e-02, 1.24093863e-02, -2.09274345e-02],
    [2.12823455e-01, -9.32652720e-02, -4.39611467e-01, -2.02814479e-01, 4.98638770e-02, -1.26572488e-01,
     -1.11181799e-01, 3.25075635e-02, 2.01416694e-02, -5.69216463e-02, 2.61922912e-02, 8.30817468e-02],
    [1.62304042e-01, -7.34813956e-03, -2.02552550e-01, 1.80106705e-01, -5.72110826e-02, -9.17148244e-02,
     -6.20429191e-03, -6.08892354e-02, 1.02883628e-02, 3.84878478e-02, -8.72920419e-03, 2.37291230e-02],
    [1.69023095e-01, 6.81311168e-02, -3.71039856e-02, -2.13139780e-02, -4.18752028e-03, 1.36407740e-01,
     2.58515825e-02, -4.10328777e-04, 2.93149920e-02, -1.97874734e-02, 2.01177066e-02, 4.29260690e-03],
    [4.16829358e-01, -1.28384095e+00, 8.86081556e-01, 9.13717416e-02, -3.19420208e-01, -1.82003637e-01,
     -3.19865507e-02, -1.71517045e-02, 3.47472066e-02, -3.53047665e-02, 5.58354602e-02, -5.06222122e-02],
    [3.83948137e-01, 1.06020034e-01, 4.01191058e-01, 1.49470482e-01, -9.58422411e-02, -4.94473336e-02,
     2.27589858e-02, -5.67352733e-02, 3.84666644e-02, -2.15828055e-02, -1.67817151e-02, 1.15426241e-01],
    [9.07946444e-01, 3.26120397e+00, 2.98472002e+00, -1.42615404e-01, 1.29886103e+00, -4.53380431e-01,
     1.54008478e-01, -3.55297093e-02, -2.95809181e-01, 1.57037690e-01, -7.29692046e-02, 1.15180285e-01],
    [1.60870896e+00, -2.32038235e+00, -7.96211044e-01, 1.55058968e+00, -2.19377663e+00, 5.01030526e-01,
     -1.71767279e+00, -1.36642470e+00, -2.42837527e-01, -4.14275615e-01, -7.33148530e-01, -4.56676578e-01],
    [6.42870687e-01, 1.34486839e+00, 2.16026845e-01, -2.13180345e-01, 3.10866747e-01, -3.97754955e-01,
     -3.54439151e-01, -5.95938041e-04, 4.95054274e-03, 4.67013422e-02, -1.80823854e-02, 1.25808320e-01],
    [1.16780496e+00, 2.28141229e+00, -3.29418720e+00, -1.54239912e+00, 2.12372153e-01, 2.51116768e+00,
     1.84273560e+00, -4.06183916e-01, 1.19175125e+00, -9.24407446e-01, 6.85444429e-01, -6.38729005e-01],
    [2.39097414e-01, -1.13382447e-02, 3.06327342e-01, 4.68182987e-03, -1.03107607e-01, -3.17661969e-02,
     3.46533705e-02, 1.46440386e-02, 6.88291154e-02, 1.72580481e-02, -6.23970238e-03, -6.52822380e-03],
    [1.74850329e-01, -1.86077411e-01, 2.69285838e-01, 5.22452803e-02, -3.71708289e-02, -6.42874319e-02,
     -5.01920042e-03, -1.14565540e-02, -2.61300268e-03, -6.94872458e-03, 1.20157063e-02, 2.01341977e-02],
    [1.93220674e-01, 1.62738332e-01, 1.72794061e-02, 7.89933755e-02, 1.58494767e-01, 9.04541006e-04,
     -3.33177052e-02, -1.42411500e-01, -1.90471155e-02, -2.41622739e-02, -2.57382438e-02, 2.84895062e-02],
    [3.31179197e+00, -1.56765268e-01, 4.42446188e+00, 2.05496297e+00, 5.07031622e+00, -3.52663849e-02,
     -5.68337901e+00, -1.17825301e+00, 5.41756637e-01, -3.15541339e-02, -1.58404846e+00, 7.37887234e-01],
    [2.36033237e-01, -5.01380019e-01, -7.01568834e-02, -2.14474169e-01, 5.58739133e-01, -3.45340886e-01,
     2.36469930e-01, -2.51770230e-02, -4.41670143e-01, -1.73364633e-01, 9.92353986e-03, 1.01775476e-01],
    [3.13672832e+00, 1.55128891e+00, 4.60139512e+00, 9.82477544e-01, -3.87108002e-01, -1.34239667e+00,
     -3.00065797e+00, -4.41556909e-01, -7.77546208e-01, -6.59017029e-01, -1.42596356e-01, -9.78935498e-01],
    [8.50714148e-01, 2.28658856e-01, -3.65260753e+00, 2.70626948e+00, -1.90441544e-01, 5.66625676e+00,
     1.77531510e+00, 2.39978921e+00, 1.10965660e+00, 1.58484130e+00, -1.51579214e-02, 8.64324026e-01],
    [1.14302559e+00, 1.18602811e+00, -3.88130412e+00, 8.69833825e-01, -8.23003310e-01, -4.23867795e-01,
     8.56022598e-01, -1.08015106e+00, 1.74840192e-01, -1.35493558e-02, -1.17012561e+00, 1.68572940e-01],
    [3.54117814e+00, 6.12714769e-01, 7.67585243e+00, 2.50391333e+00, 1.81374399e+00, -1.46363231e+00,
     -1.74027236e+00, -5.72924078e-01, -1.20787368e+00, -4.13954661e-01, -4.62561948e-01, 6.78297871e-01],
    [8.31843044e-01, 4.41635485e-01, 7.00724425e-02, -4.72159900e-02, 3.08326493e-01, -4.47009822e-01,
     3.27806057e-01, 6.52370380e-01, 3.28490360e-01, 1.28628172e-01, -7.78065861e-02, 6.91343399e-02],
    [4.90082031e-01, -9.53180204e-01, 1.76970476e-01, 1.57256960e-01, -5.26196238e-02, -3.19264458e-01,
     3.91808304e-01, 2.19368239e-01, -2.06483291e-01, -6.25044005e-02, -1.05547224e-01, 3.18934196e-01],
    [1.49899454e+00, -4.30708817e-01, 2.43770498e+00, 7.03149621e-01, -2.28827845e+00, 2.70195855e+00,
     -4.71484280e+00, -1.18700075e+00, -1.77431396e+00, -2.23190236e+00, 8.20855264e-01, -2.35859902e-01],
    [1.20322544e-01, -3.66300816e-01, -1.25699953e-01, -1.21914056e-01, 6.93277338e-02, -1.31034684e-01,
     -1.54955924e-03, 2.48094288e-02, -3.09576314e-02, -1.66369415e-03, 1.48904987e-04, -1.42151992e-02],
    [6.52394765e-01, -6.81024464e-01, 6.36868117e-01, 3.04950208e-01, 2.62178992e-01, -3.20457080e-01,
     -1.98576098e-01, -3.02173163e-01, 2.04399765e-01, 4.44513847e-02, -9.50111498e-03, -1.14198739e-02],
    [2.06762180e-01, -2.08101829e-01, 2.61977630e-01, -1.71672300e-01, 5.61794250e-02, 2.13660185e-01,
     3.90259585e-02, 4.78176392e-02, 1.72812607e-02, 3.44052067e-02, 6.26899067e-03, 2.48544728e-02],
    [7.39717363e-01, 4.37786285e+00, 2.54995502e+00, 1.13151212e+00, -3.58509503e-01, 2.20806129e-01,
     -2.20500355e-01, -7.22409824e-02, -2.70534083e-01, 1.07942098e-03, 2.70174668e-01, 1.87279353e-01],
    [1.25593809e+00, 6.71054880e-02, 8.70352571e-01, -4.32607959e+00, 2.30652217e+00, 5.47476105e+00,
     -6.11052479e-01, 1.07955720e+00, -2.16225471e+00, -7.95770149e-01, -7.31804973e-01, 9.68935954e-01],
    [1.17233757e-01, -1.23897829e-01, -4.88625265e-01, 1.42036530e-01, -7.23286756e-02, -6.99808763e-02,
     -1.17525019e-02, 5.70221674e-02, -7.67796123e-03, 4.17505873e-02, -2.33375716e-02, 1.94121001e-02],
    [1.67511025e+00, -2.75436700e+00, 1.45345593e+00, 1.32408871e+00, -1.66172505e+00, 1.00560074e+00,
     -8.82308160e-01, -5.95708043e-01, -7.27283590e-01, -1.03975499e+00, -1.86653334e-02, 1.39449745e+00],
    [3.20587677e+00, -2.84451104e+00, 8.54849957e+00, -4.44001235e-01, 1.04202144e+00, 7.35333682e-01,
     -2.48763292e+00, 7.38931361e-01, -1.74185596e+00, -1.07581842e+00, 2.05759299e-01, -8.20483513e-01],
    [3.31279737e+00, -5.08655734e-01, 6.61530870e+00, 1.16518280e+00, 4.74499155e+00, -2.31536191e+00,
     -1.34016130e+00, -7.15381712e-01, 2.78890594e+00, 2.04189275e+00, -3.80003033e-01, 1.16034914e+00],
    [1.79522019e+00, -8.13534697e-02, 4.37167420e-01, 2.26517020e+00, 8.85377295e-01, 1.07481514e+00,
     -7.25322296e-01, -2.19309506e+00, -7.59468916e-01, -1.37191387e+00, 2.60097913e-01, 9.34596450e-01],
    [3.50400906e-01, 8.17891485e-01, -8.63487084e-01, -7.31760701e-01, 9.70320805e-02, -3.60023996e-01,
     -2.91753495e-01, -8.03073817e-02, 6.65930095e-02, 1.60093340e-01, -1.29158086e-01, -5.18806100e-02],
    [2.25922929e-01, 2.78461593e-01, 5.39661393e-02, -2.37662670e-02, -2.70343295e-02, -1.23485570e-01,
     2.31027499e-03, 5.87465112e-05, 1.86127188e-02, 2.83074747e-02, -1.87198676e-04, 1.24761782e-02],
    [4.53615634e-01, 3.18976020e+00, -8.35029351e-01, 7.84124578e+00, -4.43906795e-01, -1.78945492e+00,
     -1.14521031e+00, 1.00044304e+00, -4.04084981e-01, -4.86030348e-01, 1.05412721e-01, 5.63666445e-02],
    [3.93714086e-01, -3.07226477e-01, -4.87366619e-01, -4.57481697e-01, -2.91133171e-04, -2.39881719e-01,
     -2.15591352e-01, -1.21332941e-01, 1.42245002e-01, 5.02984582e-02, -8.05878851e-03, 1.95534173e-01],
    [1.86913010e-01, -1.61000977e-01, 5.95612425e-01, 1.87804293e-01, 2.22064227e-01, -1.09008289e-01,
     7.83845058e-02, 5.15228647e-02, -8.18113578e-02, -2.37860551e-02, 3.41013800e-03, 3.64680417e-02],
    [3.32919314e+00, -2.14341251e+00, 7.20913997e+00, 1.76143734e+00, 1.64091808e+00, -2.66887649e+00,
     -9.26748006e-01, -2.78599285e-01, -7.39434005e-01, -3.87363085e-01, 8.00557250e-01, 1.15628886e+00],
    [4.76496444e-01, -1.19334793e-01, 3.09037235e-01, -3.45545294e-01, 1.30114716e-01, 5.06895559e-01,
     2.12176840e-01, -4.14296750e-03, 4.52439064e-02, -1.62163990e-02, 6.93683152e-02, -5.77607592e-03],
    [3.00019324e-01, 5.43432074e-02, -7.72732930e-01, 1.47263806e+00, -2.79012581e-02, -2.47864869e-01,
     -2.10011388e-01, 2.78202425e-01, 6.16957205e-02, -1.66924986e-01, -1.80102286e-01, -3.78872162e-03]]

TIMBRE_MEANS = [np.mean(t) for t in TIMBRE_CLUSTERS]
TIMBRE_STDEVS = [np.std(t) for t in TIMBRE_CLUSTERS]

'''helper methods to process raw msd data'''

def normalize_pitches(h5):
    key = int(hdf5_getters.get_key(h5))
    segments_pitches = hdf5_getters.get_segments_pitches(h5)
    segments_pitches_new = [transpose_by_key(pitch_seg, key) for pitch_seg in segments_pitches]
    return segments_pitches_new

def transpose_by_key(pitch_seg, key):
    pitch_seg_new = []
    for i in range(0, 12):
        idx = (i + key) % 12
        pitch_seg_new.append(pitch_seg[idx])
    return pitch_seg_new

''' given a time segment with distributions of the 12 pitches, find the most
likely chord played '''
def find_most_likely_chord(pitch_vector):
    rho_max = 0.0
    # chords are indexed as (chord family, transposition)
    most_likely_chord = (1, 1)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_MAJOR, CHORD_TEMPLATE_MAJOR_means, CHORD_TEMPLATE_MAJOR_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean) * (pitch_vector[i] - np.mean(pitch_vector)) / ((stdev + 0.01) * (np.std(pitch_vector) + 0.01))
        if (abs(rho) > abs(rho_max)):
            rho_max = rho
            most_likely_chord = (1, idx)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_MINOR, CHORD_TEMPLATE_MINOR_means, CHORD_TEMPLATE_MINOR_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean) * (pitch_vector[i] - np.mean(pitch_vector)) / ((stdev + 0.01) * (np.std(pitch_vector) + 0.01))
        if (abs(rho) > abs(rho_max)):
            rho_max = rho
            most_likely_chord = (2, idx)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_DOM7, CHORD_TEMPLATE_DOM7_means, CHORD_TEMPLATE_DOM7_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean) * (pitch_vector[i] - np.mean(pitch_vector)) / ((stdev + 0.01) * (np.std(pitch_vector) + 0.01))
        if (abs(rho) > abs(rho_max)):
            rho_max = rho
            most_likely_chord = (3, idx)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_MIN7, CHORD_TEMPLATE_MIN7_means, CHORD_TEMPLATE_MIN7_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean) * (pitch_vector[i] - np.mean(pitch_vector)) / ((stdev + 0.01) * (np.std(pitch_vector) + 0.01))
        if (abs(rho) > abs(rho_max)):
            rho_max = rho
            most_likely_chord = (4, idx)
    return most_likely_chord

def find_most_likely_timbre_category(timbre_vector):
    most_likely_timbre_cat = 0
    rho_max = 0.0
    for idx, (seg, mean, stdev) in enumerate(zip(TIMBRE_CLUSTERS, TIMBRE_MEANS, TIMBRE_STDEVS)):
        rho = 0.0
        for i in range(0, 12):
            rho += (seg[i] - mean) * (timbre_vector[i] - np.mean(timbre_vector)) / ((stdev + 0.01) * (np.std(timbre_vector) + 0.01))
        if (abs(rho) > abs(rho_max)):
            rho_max = rho
            most_likely_timbre_cat = idx
    return most_likely_timbre_cat


Bibliography

[1] Marck Bailey. Mark J. Butler publishes scholarly work on dance music. http://www.music.northwestern.edu/about/news/2012/mark-j-butler-publishes-scholarly-work-on-dance-music.html, March 2012.

[2] Kenneth Taylor. Ishkur's guide to EDM. http://techno.org/electronic-music-guide.

[3] Deal further strengthens Spotify's music discovery expertise. http://the.echonest.com/pressreleases/spotify-acquires-echo-nest, March 2014.

[4] Josh Constine. Inside the Spotify - Echo Nest skunkworks. http://techcrunch.com/2014/10/19/the-sonic-mad-scientists, October 2014.

[5] The future of music genres is here. http://blog.echonest.com/post/73516217273/the-future-of-music-genres-is-here, January 2014.

[6] About the Music Genome Project. http://www.pandora.com/about/mgp.

[7] Joan Serrà, Álvaro Corral, Marián Boguñá, Martín Haro, and Josep Ll. Arcos. Measuring the evolution of contemporary western popular music. Sci. Rep., 2, July 2012.

[8] Matthias Mauch, Robert M. MacCallum, Mark Levy, and Armand M. Leroi. The evolution of popular music: USA 1960-2010. Royal Society Open Science, 2(5), 2015.

[9] Thierry Bertin-Mahieux, Daniel P.W. Ellis, Brian Whitman, and Paul Lamere. The Million Song Dataset. In Proceedings of the 12th International Conference on Music Information Retrieval (ISMIR 2011), 2011.

[10] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay. Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12:2825-2830, 2011.

[11] Lars Buitinck, Gilles Louppe, Mathieu Blondel, Fabian Pedregosa, Andreas Mueller, Olivier Grisel, Vlad Niculae, Peter Prettenhofer, Alexandre Gramfort, Jaques Grobler, Robert Layton, Jake VanderPlas, Arnaud Joly, Brian Holt, and Gaël Varoquaux. API design for machine learning software: experiences from the scikit-learn project. In ECML PKDD Workshop: Languages for Data Mining and Machine Learning, pages 108-122, 2013.

[12] Edwin Chen. Infinite mixture models with nonparametric Bayes and the Dirichlet process. http://blog.echen.me/2012/03/20/infinite-mixture-models-with-nonparametric-bayes-and-the-dirichlet-process, March 2012.

[13] Graham Massey. Roland TR-808: The drum machine that changed music. http://www.bbc.com/news/entertainment-arts-26682781, March 2014.

[14] Jean-Michel Jarre and Agencja Artystyczna MTJ. Jean Michel Jarre. Disques Dreyfus, 1999.

[15] Francois Pachet, Jean-Julien Aucouturier, and Mark Sandler. "The way it sounds": Timbre models for analysis and retrieval of music signals. IEEE Transactions on Multimedia, 7(6):1028-1035, December 2005.

Page 33: Silver,Matthew final thesis

13 13 F13 13 major13 G13 major13 13 13 13 Chord13 shift13 code13 =13 613 chord_changes[6]13 +=13 113 13 13 13 13 13 13 13 13 13 13 13 13

13

13 13 13 chord_changes13 =13 [1413 013 313 013 113 013 113 013 013 013 113 013 1113 013 213 013 013 013 213 013 213 113 113 013 113 013 013 013 313 113 013 113 213 013 113 313 113 113 013 013 113 013 013 213 013 013 013 013 1213 113 413 113 213 013 013 013 013 113 113 113 1413 013 613 013 213 013 013 113 013 013 613 013 013 213 013 013 013 013 313 213 013 113 213 113 113 113 013 013 013 013 013 013 013 113 113 013 013 013 013 013 113 213 013 113 013 113 013 113 013 213 113 113 113 113 013 113 013 213 113 113 013 213 113 113 013 013 013 113 013 113 013 513 313 013 013 213 013 013 013 113 013 113 013 113 413 013 013 013 013 013 213 013 013 013 013 013 213 013 213 013 113 013 013 113 013 113 113 013 013 213 013 013 013 013 013 013 113 013 013 013 113 013 013 013 013 013 013 013 013 013 113 0]13 13 Final13 192-shy‐element13 vector13 where13 chord_changes[i]13 is13 the13 number13 of13 times13 the13 chord13 change13 with13 code13 i13 existed13 in13 the13 song13 13 13

Major13 to13 Major13 step13 size13 =13 213 For13 two13 adjacent13 chords13 calculate13 the13 change13 in13 between13 them13 and13 increment13 count13 in13 table13 of13 chord13 change13 frequencies13 (19213 possible13 chord13 changes)13

24

A final step I took to normalize the chord change data was to divide the numbers by

the length of the song so that each songrsquos number of chord changes was measured per

second

233 Timbre Preprocessing

For timbre I also used Mauchrsquos model to find a meaningful way to compare timbre

uniformly across all songs [8] After collecting all song metadata I took a random

sample of 20 songs from each year starting at 1970 The reason I forced the sampling

to 20 randomly sampled songs from each year and did not take a random sample of

songs from all years at once was to prevent bias towards any type of sounds As seen

in figure 22 there are significantly more songs from 2000-2011 than before 2000 The

mean year is x = 2001052 the median year is 2003 and the standard deviation of the

years is σ = 7060 A ldquorandom samplerdquo over all songs would almost definitely include

a disproportionate amount of more recent songs In order to not miss out on sounds

that may be more prevalent in older songs I required a set number of songs from each

year Next from each randomly selected song I selected 20 random timbre frames

in order to prevent any biases in data collection within each song In total there

were 422020 = 16800 timbre frames collected Next I clustered the timbre frames

using a Gaussian Mixture Model (GMM) varying the number of clusters from 10 to

100 and selecting the number of clusters with the lowest Bayes Information Criterion

(BIC) a statistical measure commonly used to calculate the best fitting model The

BIC was minimized at 46 timbre clusters I then re-ran the GMM with 46 clusters

and saved the mean values of each of the 12 timbre segments for each cluster formed

In the same way that every song had the same 192 cord changes whose frequencies

could be compared between songs each song now had the same 46 timbre clusters

but different frequencies in each song When reading in the metadata from each song

I calculated the most likely timbre cluster each timbre frame belonged to and kept

25

Figure 22 Number of Electronic Music Songs in Million Song Dataset from EachYear

a frequency count of all of the possible timbre clusters observed in a song Finally

as with the pitch data I divided all observed counts by the duration of the song in

order to normalize each songrsquos timbre counts

26

Chapter 3

Results

31 Methodology

After the pitch and timbre data was processed I ran the Dirichlet Process on the

data For each song I concatenated the 192-element chord change frequency list

and the 46-element timbre category frequency list giving each song a total of 238

features However there is a problem with this setup The pitch data will inherently

dominate the clustering process since it contains almost 3 times as many features

as timbre While there is no built-in function in scikit-learnrsquos DPGMM process to

give different weights to each feature I considered another possibility to remedy

this discrepancy duplicating the timbre vector a certain number of times and

concatenating that to the feature set of each song While this strategy runs the risk

of corrupting the feature set and turning it into something that does not accurately

represent each song it is important to keep in mind that even without duplicating

the timbre vector the feature set consists of two separate feature sets concatenated

to each other Therefore timbre duplication appears to be a reasonable strategy to

weigh pitch and timbre more evenly

27

After this modification I tweaked a few more parameters before obtaining my

final results Dividing the pitch and timbre frequencies by the duration of the song

normalized every song to frequency per second but it also had the undesired effect

of making the data too small Timbre and pitch frequencies per second were almost

always less than 10 and many times hovered as low as 0002 for nonzero values

Because all of the values were very close to each other using common values of α

in the range of 01 to 1000-2000 was insufficient to push the songs into different

clusters As a result every song fell into the same cluster Increasing the value

of α by several orders of magnitude to well over 10 million fixed the problem but

this solution presented two problems First tuning α to experiment with different

ways to cluster the music would be problematic since I would have to work with

an enormous range of possible values for α Second pushing α to such high values

is not appropriate for the Dirichlet Process Extremely high values of α indicate a

Dirichlet Process that will try to disperse the data into different clusters but a value

of α that high is in principle always assigning each new song to a new cluster On

the other hand varying α between 01 and 1000 for example presents a much wider

range of flexibility when assigning clusters While this may be possible by varying

the values of α an extreme amount with the data as it currently is we are using

the Dirichlet Process in a way it should mathematically not be used Therefore

multiplying all of the data by a constant value so that we can work in the appropriate

range of α is the ideal approach After some experimentation I found that k=10 was

an appropriate scaling factor After initial runs of the Dirichlet Process I found out

that there was a slight issue with some of the earlier songs Since I had only artist

genre tags not specific song tags for each song I chose songs based on whether any

of the tags associated with the artist fell under any electronic music genre including

the generic term rsquoelectronicrsquo There were some bands mostly older ones from the

1960s and 1970s like Electric Light Orchestra which had some electronic music but

28

mostly featured rock funk disco or another genre Given that these artists featured

mostly non-electronic songs I decided to exclude them from my study and generate

a blacklist indicating these music artists While it was infeasible to look through

every single song and determine whether it was electronic or not I was able to look

over the earliest songs in each cluster These songs were the most important to verify

as electronic because early non-electronic songs could end up forming new clusters

and inadvertently create clusters with non-electronic sounds that I was not looking for

The goal of this thesis is to identify different groups in which EM songs are

clustered and identify the most unique artists and genres While the second task is

very simple because it requires looking at the earliest songs in each cluster the first

is difficult to gauge the effectiveness of While I can look at the average chord change

and timbre category frequencies in each category as well as other metadata putting

more semantic interpretations to what the music actually sounds like and determining

whether the music is clustered properly is a very subjective process For this reason

I ran the Dirichlet Process on the feature set with values of α = (005 01 02) and

compared the clustering in each category examining similarities and differences in

the clusters formed in each scenario in the Discussion section For each value α I

set the upper limit of components or clusters allowed to 50 The ranges of α I used

resulted in 9 14 and 19 clusters formed

32 Findings

321 α=005

When I set α to 005 the Dirichlet Process split the songs into 9 clusters Below are

the distribution of years of the songs in each cluster (note that the Dirichlet Process

29

does not number the clusters exactly sequentially so cluster numbers 5 7 and 10 are

skipped)

30

Figure 31 Song year distributions for α = 005

For each value of α I also calculated the average frequency of each chord change

category and timbre category for each cluster and plotted the results The green

lines correspond to timbre and the blue lines to pitch

31

Figure 32 Timbre and pitch distributions for α = 005

A table of each cluster formed the number of songs in that cluster and descriptions

of pitch timbre and rhythmic qualities characteristic of songs in that cluster are

shown below

32

Cluster Song Count Characteristic Sounds

0 6481 Minimalist industrial space sounds dissonant chords

1 5482 Soft New Age ethereal

2 2405 Defined sounds electronic and non-electronic instru-

ments played in standard rock rhythms

3 360 Very dense and complex synths slightly darker tone

4 4550 Heavily distorted rock and synthesizer

6 2854 Faster paced 80s synth rock acid house

8 798 Aggressive beats dense house music

9 1464 Ambient house trancelike strong beats mysterious

tone

11 1597 Melancholy tones New wave rock in 80s then starting

in 90s downtempo trip-hop nu-metal

Table 31 Song cluster descriptions for α = 005

322 α=01

A total of 14 clusters were formed (16 were formed but 2 clusters contained only

one song each I listened to both of these songs and they did not sound unique so

I discarded them from the clusters) Again the song distributions timbre and pitch

distributions and cluster descriptions are shown below

33

34

Figure 33 Song year distributions for α = 01

35

36

Figure 34 Timbre and pitch distributions for α = 01

37

Cluster Song Count Characteristic Sounds

0 1339 Instrumental and disco with 80s synth

1 2109 Simultaneous quarter-note and sixteenth note rhythms

2 4048 Upbeat chill simultaneous quarter-note and eighth

note rhythms

3 1353 Strong repetitive beats ambient

4 2446 Strong simultaneous beat and synths synths defined but

echo

5 2672 Calm New Age

6 542 Hi-hat cymbals dissonant chord progressions

7 2725 Aggressive punk and alternative rock

9 1647 Latin rhythmic emphasis on first and third beats

11 835 Standard medium-fast rock instrumentschords

16 1152 Orchestral especially violins

18 40 ldquoMartian alienrdquo sounds no vocals

20 1590 Alternating strong kick and strong high-pitched clap

28 528 Roland TR-like beats kick and clap stand out but fuzzy

Table 32 Song cluster descriptions for α = 01

323 α=02

With α set to 02 there were a total of 22 clusters formed 3 of the clusters consisted

of 1 song each none of which were particularly unique-sounding so I discarded them

for a total of 19 significant clusters Again the song distributions timbre and pitch

distributions and cluster descriptions are shown below

38

39

40

Figure 35 Song year distributions for α = 02

41

42

43

Figure 36 Timbre and pitch distributions for α = 02

44

Cluster Song Count Characteristic Sounds

0 4075 Nostalgic and sad-sounding synths and string instru-

ments

1 2068 Intense sad cavernous (mix of industrial metal and am-

bient)

2 1546 Jazzfunk tones

3 1691 Orchestral with heavy 80s synths atmospheric

4 343 Arpeggios

5 304 Electro ambient

6 2405 Alien synths eery

7 1264 Punchy kicks and claps 80s90s tilt

8 1561 Medium tempo 44 time signature synths with intense

guitar

9 1796 Disco rhythms and instruments

10 2158 Standard rock with few (if any) synths added on

12 791 Cavernous minimalist ambient (non-electronic instru-

ments)

14 765 Downtempo classic guitar riffs fewer synths

16 865 Classic acid house sounds and beats

17 682 Heavy Roland TR sounds

22 14 Fast ambient classic orchestral

23 578 Acid house with funk tones

30 31 Very repetitive rhythms one or two tones

34 88 Very dense sound (strong vocals and synths)

Table 33 Song cluster descriptions for α = 02

45

33 Analysis

Each of the different values of α revealed different insights about the Dirichlet Process

applied to the Million Song Dataset mainly which artists and songs were unique

and how traditional groupings of EM genres could be thought of differently Not

surprisingly the distributions of the years of songs in most of the clusters were skewed

to the left because the distribution of all of the EM songs in the Million Song Dataset

is left-skewed (see Figure 22) However some of the distributions vary significantly

for individual clusters and these differences provide important insights into which

types of music styles were popular at certain points in time and how unique the

earliest artists and songs in the clusters were For example for α = 01 Cluster 28rsquos

musical style (with sounds characteristic of the Roland TR-808 and TR-909 a two

programmable drum machines and synthesizers that became extremely popular in

1980s and 90s dance tracks) [13] coincides with the when the instruments were first

manufactured in 1980 Not surprisingly this cluster contained mostly songs from

the 80s and 90s and declined slightly in the 2000s However there were a few songs

in that cluster that came out before 1980 While these songs did not clearly use the

Roland TR machines they may have contained similar sounds that predated the

machines and were truly novel

First looking at α = 005 we see that all of the clusters contain a significant

number of songs although cluster 3 and 8 are notably smaller Cluster 3 contains

a heavier left tail indicating a larger number of songs from the 70s 80s and 90s

Inside the cluster the genres of music varied significantly from a traditional music

lens That is the cluster contained some songs with nearly all traditional rock

instruments others with purely synths and others somewhere in between all which

would normally be classified as different EM genres However under the Dirichlet

Process these songs were lumped together with the common themes of dense melodic

46

melodies (as opposed to minimalistic repetitive or dissonant sounds) The most

prominent artists from the earlier songs are Ashra and John Hassell who composed

several melodic songs combining traditional instruments with synthesizers for a

modern feel The other small cluster number 8 contains a more normal year

distribution relative to the entire MSD distribution and also consists of denser beats

Another artist Cabaret Voltaire leads this cluster Cluster 9 sticks out significantly

because it contains virtually no songs before 1990 but increases rapidly in popularity

This cluster contains songs with hypnotically repetitive rhythm strong and ethereal

synths and an equally strong drum-like beat Given the emergence of trance in

the 1990s and the fact that house music in the 1980s contained more minimalistic

synths than house music in the 1990s this distribution of years makes sense Looking

at the earliest artists in this cluster one that accurately predates the later music

in the cluster is Jean-Michel Jarre A French composer pioneering in ambient and

electronic music [14] one of his songs Les Chants Magneacutetiques IV contains very

sharp and modulated synths along with a repetitive hi-hat cymbal rhythm and more

drawn-out and ethereal synths While the song sounds ambient at its normal speed

playing the song 15 times the normal speed resulted in a thumping fast-paced 16th

note rhythm that combined with the ethereal synths that contain certain chord

progressions sounded very similar to trance music In fact I found that stylistically

trance music was comparable to house and ambient music increased in speed Trance

music was a term not used extensively until the early 1990s but ambient and house

music were already mainstream by the 1980s so it would make sense that trance

evolved in this manner However this insight could serve as an argument that trance

is not an innovative genre in and of itself but is rather a clever combination of two

older genres Lastly we look at the timbre category and chord change distributions

for each cluster In theory these clusters should have significantly different peaks

of chord changes and timbre categories reflecting different pitch arrangements and

47

instruments in each cluster The type 0 chord change corresponds to major rarr

major with no note change type 60 minor rarr minor with no note change type

120 dominant 7th major rarr dominant 7th major with no note change and type 180

dominant 7th minor rarr dominant 7th minor with no note change It makes sense that

type 0 60 120 and 180 chord changes are frequently observed because it implies

that chords in the song occurring next to each other are remaining in the same key

for the majority of the song The timbre categories on the other hand are more

difficult to intuitively interpret Mauchrsquos study addresses this issue by sampling

songs and sounds that are the closest to each timbre category and then playing the

sounds and attaching user-based interpretations based on several listeners [8] While

this strategy worked in Mauchrsquos study given the time and resources at my disposal

this strategy was not practical in my study I ended up comparing my subjective

summaries of each cluster and comparing the charts to see whether certain peaks

in the timbre categories corresponded to specific tones Strangely for α = 005 the

timbre and chord change data is very similar for each cluster This problem does not

occur for when α = 01 or 02 where the graphs vary significantly and correspond

to some of the observed differences in the music In summary below are the most

influential artists I found in the clusters formed and the types of music they created

that were novel for their time

bull Jean-Michel Jarre ambient and house music complicated synthesizer arrange-

ments

bull Cabaret Voltaire orchestral electronic music

bull Paul Horn new age

bull Brian Eno ambient music

bull Manuel Goumlttsching (Ashra) synth-heavy ambient music

48

bull Killing Joke industrial metal

bull John Foxx minimalist and dark electronic music

bull Fad Gadget house and industrial music

While these conclusions were formed mainly from the data used in the MSD and the

resulting clusters I checked outside sources and biographies of these artists to see

whether they were groundbreaking contributors to electronic music Some research

revealed that these artists were indeed groundbreaking for their time so my findings

are consistent with existing literature The difference however between existing

accounts and mine is that from a quantitatively computed perspective I found some

new connections (like observing that one of Jarrersquos works when sped up sounded

very similar to trance music)

For larger values of α it is not only worth looking at interesting phenomena in

the clusters formed for that specific value but also comparing the clusters formed

to other values of α Since we are increasing the value of α more clusters will

be formed and the distinctions between each cluster will be more nuanced With

α = 01 the Dirichlet Process formed 16 clusters 2 of these clusters consisted of

only one song each and upon listening neither of these songs sounded particularly

unique so I threw those two clusters out and analyzed the remaining 14 Comparing

these clusters to the ones formed with α = 005 I found that some of the clusters

mapped over nicely while others were more difficult to interpret For example cluster

301 (cluster 3 when α = 01) contained a similar number of songs and a similar

distribution of the years the songs were released to cluster 9005 Both contain vitually

no songs before the 1990s and then steadily rise in popularity through the 2000s

Both clusters also contain similar types of music house beats and ethreal synths

reminiscent of ambient or trance music However when I looked at the earliest

49

artists in cluster 301 they were different from the earliest artists in cluster 9005

One particular artist Bill Nelson stood out for having a particularly novel song

ldquoBirds of Tinrdquo for the year it was released (1980) This song features a sharp and

twangy synth beat that when sped up sounded like minimalist acid house music

While the α = 0.05 run differentiated mostly on general moods and classes of instruments (like rock vs. non-electronic vs. electronic), the α = 0.1 run picked up more nuanced instrumentation and mood differences. For example, cluster 16 (α = 0.1) contained songs that featured orchestral string instruments, especially violin. The songs themselves varied significantly according to traditional genres, from Brian Eno arrangements with classical orchestra to a remix of a song by Linkin Park, a nu-metal band, which contained violin interludes. This clustering raises an interesting point: music that sounds very different based on traditional genres could be grouped together on certain instruments or sounds. Another cluster, 28 (α = 0.1), features 90s sounds characteristic of the Roland TR-909 drum machine (which would explain why the cluster's songs increase drastically starting in the 1990s and steadily decline through the 2000s). Yet another cluster, 6 (α = 0.1), contains a particularly heavy left tail, indicating a style more popular in the 1980s, and its characteristic sound, hi-hat cymbals, is also a specialized instrument. This specialization does not match up particularly strongly with the clusters from α = 0.05. That is, a single cluster with α = 0.05 does not easily map to one or more clusters in the α = 0.1 run, although many of the clusters appear to share characteristics based on the qualitative descriptions in the tables. The timbre/chord change charts for each cluster appeared to at least somewhat corroborate the general characteristics I attached to each cluster. For example, the last timbre category is significantly pronounced for clusters 5 and 18, and especially so for 18. Cluster 18 was vocal-free, ethereal space-synth sounds, so it would make sense that cluster 5, which was mainly calm New Age, also contained vocal-free, ethereal, space-y sounds. It was also interesting to note that certain clusters, like 28, contained one timbre category that completely dominated all the others. Many songs in this cluster were marked by strong and repetitive beats reminiscent of the Roland TR synth drum machines, which matches the graph. Likewise, clusters 3, 7, 9, and 20, which appear to contain the same peak timbre category, were noted for containing strong and repetitive beats. From this run, I added the following artists and their contributions to the general list of novel artists:

• Bill Nelson: minimalist house music

• Vangelis: orchestral compositions with electronic notes

• Rick Wakeman: rock compositions with spacey-sounding synths

• Kraftwerk: synth-based pop music

Finally, we look at α = 0.2. With this parameter value, the Dirichlet Process resulted in 22 clusters. Three of these clusters contained only one song each, and upon listening to each of these songs, I determined they were not particularly unique and discarded them, for a total of 19 remaining clusters. Unlike the previous two values of α, where the clusters were relatively easy to subjectively differentiate, this one was quite difficult. Slightly more than half of the clusters, 10 out of 19, contained under 1,000 songs, and 3 contained under 100. Some of the clusters were easily mapped to clusters from the other two α values, like cluster 17 (α = 0.2), which contains Roland TR drum machine sounds and is comparable to cluster 28 (α = 0.1). However, many of the other classifications seemed more dubious. Not only did the songs within each cluster often vary significantly, but the differences between many clusters appeared nearly indistinguishable. The chord change and timbre charts also reflect the difficulty of distinguishing the clusters: the y-axis ranges are quite small, implying that many of the timbre values averaged out because the songs within each cluster were quite different. Essentially, the observations are quite noisy and do not have features that stand out as saliently as those of cluster 28 (α = 0.1), for example. The only exceptions to this pattern were clusters 30 and 34, but there were so few songs in each of these clusters that they represent only a small portion of the dataset. Therefore, I concluded that the Dirichlet Process with α = 0.2 did an insufficient job of clustering the songs. Overall, the clusters formed when α = 0.1 were the most meaningful in terms of picking up nuanced moods and instruments without splitting hairs and producing clusters that a minimally trained ear could not differentiate. From this analysis, the most appropriate genre classifications of the electronic music in the MSD are the clusters described in the table for α = 0.1, and the most novel artists, along with their contributions, are summarized in the findings for α = 0.05 and α = 0.1.


Chapter 4

Conclusion

In this chapter, I first address weaknesses in my experiment and strategies for addressing those weaknesses; I then offer potential paths for researchers to build upon my experiment, and I close with final words regarding this thesis.

4.1 Design Flaws in Experiment

While I made every effort to ensure the integrity of this experiment, there were various limiting factors, some beyond my control and others within my control but unrealistic to address given the time and resources I had. The largest issue was the dataset I was working with. While the MSD contained roughly 23,000 electronic music songs according to my classifications, these songs did not come close to all of the electronic music that was available. From looking through the tracks, I did see many important artists, meaning that there was some credibility to the dataset. However, there were several other artists I was surprised to see missing, and the artists included contained only a limited number of popular songs. Some traditionally defined genres, like dubstep, were missing entirely from the dataset, and the most recent songs came from the year 2010, which meant that the past five years of rapid expansion in EM were not accounted for. Building a sufficient corpus of EM data is very difficult, arguably more so than for other genres, because songs may be remixed by multiple artists, further blurring the line between original content and modifications. For this reason, I considered my thesis to be a proof of concept: although the data I used may not be ideal, I was able to show that the Dirichlet Process could be used with some amount of success to cluster songs based on their metadata.
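For reference, the clustering step itself is short with scikit-learn's DPGMM, the implementation used in this thesis. The following is a minimal sketch rather than my actual run script; it assumes the 238-feature vectors described in the Methodology (192 chord-change frequencies plus 46 timbre-category frequencies, scaled as discussed there) have already been assembled into a matrix, and the file name is hypothetical. In scikit-learn releases after this work, DPGMM was replaced by BayesianGaussianMixture.

import numpy as np
from sklearn import mixture

# X: one row per song, 238 preprocessed features (hypothetical file)
X = np.loadtxt('features.txt')

# upper limit of 50 components, as in the runs reported in Chapter 3
dpgmm = mixture.DPGMM(n_components=50, alpha=0.1, n_iter=1000)
dpgmm.fit(X)
labels = dpgmm.predict(X)
print(np.unique(labels))  # the cluster indices actually used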

With respect to how I implemented the Dirichlet Process and constructed the features, my methodology could have been more extensive with additional time and resources. Interpreting the sounds in each song and establishing common threads is a difficult task, and unlike Pandora, which used trained music theory experts to analyze each song, I relied on my own ears and anecdotal knowledge of EM. Given the lack of formal literature quantitatively analyzing EM, and the resources I had, this was my best realistic option, but it was also not ideal. The second notable weakness, which was more controllable, was determining what exactly constitutes an EM song. My criterion involved iterating through every song and selecting those whose artist carried a tag that fell inside a list of predetermined EM genres. However, this strategy is not always effective, since some artists have only a small selection of EM songs and have produced much more music involving rock or other non-EM genres. To prevent these songs from appearing in the dataset, I would need to load another dataset, from a group called Last.fm, which contains user-generated tags at the song level.
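As a rough illustration of what that song-level filter could look like, here is a minimal sketch. It assumes the Last.fm tags have already been loaded into a dictionary track_tags mapping MSD track IDs to lists of (tag, weight) pairs; track_tags, is_em_song, and the weight threshold are my own hypothetical names and choices, not part of the thesis code base, while all_song_data and target_genres are as in Appendix A.1.

# Hypothetical song-level filter using Last.fm tags. `track_tags` is assumed
# to map MSD track IDs to lists of (tag, weight) pairs; `target_genres` is
# the same genre list used in Appendix A.1.
def is_em_song(track_id, track_tags, target_genres, min_weight=50):
    for tag, weight in track_tags.get(track_id, []):
        if tag.lower() in target_genres and weight >= min_weight:
            return True
    return False

# keep only songs that are tagged as EM at the song level
em_song_data = {tid: meta for tid, meta in all_song_data.items()
                if is_em_song(tid, track_tags, target_genres)}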

Another, more addressable weakness in my experiment was graphically analyzing the timbre categories. While the average chord changes were easy to interpret on the graphs for each cluster and had straightforward semantic interpretations, the timbre categories were never formally defined. That is, while I knew the Bayesian Information Criterion was lowest when there were 46 categories, I did not associate each timbre category with a sound. Mauch's study addressed this issue by randomly selecting songs with sounds that fell in each timbre category and asking users to listen to the sounds and classify what they heard [8]. Implementing this system would be an additional way of ensuring that the clusters formed were nontrivial: I would not only eyeball the timbre measurements on each graph, as I did in this thesis, but could also use them to confirm the sounds I observed for each cluster.
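A minimal sketch of that confirmation step might look like the following, assuming the 46 GMM cluster means are available as TIMBRE_CLUSTERS (as in Appendix A.4) and that each song's smoothed timbre frames have been kept in a dictionary frames_by_song keyed by song title; frames_by_song and the choice of Euclidean distance are my own illustrative assumptions.

import numpy as np

# For each timbre cluster, find the k songs whose smoothed timbre frames
# lie closest to the cluster mean, so a listener can audition
# representative examples and attach a verbal description to the category.
def representative_songs(frames_by_song, timbre_clusters, k=5):
    reps = {}
    for idx, center in enumerate(timbre_clusters):
        center = np.array(center)
        scored = []
        for title, frames in frames_by_song.items():
            # score each song by the distance of its closest frame
            dist = min(np.linalg.norm(np.array(f) - center) for f in frames)
            scored.append((dist, title))
        reps[idx] = [title for dist, title in sorted(scored)[:k]]
    return reps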

Finally, while my feature selection involved careful preprocessing, based on other studies, that normalized measurements across all songs, there are additional ways I could have improved the feature set. For example, one study looks at more advanced ways to isolate specific timbre segments in a song, identify repeating patterns, and compare songs to each other in terms of the similarity of their timbres [15]. More advanced methods like these would allow me to more quantitatively analyze how successfully the Dirichlet Process clusters songs into distinct categories.

4.2 Future Work

Future work in this area, quantitatively analyzing EM metadata to determine what constitutes different genres and novel artists, would involve tighter definitions and procedures, evaluations of whether the clustering was effective, and closer musical scrutiny. All of the weaknesses mentioned in the previous section, barring perhaps the set of songs available in the Million Song Dataset, can be addressed with extensions and modifications to the code base I created. The greater issue, building an effective corpus of music data for the MSD and constantly updating it, might be addressed by soliciting such data from an organization like Spotify, but such an endeavor is very ambitious and beyond the scope of any individual or small research group without extensive funding and influence. Once these problems are resolved, and the songs accessed from the dataset and the methods for comparing songs to each other are settled, the next steps would be to further analyze the results. How do the most unique artists for their time compare to the most popular artists? Is there considerable overlap? How long does it take for a style to grow in popularity, if it grows at all? And lastly, how can these findings be used to compose new genres of music and to envision who and what will become popular in the future? All of these questions may require supplementary information sources, with respect to the popularity of songs and artists for example, and many of these additional pieces of information can be found on the website of the MSD.

4.3 Closing Remarks

While this thesis is an ambitious endeavor and can be improved in many respects, it does show that the methods implemented yield nontrivial results and could serve as a foundation for future quantitative analysis of electronic music. As data analytics grows, and groups such as Spotify amass greater amounts of information and deeper insights from that information, this relatively new field of study will hopefully grow with them. EM is a dynamic, energizing, and incredibly expressive type of music, and understanding it from a quantitative perspective pays respect to what has, up until now, mostly been analyzed from a curious outsider's perspective: qualitatively described, but not examined as thoroughly from a mathematical angle.


Appendix A

Code

A.1 Pulling Data from the Million Song Dataset

from __future__ import division
import os
import re
import sys
import time
import glob
import numpy as np
from collections import OrderedDict
import hdf5_getters  # not on adroit

# prevents output from showing ellipses when printed
np.set_printoptions(threshold=np.nan)

'''This code pulls the relevant metadata of every electronic music song
out of the Million Song Dataset and writes it, sorted by year, to a text file.'''

basedir = '/scratch/network/mssilver/mssilver/msd_data_full/data/' + str(sys.argv[1])
ext = '.h5'

target_genres = ['house', 'techno', 'drum and bass', 'drum n bass', "drum'n'bass",
                 'drumnbass', "drum 'n' bass", 'jungle', 'breakbeat', 'trance',
                 'dubstep', 'trap', 'downtempo', 'industrial', 'synthpop', 'idm',
                 'idm - intelligent dance music', '8-bit', 'ambient',
                 'dance and electronica', 'electronic']

# relevant metadata for all EM songs found in the MSD
all_song_data = {}
count = 0
start_time = time.time()

for root, dirs, files in os.walk(basedir):
    files = glob.glob(os.path.join(root, '*' + ext))
    for f in files:
        h5 = hdf5_getters.open_h5_file_read(f)
        # if the year is unknown, throw out the sample
        if hdf5_getters.get_year(h5) == 0:
            h5.close()
            continue
        # keep the song if any of its artist's MusicBrainz tags is an EM genre
        if any(tag in str(hdf5_getters.get_artist_mbtags(h5)) for tag in target_genres):
            print 'found electronic music song at {0} seconds'.format(time.time() - start_time)
            count += 1
            print 'song count: {0}'.format(count)
            h5_subdict = dict()
            h5_subdict['title'] = hdf5_getters.get_title(h5).item()
            h5_subdict['artist_name'] = hdf5_getters.get_artist_name(h5).item()
            h5_subdict['year'] = hdf5_getters.get_year(h5).item()
            h5_subdict['duration'] = hdf5_getters.get_duration(h5).item()
            h5_subdict['timbre'] = hdf5_getters.get_segments_timbre(h5).tolist()
            h5_subdict['pitches'] = hdf5_getters.get_segments_pitches(h5).tolist()
            track_id = hdf5_getters.get_track_id(h5).item()
            all_song_data[track_id] = h5_subdict
            print 'Song {0} finished processing. Total time elapsed: {1} seconds'.format(count, time.time() - start_time)
        h5.close()

# OrderedDict keeps the chronological order (a plain dict would not in Python 2)
all_song_data_sorted = OrderedDict(sorted(all_song_data.items(), key=lambda k: k[1]['year']))
sortedpitchdata = '/scratch/network/mssilver/mssilver/msd_data/raw_' + re.sub('/', '', sys.argv[1]) + '.txt'
with open(sortedpitchdata, 'w') as text_file:
    text_file.write(str(all_song_data_sorted))

A.2 Calculating Most Likely Chords and Timbre Categories

from __future__ import division
import sys
import re
import time
import math
import ast
import numpy as np
import msd_utils  # not on adroit (helper methods listed in Appendix A.4)

# prevents output from showing ellipses when printed
np.set_printoptions(threshold=np.nan)

# column-wise mean of a list of numbers
def mean(a):
    return sum(a) / len(a)

'''This code computes the frequency of chord changes and timbre categories
in each electronic song, normalized by song duration.'''

basedir = '/scratch/network/mssilver/mssilver/'
input_file = basedir + 'msd_data/raw_' + str(sys.argv[1]) + '.txt'
output_file = basedir + 'msd_data/preprocessed_' + str(sys.argv[1]) + '.txt'

json_contents = open(input_file, 'r').read()

all_song_data = []
time_start = time.time()
count = 0
# each match is the string form of one song's metadata dictionary
for json_object_str in re.finditer(r"{'title'.*?}", json_contents):
    json_object_str = str(json_object_str.group(0))
    json_object = ast.literal_eval(json_object_str)
    json_object_new = {}

    json_object_new['title'] = json_object['title']
    json_object_new['artist_name'] = json_object['artist_name']
    json_object_new['year'] = json_object['year']
    json_object_new['duration'] = json_object['duration']

    segments_pitches_old = json_object['pitches']
    segments_timbre_old = json_object['timbre']
    segments_pitches_old_smoothed = []
    segments_timbre_old_smoothed = []
    chord_changes = [0 for i in range(0, 192)]
    smoothing_factor = 5
    for i in range(0, int(math.floor(len(segments_pitches_old) / smoothing_factor))):
        segments = segments_pitches_old[(smoothing_factor * i):(smoothing_factor * i + smoothing_factor)]
        # mean strength of each of the 12 pitches over a block of 5 time segments
        segments_mean = map(mean, zip(*segments))
        segments_pitches_old_smoothed.append(segments_mean)
    most_likely_chords = [msd_utils.find_most_likely_chord(seg) for seg in segments_pitches_old_smoothed]
    print 'found most likely chords at {0} seconds'.format(time.time() - time_start)
    # tally chord changes between consecutive smoothed segments
    for i in range(0, len(most_likely_chords) - 1):
        c1 = most_likely_chords[i]
        c2 = most_likely_chords[i + 1]
        if c1[1] == c2[1]:
            note_shift = 0
        elif c1[1] < c2[1]:
            note_shift = c2[1] - c1[1]
        else:
            note_shift = 12 - c1[1] + c2[1]
        key_shift = 4 * (c1[0] - 1) + c2[0]
        # convert note_shift (0 through 11) and key_shift (1 to 16)
        # to one of 192 categories for a chord shift
        chord_shift = 12 * (key_shift - 1) + note_shift
        chord_changes[chord_shift] += 1
    json_object_new['chord_changes'] = [c / json_object['duration'] for c in chord_changes]
    print 'calculated chord changes at {0} seconds'.format(time.time() - time_start)

    for i in range(0, int(math.floor(len(segments_timbre_old) / smoothing_factor))):
        segments = segments_timbre_old[(smoothing_factor * i):(smoothing_factor * i + smoothing_factor)]
        # mean of each of the 12 timbre coefficients over a block of 5 time segments
        segments_mean = map(mean, zip(*segments))
        segments_timbre_old_smoothed.append(segments_mean)
    print 'found most likely timbre categories at {0} seconds'.format(time.time() - time_start)
    timbre_cats = [msd_utils.find_most_likely_timbre_category(seg) for seg in segments_timbre_old_smoothed]
    # 46 timbre clusters, the number chosen by the BIC
    timbre_cat_counts = [timbre_cats.count(i) for i in xrange(0, 46)]
    json_object_new['timbre_cat_counts'] = [t / json_object['duration'] for t in timbre_cat_counts]
    all_song_data.append(json_object_new)
    count += 1

print 'preprocessing finished, writing results to file at time {0}'.format(time.time() - time_start)
with open(output_file, 'w') as f:
    f.write(str(all_song_data))

print 'preprocessing complete at time {0}'.format(time.time() - time_start)
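To make the chord-change indexing above concrete, here is a small worked example of my own (not part of the pipeline): a transition from C major, which the helpers in Appendix A.4 encode as (1, 0), to G dominant 7th, encoded as (3, 7).

c1, c2 = (1, 0), (3, 7)                          # C major -> G dominant 7th
note_shift = c2[1] - c1[1]                       # root moves up 7 semitones
key_shift = 4 * (c1[0] - 1) + c2[0]              # quality pair (major, dom7) = 3 of 16
chord_shift = 12 * (key_shift - 1) + note_shift  # 12 * 2 + 7
print(chord_shift)                               # prints 31, one of the 192 categories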

A.3 Code to Compute Timbre Categories

from __future__ import division
import re
import time
import ast
import random
from string import ascii_uppercase

'''This code samples timbre frames uniformly across years; the sampled
frames are then clustered with a Gaussian Mixture Model.'''

timbre_all = []
# number of EM songs in the MSD from each year, used to set sampling probabilities
year_counts = {1956: 2, 1965: 4, 1968: 3, 1969: 5, 1970: 23, 1971: 25,
               1972: 26, 1973: 37, 1974: 35, 1975: 29, 1976: 28, 1977: 64,
               1978: 77, 1979: 111, 1980: 131, 1981: 171, 1982: 199,
               1983: 272, 1984: 190, 1985: 189, 1986: 200, 1987: 224,
               1988: 205, 1989: 272, 1990: 358, 1991: 348, 1992: 538,
               1993: 610, 1994: 658, 1995: 764, 1996: 809, 1997: 930,
               1998: 872, 1999: 983, 2000: 1031, 2001: 1230, 2002: 1323,
               2003: 1563, 2004: 1508, 2005: 1995, 2006: 1892, 2007: 2175,
               2008: 1950, 2009: 1782, 2010: 742}

time_start = time.time()
orig_dir = '/scratch/network/mssilver/mssilver/'
json_pattern = re.compile(r"{'title'.*?}", re.DOTALL)
N = 20  # number of songs to sample from each year
k = 20  # number of timbre frames to select from each song
for l1 in ascii_uppercase:
    for l2 in ascii_uppercase:
        edm_textfile = orig_dir + 'msd_data/raw_' + l1 + l2 + '.txt'
        json_contents = open(edm_textfile, 'r').read()
        for json_object_str in re.findall(json_pattern, json_contents):
            json_object = ast.literal_eval(json_object_str)
            year = int(json_object['year'])
            # sample each song with a probability that yields about N songs per year
            prob = 1.0 if 1.0 * N / year_counts[year] > 1.0 else 1.0 * N / year_counts[year]
            if random.random() < prob:
                print 'getting timbre frames for song in file {0}, {1} seconds after start of program'.format(edm_textfile, time.time() - time_start)
                duration = float(json_object['duration'])
                timbre = [[t / duration for t in l] for l in json_object['timbre']]
                try:
                    indices = random.sample(xrange(0, len(timbre)), k)
                except ValueError:
                    # fewer than k frames in the song: take them all
                    indices = xrange(0, len(timbre))
                timbre_frames = [timbre[i] for i in indices]
                timbre_all.extend(timbre_frames)
        print 'finished file {0}, {1} seconds after start of program'.format(edm_textfile, time.time() - time_start)

with open('timbre_frames_all.txt', 'w') as f:
    f.write(str(timbre_all))

A.4 Helper Methods for Calculations

import numpy as np
import hdf5_getters

''' some static data used in conjunction with the helper methods '''

# each 12-element vector corresponds to the 12 pitches, starting with C
# natural and going up to B natural; a 1 marks a pitch present in the chord

CHORD_TEMPLATE_MAJOR = [[1,0,0,0,1,0,0,1,0,0,0,0], [0,1,0,0,0,1,0,0,1,0,0,0],
                        [0,0,1,0,0,0,1,0,0,1,0,0], [0,0,0,1,0,0,0,1,0,0,1,0],
                        [0,0,0,0,1,0,0,0,1,0,0,1], [1,0,0,0,0,1,0,0,0,1,0,0],
                        [0,1,0,0,0,0,1,0,0,0,1,0], [0,0,1,0,0,0,0,1,0,0,0,1],
                        [1,0,0,1,0,0,0,0,1,0,0,0], [0,1,0,0,1,0,0,0,0,1,0,0],
                        [0,0,1,0,0,1,0,0,0,0,1,0], [0,0,0,1,0,0,1,0,0,0,0,1]]
CHORD_TEMPLATE_MINOR = [[1,0,0,1,0,0,0,1,0,0,0,0], [0,1,0,0,1,0,0,0,1,0,0,0],
                        [0,0,1,0,0,1,0,0,0,1,0,0], [0,0,0,1,0,0,1,0,0,0,1,0],
                        [0,0,0,0,1,0,0,1,0,0,0,1], [1,0,0,0,0,1,0,0,1,0,0,0],
                        [0,1,0,0,0,0,1,0,0,1,0,0], [0,0,1,0,0,0,0,1,0,0,1,0],
                        [0,0,0,1,0,0,0,0,1,0,0,1], [1,0,0,0,1,0,0,0,0,1,0,0],
                        [0,1,0,0,0,1,0,0,0,0,1,0], [0,0,1,0,0,0,1,0,0,0,0,1]]
CHORD_TEMPLATE_DOM7 = [[1,0,0,0,1,0,0,1,0,0,1,0], [0,1,0,0,0,1,0,0,1,0,0,1],
                       [1,0,1,0,0,0,1,0,0,1,0,0], [0,1,0,1,0,0,0,1,0,0,1,0],
                       [0,0,1,0,1,0,0,0,1,0,0,1], [1,0,0,1,0,1,0,0,0,1,0,0],
                       [0,1,0,0,1,0,1,0,0,0,1,0], [0,0,1,0,0,1,0,1,0,0,0,1],
                       [1,0,0,1,0,0,1,0,1,0,0,0], [0,1,0,0,1,0,0,1,0,1,0,0],
                       [0,0,1,0,0,1,0,0,1,0,1,0], [0,0,0,1,0,0,1,0,0,1,0,1]]
CHORD_TEMPLATE_MIN7 = [[1,0,0,1,0,0,0,1,0,0,1,0], [0,1,0,0,1,0,0,0,1,0,0,1],
                       [1,0,1,0,0,1,0,0,0,1,0,0], [0,1,0,1,0,0,1,0,0,0,1,0],
                       [0,0,1,0,1,0,0,1,0,0,0,1], [1,0,0,1,0,1,0,0,1,0,0,0],
                       [0,1,0,0,1,0,1,0,0,1,0,0], [0,0,1,0,0,1,0,1,0,0,1,0],
                       [0,0,0,1,0,0,1,0,1,0,0,1], [1,0,0,0,1,0,0,1,0,1,0,0],
                       [0,1,0,0,0,1,0,0,1,0,1,0], [0,0,1,0,0,0,1,0,0,1,0,1]]

CHORD_TEMPLATE_MAJOR_means = [np.mean(chord) for chord in CHORD_TEMPLATE_MAJOR]
CHORD_TEMPLATE_MINOR_means = [np.mean(chord) for chord in CHORD_TEMPLATE_MINOR]
CHORD_TEMPLATE_DOM7_means = [np.mean(chord) for chord in CHORD_TEMPLATE_DOM7]
CHORD_TEMPLATE_MIN7_means = [np.mean(chord) for chord in CHORD_TEMPLATE_MIN7]

CHORD_TEMPLATE_MAJOR_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_MAJOR]
CHORD_TEMPLATE_MINOR_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_MINOR]
CHORD_TEMPLATE_DOM7_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_DOM7]
CHORD_TEMPLATE_MIN7_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_MIN7]

# mean values of the 12 timbre coefficients for each of the 46 GMM clusters
TIMBRE_CLUSTERS = [
    [1.38679881e-01, 3.95702571e-02, 2.65410235e-02, 7.38301998e-03, -1.75014636e-02, -5.51147732e-02,
     8.71851698e-03, -1.17595855e-02, 1.07227900e-02, 8.75951680e-03, 5.40391877e-03, 6.17638908e-03],
    [3.14344510e+00, 1.17405599e-01, 4.08053561e+00, -1.77934450e+00, 2.93367968e+00, -1.35597928e+00,
     -1.55129489e+00, 7.75743158e-01, 6.42796685e-01, 1.40794256e-01, 3.37716831e-01, -3.27103815e-01],
    [3.56548165e-01, 2.73288705e+00, 1.94355982e+00, 1.06892477e+00, 9.89739475e-01, -8.97330631e-02,
     8.73234495e-01, -2.00747009e-03, 3.44488367e-01, 9.93117800e-02, -2.43471766e-01, -1.90521726e-01],
    [4.22442037e-01, 4.14115783e-01, 1.43926557e-01, -1.16143322e-01, -5.95186216e-02, -2.36927188e-01,
     -6.83151409e-02, 9.86816882e-02, 2.43219098e-02, 6.93558977e-02, 6.80121418e-03, 3.97485360e-02],
    [1.94727799e-01, -1.39027782e+00, -2.39875671e-01, -2.84583677e-01, 1.92334219e-01, -2.83421048e-01,
     2.15787541e-01, 1.14840341e-01, -2.15631833e-01, -4.09496877e-02, -6.90838017e-03, -7.24394810e-03],
    [1.96565167e-01, 4.98702717e-02, -3.43697282e-01, 2.54170701e-01, 1.12441266e-02, 1.54740401e-01,
     -4.70447408e-02, 8.10868802e-02, 3.03736697e-03, 1.43974944e-03, -2.75044913e-02, 1.48634678e-02],
    [2.21364497e-01, -2.96205105e-01, 1.57754028e-01, -5.57641279e-02, -9.25625566e-02, -6.15316168e-02,
     -1.38139882e-01, -5.54936599e-02, 1.66886836e-01, 6.46238260e-02, 1.24093863e-02, -2.09274345e-02],
    [2.12823455e-01, -9.32652720e-02, -4.39611467e-01, -2.02814479e-01, 4.98638770e-02, -1.26572488e-01,
     -1.11181799e-01, 3.25075635e-02, 2.01416694e-02, -5.69216463e-02, 2.61922912e-02, 8.30817468e-02],
    [1.62304042e-01, -7.34813956e-03, -2.02552550e-01, 1.80106705e-01, -5.72110826e-02, -9.17148244e-02,
     -6.20429191e-03, -6.08892354e-02, 1.02883628e-02, 3.84878478e-02, -8.72920419e-03, 2.37291230e-02],
    [1.69023095e-01, 6.81311168e-02, -3.71039856e-02, -2.13139780e-02, -4.18752028e-03, 1.36407740e-01,
     2.58515825e-02, -4.10328777e-04, 2.93149920e-02, -1.97874734e-02, 2.01177066e-02, 4.29260690e-03],
    [4.16829358e-01, -1.28384095e+00, 8.86081556e-01, 9.13717416e-02, -3.19420208e-01, -1.82003637e-01,
     -3.19865507e-02, -1.71517045e-02, 3.47472066e-02, -3.53047665e-02, 5.58354602e-02, -5.06222122e-02],
    [3.83948137e-01, 1.06020034e-01, 4.01191058e-01, 1.49470482e-01, -9.58422411e-02, -4.94473336e-02,
     2.27589858e-02, -5.67352733e-02, 3.84666644e-02, -2.15828055e-02, -1.67817151e-02, 1.15426241e-01],
    [9.07946444e-01, 3.26120397e+00, 2.98472002e+00, -1.42615404e-01, 1.29886103e+00, -4.53380431e-01,
     1.54008478e-01, -3.55297093e-02, -2.95809181e-01, 1.57037690e-01, -7.29692046e-02, 1.15180285e-01],
    [1.60870896e+00, -2.32038235e+00, -7.96211044e-01, 1.55058968e+00, -2.19377663e+00, 5.01030526e-01,
     -1.71767279e+00, -1.36642470e+00, -2.42837527e-01, -4.14275615e-01, -7.33148530e-01, -4.56676578e-01],
    [6.42870687e-01, 1.34486839e+00, 2.16026845e-01, -2.13180345e-01, 3.10866747e-01, -3.97754955e-01,
     -3.54439151e-01, -5.95938041e-04, 4.95054274e-03, 4.67013422e-02, -1.80823854e-02, 1.25808320e-01],
    [1.16780496e+00, 2.28141229e+00, -3.29418720e+00, -1.54239912e+00, 2.12372153e-01, 2.51116768e+00,
     1.84273560e+00, -4.06183916e-01, 1.19175125e+00, -9.24407446e-01, 6.85444429e-01, -6.38729005e-01],
    [2.39097414e-01, -1.13382447e-02, 3.06327342e-01, 4.68182987e-03, -1.03107607e-01, -3.17661969e-02,
     3.46533705e-02, 1.46440386e-02, 6.88291154e-02, 1.72580481e-02, -6.23970238e-03, -6.52822380e-03],
    [1.74850329e-01, -1.86077411e-01, 2.69285838e-01, 5.22452803e-02, -3.71708289e-02, -6.42874319e-02,
     -5.01920042e-03, -1.14565540e-02, -2.61300268e-03, -6.94872458e-03, 1.20157063e-02, 2.01341977e-02],
    [1.93220674e-01, 1.62738332e-01, 1.72794061e-02, 7.89933755e-02, 1.58494767e-01, 9.04541006e-04,
     -3.33177052e-02, -1.42411500e-01, -1.90471155e-02, -2.41622739e-02, -2.57382438e-02, 2.84895062e-02],
    [3.31179197e+00, -1.56765268e-01, 4.42446188e+00, 2.05496297e+00, 5.07031622e+00, -3.52663849e-02,
     -5.68337901e+00, -1.17825301e+00, 5.41756637e-01, -3.15541339e-02, -1.58404846e+00, 7.37887234e-01],
    [2.36033237e-01, -5.01380019e-01, -7.01568834e-02, -2.14474169e-01, 5.58739133e-01, -3.45340886e-01,
     2.36469930e-01, -2.51770230e-02, -4.41670143e-01, -1.73364633e-01, 9.92353986e-03, 1.01775476e-01],
    [3.13672832e+00, 1.55128891e+00, 4.60139512e+00, 9.82477544e-01, -3.87108002e-01, -1.34239667e+00,
     -3.00065797e+00, -4.41556909e-01, -7.77546208e-01, -6.59017029e-01, -1.42596356e-01, -9.78935498e-01],
    [8.50714148e-01, 2.28658856e-01, -3.65260753e+00, 2.70626948e+00, -1.90441544e-01, 5.66625676e+00,
     1.77531510e+00, 2.39978921e+00, 1.10965660e+00, 1.58484130e+00, -1.51579214e-02, 8.64324026e-01],
    [1.14302559e+00, 1.18602811e+00, -3.88130412e+00, 8.69833825e-01, -8.23003310e-01, -4.23867795e-01,
     8.56022598e-01, -1.08015106e+00, 1.74840192e-01, -1.35493558e-02, -1.17012561e+00, 1.68572940e-01],
    [3.54117814e+00, 6.12714769e-01, 7.67585243e+00, 2.50391333e+00, 1.81374399e+00, -1.46363231e+00,
     -1.74027236e+00, -5.72924078e-01, -1.20787368e+00, -4.13954661e-01, -4.62561948e-01, 6.78297871e-01],
    [8.31843044e-01, 4.41635485e-01, 7.00724425e-02, -4.72159900e-02, 3.08326493e-01, -4.47009822e-01,
     3.27806057e-01, 6.52370380e-01, 3.28490360e-01, 1.28628172e-01, -7.78065861e-02, 6.91343399e-02],
    [4.90082031e-01, -9.53180204e-01, 1.76970476e-01, 1.57256960e-01, -5.26196238e-02, -3.19264458e-01,
     3.91808304e-01, 2.19368239e-01, -2.06483291e-01, -6.25044005e-02, -1.05547224e-01, 3.18934196e-01],
    [1.49899454e+00, -4.30708817e-01, 2.43770498e+00, 7.03149621e-01, -2.28827845e+00, 2.70195855e+00,
     -4.71484280e+00, -1.18700075e+00, -1.77431396e+00, -2.23190236e+00, 8.20855264e-01, -2.35859902e-01],
    [1.20322544e-01, -3.66300816e-01, -1.25699953e-01, -1.21914056e-01, 6.93277338e-02, -1.31034684e-01,
     -1.54955924e-03, 2.48094288e-02, -3.09576314e-02, -1.66369415e-03, 1.48904987e-04, -1.42151992e-02],
    [6.52394765e-01, -6.81024464e-01, 6.36868117e-01, 3.04950208e-01, 2.62178992e-01, -3.20457080e-01,
     -1.98576098e-01, -3.02173163e-01, 2.04399765e-01, 4.44513847e-02, -9.50111498e-03, -1.14198739e-02],
    [2.06762180e-01, -2.08101829e-01, 2.61977630e-01, -1.71672300e-01, 5.61794250e-02, 2.13660185e-01,
     3.90259585e-02, 4.78176392e-02, 1.72812607e-02, 3.44052067e-02, 6.26899067e-03, 2.48544728e-02],
    [7.39717363e-01, 4.37786285e+00, 2.54995502e+00, 1.13151212e+00, -3.58509503e-01, 2.20806129e-01,
     -2.20500355e-01, -7.22409824e-02, -2.70534083e-01, 1.07942098e-03, 2.70174668e-01, 1.87279353e-01],
    [1.25593809e+00, 6.71054880e-02, 8.70352571e-01, -4.32607959e+00, 2.30652217e+00, 5.47476105e+00,
     -6.11052479e-01, 1.07955720e+00, -2.16225471e+00, -7.95770149e-01, -7.31804973e-01, 9.68935954e-01],
    [1.17233757e-01, -1.23897829e-01, -4.88625265e-01, 1.42036530e-01, -7.23286756e-02, -6.99808763e-02,
     -1.17525019e-02, 5.70221674e-02, -7.67796123e-03, 4.17505873e-02, -2.33375716e-02, 1.94121001e-02],
    [1.67511025e+00, -2.75436700e+00, 1.45345593e+00, 1.32408871e+00, -1.66172505e+00, 1.00560074e+00,
     -8.82308160e-01, -5.95708043e-01, -7.27283590e-01, -1.03975499e+00, -1.86653334e-02, 1.39449745e+00],
    [3.20587677e+00, -2.84451104e+00, 8.54849957e+00, -4.44001235e-01, 1.04202144e+00, 7.35333682e-01,
     -2.48763292e+00, 7.38931361e-01, -1.74185596e+00, -1.07581842e+00, 2.05759299e-01, -8.20483513e-01],
    [3.31279737e+00, -5.08655734e-01, 6.61530870e+00, 1.16518280e+00, 4.74499155e+00, -2.31536191e+00,
     -1.34016130e+00, -7.15381712e-01, 2.78890594e+00, 2.04189275e+00, -3.80003033e-01, 1.16034914e+00],
    [1.79522019e+00, -8.13534697e-02, 4.37167420e-01, 2.26517020e+00, 8.85377295e-01, 1.07481514e+00,
     -7.25322296e-01, -2.19309506e+00, -7.59468916e-01, -1.37191387e+00, 2.60097913e-01, 9.34596450e-01],
    [3.50400906e-01, 8.17891485e-01, -8.63487084e-01, -7.31760701e-01, 9.70320805e-02, -3.60023996e-01,
     -2.91753495e-01, -8.03073817e-02, 6.65930095e-02, 1.60093340e-01, -1.29158086e-01, -5.18806100e-02],
    [2.25922929e-01, 2.78461593e-01, 5.39661393e-02, -2.37662670e-02, -2.70343295e-02, -1.23485570e-01,
     2.31027499e-03, 5.87465112e-05, 1.86127188e-02, 2.83074747e-02, -1.87198676e-04, 1.24761782e-02],
    [4.53615634e-01, 3.18976020e+00, -8.35029351e-01, 7.84124578e+00, -4.43906795e-01, -1.78945492e+00,
     -1.14521031e+00, 1.00044304e+00, -4.04084981e-01, -4.86030348e-01, 1.05412721e-01, 5.63666445e-02],
    [3.93714086e-01, -3.07226477e-01, -4.87366619e-01, -4.57481697e-01, -2.91133171e-04, -2.39881719e-01,
     -2.15591352e-01, -1.21332941e-01, 1.42245002e-01, 5.02984582e-02, -8.05878851e-03, 1.95534173e-01],
    [1.86913010e-01, -1.61000977e-01, 5.95612425e-01, 1.87804293e-01, 2.22064227e-01, -1.09008289e-01,
     7.83845058e-02, 5.15228647e-02, -8.18113578e-02, -2.37860551e-02, 3.41013800e-03, 3.64680417e-02],
    [3.32919314e+00, -2.14341251e+00, 7.20913997e+00, 1.76143734e+00, 1.64091808e+00, -2.66887649e+00,
     -9.26748006e-01, -2.78599285e-01, -7.39434005e-01, -3.87363085e-01, 8.00557250e-01, 1.15628886e+00],
    [4.76496444e-01, -1.19334793e-01, 3.09037235e-01, -3.45545294e-01, 1.30114716e-01, 5.06895559e-01,
     2.12176840e-01, -4.14296750e-03, 4.52439064e-02, -1.62163990e-02, 6.93683152e-03, -5.77607592e-03],
    [3.00019324e-01, 5.43432074e-02, -7.72732930e-01, 1.47263806e+00, -2.79012581e-02, -2.47864869e-01,
     -2.10011388e-01, 2.78202425e-01, 6.16957205e-02, -1.66924986e-01, -1.80102286e-01, -3.78872162e-03]]

TIMBRE_MEANS = [np.mean(t) for t in TIMBRE_CLUSTERS]
TIMBRE_STDEVS = [np.std(t) for t in TIMBRE_CLUSTERS]

'''helper methods to process raw msd data'''

def normalize_pitches(h5):
    # rotate each pitch vector so that every song is expressed relative to its key
    key = int(hdf5_getters.get_key(h5))
    segments_pitches = hdf5_getters.get_segments_pitches(h5)
    segments_pitches_new = [transpose_by_key(pitch_seg, key) for pitch_seg in segments_pitches]
    return segments_pitches_new

def transpose_by_key(pitch_seg, key):
    pitch_seg_new = []
    for i in range(0, 12):
        idx = (i + key) % 12
        pitch_seg_new.append(pitch_seg[idx])
    return pitch_seg_new

''' given a time segment with distributions of the 12 pitches, find the most
likely chord played '''
def find_most_likely_chord(pitch_vector):
    rho_max = 0.0
    # chords are indexed as (quality, root): quality 1 = major, 2 = minor,
    # 3 = dominant 7th, 4 = minor 7th; root 0 through 11 starting from C
    most_likely_chord = (1, 1)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_MAJOR, CHORD_TEMPLATE_MAJOR_means, CHORD_TEMPLATE_MAJOR_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            # correlation-like score between the template and the pitch vector
            rho += (chord[i] - mean) * (pitch_vector[i] - np.mean(pitch_vector)) / ((stdev + 0.01) * (np.std(pitch_vector) + 0.01))
        if abs(rho) > abs(rho_max):
            rho_max = rho
            most_likely_chord = (1, idx)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_MINOR, CHORD_TEMPLATE_MINOR_means, CHORD_TEMPLATE_MINOR_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean) * (pitch_vector[i] - np.mean(pitch_vector)) / ((stdev + 0.01) * (np.std(pitch_vector) + 0.01))
        if abs(rho) > abs(rho_max):
            rho_max = rho
            most_likely_chord = (2, idx)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_DOM7, CHORD_TEMPLATE_DOM7_means, CHORD_TEMPLATE_DOM7_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean) * (pitch_vector[i] - np.mean(pitch_vector)) / ((stdev + 0.01) * (np.std(pitch_vector) + 0.01))
        if abs(rho) > abs(rho_max):
            rho_max = rho
            most_likely_chord = (3, idx)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_MIN7, CHORD_TEMPLATE_MIN7_means, CHORD_TEMPLATE_MIN7_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean) * (pitch_vector[i] - np.mean(pitch_vector)) / ((stdev + 0.01) * (np.std(pitch_vector) + 0.01))
        if abs(rho) > abs(rho_max):
            rho_max = rho
            most_likely_chord = (4, idx)
    return most_likely_chord

def find_most_likely_timbre_category(timbre_vector):
    most_likely_timbre_cat = 0
    rho_max = 0.0
    for idx, (seg, mean, stdev) in enumerate(zip(TIMBRE_CLUSTERS, TIMBRE_MEANS, TIMBRE_STDEVS)):
        rho = 0.0
        for i in range(0, 12):
            # same correlation-like score as in find_most_likely_chord,
            # centering the timbre vector on its own mean
            rho += (seg[i] - mean) * (timbre_vector[i] - np.mean(timbre_vector)) / ((stdev + 0.01) * (np.std(timbre_vector) + 0.01))
        if abs(rho) > abs(rho_max):
            rho_max = rho
            most_likely_timbre_cat = idx
    return most_likely_timbre_cat

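As a quick sanity check of these helpers (my own usage sketch, not part of the thesis pipeline), a pitch vector whose energy is concentrated on C, E, and G should correlate most strongly with the C major template, which is encoded as quality 1 with root 0:

import msd_utils  # the helper module listed above

# energy concentrated on pitch classes C, E and G (indices 0, 4, 7)
pitch_vector = [0.9, 0.05, 0.05, 0.05, 0.8, 0.05, 0.05, 0.85, 0.05, 0.05, 0.05, 0.05]
print(msd_utils.find_most_likely_chord(pitch_vector))  # expected output: (1, 0)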

Bibliography

[1] Marck Bailey. Mark J. Butler publishes scholarly work on dance music. http://www.music.northwestern.edu/about/news/2012/mark-j-butler-publishes-scholarly-work-on-dance-music.html, March 2012.

[2] Kenneth Taylor. Ishkur's guide to EDM. http://techno.org/electronic-music-guide/.

[3] Deal further strengthens Spotify's music discovery expertise. http://the.echonest.com/pressreleases/spotify-acquires-echo-nest/, March 2014.

[4] Josh Constine. Inside the Spotify - Echo Nest skunkworks. http://techcrunch.com/2014/10/19/the-sonic-mad-scientists/, October 2014.

[5] The future of music genres is here. http://blog.echonest.com/post/73516217273/the-future-of-music-genres-is-here, January 2014.

[6] About the Music Genome Project. http://www.pandora.com/about/mgp.

[7] Joan Serrà, Álvaro Corral, Marián Boguñá, Martín Haro, and Josep Ll. Arcos. Measuring the evolution of contemporary western popular music. Scientific Reports, 2, July 2012.

[8] Matthias Mauch, Robert M. MacCallum, Mark Levy, and Armand M. Leroi. The evolution of popular music: USA 1960–2010. Royal Society Open Science, 2(5), 2015.

[9] Thierry Bertin-Mahieux, Daniel P.W. Ellis, Brian Whitman, and Paul Lamere. The Million Song Dataset. In Proceedings of the 12th International Conference on Music Information Retrieval (ISMIR 2011), 2011.

[10] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay. Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12:2825–2830, 2011.

[11] Lars Buitinck, Gilles Louppe, Mathieu Blondel, Fabian Pedregosa, Andreas Mueller, Olivier Grisel, Vlad Niculae, Peter Prettenhofer, Alexandre Gramfort, Jaques Grobler, Robert Layton, Jake VanderPlas, Arnaud Joly, Brian Holt, and Gaël Varoquaux. API design for machine learning software: experiences from the scikit-learn project. In ECML PKDD Workshop: Languages for Data Mining and Machine Learning, pages 108–122, 2013.

[12] Edwin Chen. Infinite mixture models with nonparametric Bayes and the Dirichlet Process. http://blog.echen.me/2012/03/20/infinite-mixture-models-with-nonparametric-bayes-and-the-dirichlet-process/, March 2012.

[13] Graham Massey. Roland TR-808: The drum machine that changed music. http://www.bbc.com/news/entertainment-arts-26682781, March 2014.

[14] Jean-Michel Jarre and Agencja Artystyczna MTJ. Jean Michel Jarre. Disques Dreyfus, 1999.

[15] Francois Pachet, Jean-Julien Aucouturier, and Mark Sandler. The way it sounds: Timbre models for analysis and retrieval of music signals. IEEE Transactions on Multimedia, 7(6):1028–1035, December 2005.

Page 34: Silver,Matthew final thesis

A final step I took to normalize the chord change data was to divide the numbers by

the length of the song so that each songrsquos number of chord changes was measured per

second

233 Timbre Preprocessing

For timbre I also used Mauchrsquos model to find a meaningful way to compare timbre

uniformly across all songs [8] After collecting all song metadata I took a random

sample of 20 songs from each year starting at 1970 The reason I forced the sampling

to 20 randomly sampled songs from each year and did not take a random sample of

songs from all years at once was to prevent bias towards any type of sounds As seen

in figure 22 there are significantly more songs from 2000-2011 than before 2000 The

mean year is x = 2001052 the median year is 2003 and the standard deviation of the

years is σ = 7060 A ldquorandom samplerdquo over all songs would almost definitely include

a disproportionate amount of more recent songs In order to not miss out on sounds

that may be more prevalent in older songs I required a set number of songs from each

year Next from each randomly selected song I selected 20 random timbre frames

in order to prevent any biases in data collection within each song In total there

were 422020 = 16800 timbre frames collected Next I clustered the timbre frames

using a Gaussian Mixture Model (GMM) varying the number of clusters from 10 to

100 and selecting the number of clusters with the lowest Bayes Information Criterion

(BIC) a statistical measure commonly used to calculate the best fitting model The

BIC was minimized at 46 timbre clusters I then re-ran the GMM with 46 clusters

and saved the mean values of each of the 12 timbre segments for each cluster formed

In the same way that every song had the same 192 cord changes whose frequencies

could be compared between songs each song now had the same 46 timbre clusters

but different frequencies in each song When reading in the metadata from each song

I calculated the most likely timbre cluster each timbre frame belonged to and kept

25

Figure 22 Number of Electronic Music Songs in Million Song Dataset from EachYear

a frequency count of all of the possible timbre clusters observed in a song Finally

as with the pitch data I divided all observed counts by the duration of the song in

order to normalize each songrsquos timbre counts

26

Chapter 3

Results

31 Methodology

After the pitch and timbre data was processed I ran the Dirichlet Process on the

data For each song I concatenated the 192-element chord change frequency list

and the 46-element timbre category frequency list giving each song a total of 238

features However there is a problem with this setup The pitch data will inherently

dominate the clustering process since it contains almost 3 times as many features

as timbre While there is no built-in function in scikit-learnrsquos DPGMM process to

give different weights to each feature I considered another possibility to remedy

this discrepancy duplicating the timbre vector a certain number of times and

concatenating that to the feature set of each song While this strategy runs the risk

of corrupting the feature set and turning it into something that does not accurately

represent each song it is important to keep in mind that even without duplicating

the timbre vector the feature set consists of two separate feature sets concatenated

to each other Therefore timbre duplication appears to be a reasonable strategy to

weigh pitch and timbre more evenly

27

After this modification I tweaked a few more parameters before obtaining my

final results Dividing the pitch and timbre frequencies by the duration of the song

normalized every song to frequency per second but it also had the undesired effect

of making the data too small Timbre and pitch frequencies per second were almost

always less than 10 and many times hovered as low as 0002 for nonzero values

Because all of the values were very close to each other using common values of α

in the range of 01 to 1000-2000 was insufficient to push the songs into different

clusters As a result every song fell into the same cluster Increasing the value

of α by several orders of magnitude to well over 10 million fixed the problem but

this solution presented two problems First tuning α to experiment with different

ways to cluster the music would be problematic since I would have to work with

an enormous range of possible values for α Second pushing α to such high values

is not appropriate for the Dirichlet Process Extremely high values of α indicate a

Dirichlet Process that will try to disperse the data into different clusters but a value

of α that high is in principle always assigning each new song to a new cluster On

the other hand varying α between 01 and 1000 for example presents a much wider

range of flexibility when assigning clusters While this may be possible by varying

the values of α an extreme amount with the data as it currently is we are using

the Dirichlet Process in a way it should mathematically not be used Therefore

multiplying all of the data by a constant value so that we can work in the appropriate

range of α is the ideal approach After some experimentation I found that k=10 was

an appropriate scaling factor After initial runs of the Dirichlet Process I found out

that there was a slight issue with some of the earlier songs Since I had only artist

genre tags not specific song tags for each song I chose songs based on whether any

of the tags associated with the artist fell under any electronic music genre including

the generic term rsquoelectronicrsquo There were some bands mostly older ones from the

1960s and 1970s like Electric Light Orchestra which had some electronic music but

28

mostly featured rock funk disco or another genre Given that these artists featured

mostly non-electronic songs I decided to exclude them from my study and generate

a blacklist indicating these music artists While it was infeasible to look through

every single song and determine whether it was electronic or not I was able to look

over the earliest songs in each cluster These songs were the most important to verify

as electronic because early non-electronic songs could end up forming new clusters

and inadvertently create clusters with non-electronic sounds that I was not looking for

The goal of this thesis is to identify different groups in which EM songs are

clustered and identify the most unique artists and genres While the second task is

very simple because it requires looking at the earliest songs in each cluster the first

is difficult to gauge the effectiveness of While I can look at the average chord change

and timbre category frequencies in each category as well as other metadata putting

more semantic interpretations to what the music actually sounds like and determining

whether the music is clustered properly is a very subjective process For this reason

I ran the Dirichlet Process on the feature set with values of α = (005 01 02) and

compared the clustering in each category examining similarities and differences in

the clusters formed in each scenario in the Discussion section For each value α I

set the upper limit of components or clusters allowed to 50 The ranges of α I used

resulted in 9 14 and 19 clusters formed

32 Findings

321 α=005

When I set α to 005 the Dirichlet Process split the songs into 9 clusters Below are

the distribution of years of the songs in each cluster (note that the Dirichlet Process

29

does not number the clusters exactly sequentially so cluster numbers 5 7 and 10 are

skipped)

30

Figure 31 Song year distributions for α = 005

For each value of α I also calculated the average frequency of each chord change

category and timbre category for each cluster and plotted the results The green

lines correspond to timbre and the blue lines to pitch

31

Figure 32 Timbre and pitch distributions for α = 005

A table of each cluster formed the number of songs in that cluster and descriptions

of pitch timbre and rhythmic qualities characteristic of songs in that cluster are

shown below

32

Cluster Song Count Characteristic Sounds

0 6481 Minimalist industrial space sounds dissonant chords

1 5482 Soft New Age ethereal

2 2405 Defined sounds electronic and non-electronic instru-

ments played in standard rock rhythms

3 360 Very dense and complex synths slightly darker tone

4 4550 Heavily distorted rock and synthesizer

6 2854 Faster paced 80s synth rock acid house

8 798 Aggressive beats dense house music

9 1464 Ambient house trancelike strong beats mysterious

tone

11 1597 Melancholy tones New wave rock in 80s then starting

in 90s downtempo trip-hop nu-metal

Table 31 Song cluster descriptions for α = 005

322 α=01

A total of 14 clusters were formed (16 were formed but 2 clusters contained only

one song each I listened to both of these songs and they did not sound unique so

I discarded them from the clusters) Again the song distributions timbre and pitch

distributions and cluster descriptions are shown below

33

34

Figure 33 Song year distributions for α = 01

35

36

Figure 34 Timbre and pitch distributions for α = 01

37

Cluster Song Count Characteristic Sounds

0 1339 Instrumental and disco with 80s synth

1 2109 Simultaneous quarter-note and sixteenth note rhythms

2 4048 Upbeat chill simultaneous quarter-note and eighth

note rhythms

3 1353 Strong repetitive beats ambient

4 2446 Strong simultaneous beat and synths synths defined but

echo

5 2672 Calm New Age

6 542 Hi-hat cymbals dissonant chord progressions

7 2725 Aggressive punk and alternative rock

9 1647 Latin rhythmic emphasis on first and third beats

11 835 Standard medium-fast rock instrumentschords

16 1152 Orchestral especially violins

18 40 ldquoMartian alienrdquo sounds no vocals

20 1590 Alternating strong kick and strong high-pitched clap

28 528 Roland TR-like beats kick and clap stand out but fuzzy

Table 32 Song cluster descriptions for α = 01

323 α=02

With α set to 02 there were a total of 22 clusters formed 3 of the clusters consisted

of 1 song each none of which were particularly unique-sounding so I discarded them

for a total of 19 significant clusters Again the song distributions timbre and pitch

distributions and cluster descriptions are shown below

38

39

40

Figure 35 Song year distributions for α = 02

41

42

43

Figure 36 Timbre and pitch distributions for α = 02

44

Cluster Song Count Characteristic Sounds

0 4075 Nostalgic and sad-sounding synths and string instru-

ments

1 2068 Intense sad cavernous (mix of industrial metal and am-

bient)

2 1546 Jazzfunk tones

3 1691 Orchestral with heavy 80s synths atmospheric

4 343 Arpeggios

5 304 Electro ambient

6 2405 Alien synths eery

7 1264 Punchy kicks and claps 80s90s tilt

8 1561 Medium tempo 44 time signature synths with intense

guitar

9 1796 Disco rhythms and instruments

10 2158 Standard rock with few (if any) synths added on

12 791 Cavernous minimalist ambient (non-electronic instru-

ments)

14 765 Downtempo classic guitar riffs fewer synths

16 865 Classic acid house sounds and beats

17 682 Heavy Roland TR sounds

22 14 Fast ambient classic orchestral

23 578 Acid house with funk tones

30 31 Very repetitive rhythms one or two tones

34 88 Very dense sound (strong vocals and synths)

Table 33 Song cluster descriptions for α = 02

45

33 Analysis

Each of the different values of α revealed different insights about the Dirichlet Process

applied to the Million Song Dataset mainly which artists and songs were unique

and how traditional groupings of EM genres could be thought of differently Not

surprisingly the distributions of the years of songs in most of the clusters were skewed

to the left because the distribution of all of the EM songs in the Million Song Dataset

is left-skewed (see Figure 22) However some of the distributions vary significantly

for individual clusters and these differences provide important insights into which

types of music styles were popular at certain points in time and how unique the

earliest artists and songs in the clusters were For example for α = 01 Cluster 28rsquos

musical style (with sounds characteristic of the Roland TR-808 and TR-909 a two

programmable drum machines and synthesizers that became extremely popular in

1980s and 90s dance tracks) [13] coincides with the when the instruments were first

manufactured in 1980 Not surprisingly this cluster contained mostly songs from

the 80s and 90s and declined slightly in the 2000s However there were a few songs

in that cluster that came out before 1980 While these songs did not clearly use the

Roland TR machines they may have contained similar sounds that predated the

machines and were truly novel

First looking at α = 005 we see that all of the clusters contain a significant

number of songs although cluster 3 and 8 are notably smaller Cluster 3 contains

a heavier left tail indicating a larger number of songs from the 70s 80s and 90s

Inside the cluster the genres of music varied significantly from a traditional music

lens That is the cluster contained some songs with nearly all traditional rock

instruments others with purely synths and others somewhere in between all which

would normally be classified as different EM genres However under the Dirichlet

Process these songs were lumped together with the common themes of dense melodic

46

melodies (as opposed to minimalistic repetitive or dissonant sounds) The most

prominent artists from the earlier songs are Ashra and John Hassell who composed

several melodic songs combining traditional instruments with synthesizers for a

modern feel The other small cluster number 8 contains a more normal year

distribution relative to the entire MSD distribution and also consists of denser beats

Another artist Cabaret Voltaire leads this cluster Cluster 9 sticks out significantly

because it contains virtually no songs before 1990 but increases rapidly in popularity

This cluster contains songs with hypnotically repetitive rhythm strong and ethereal

synths and an equally strong drum-like beat Given the emergence of trance in

the 1990s and the fact that house music in the 1980s contained more minimalistic

synths than house music in the 1990s this distribution of years makes sense Looking

at the earliest artists in this cluster one that accurately predates the later music

in the cluster is Jean-Michel Jarre A French composer pioneering in ambient and

electronic music [14] one of his songs Les Chants Magneacutetiques IV contains very

sharp and modulated synths along with a repetitive hi-hat cymbal rhythm and more

drawn-out and ethereal synths While the song sounds ambient at its normal speed

playing the song 15 times the normal speed resulted in a thumping fast-paced 16th

note rhythm that combined with the ethereal synths that contain certain chord

progressions sounded very similar to trance music In fact I found that stylistically

trance music was comparable to house and ambient music increased in speed Trance

music was a term not used extensively until the early 1990s but ambient and house

music were already mainstream by the 1980s so it would make sense that trance

evolved in this manner However this insight could serve as an argument that trance

is not an innovative genre in and of itself but is rather a clever combination of two

older genres Lastly we look at the timbre category and chord change distributions

for each cluster In theory these clusters should have significantly different peaks

of chord changes and timbre categories reflecting different pitch arrangements and

47

instruments in each cluster The type 0 chord change corresponds to major rarr

major with no note change type 60 minor rarr minor with no note change type

120 dominant 7th major rarr dominant 7th major with no note change and type 180

dominant 7th minor rarr dominant 7th minor with no note change It makes sense that

type 0 60 120 and 180 chord changes are frequently observed because it implies

that chords in the song occurring next to each other are remaining in the same key

for the majority of the song The timbre categories on the other hand are more

difficult to intuitively interpret Mauchrsquos study addresses this issue by sampling

songs and sounds that are the closest to each timbre category and then playing the

sounds and attaching user-based interpretations based on several listeners [8] While

this strategy worked in Mauchrsquos study given the time and resources at my disposal

this strategy was not practical in my study I ended up comparing my subjective

summaries of each cluster and comparing the charts to see whether certain peaks

in the timbre categories corresponded to specific tones Strangely for α = 005 the

timbre and chord change data is very similar for each cluster This problem does not

occur for when α = 01 or 02 where the graphs vary significantly and correspond

to some of the observed differences in the music In summary below are the most

influential artists I found in the clusters formed and the types of music they created

that were novel for their time

bull Jean-Michel Jarre ambient and house music complicated synthesizer arrange-

ments

bull Cabaret Voltaire orchestral electronic music

bull Paul Horn new age

bull Brian Eno ambient music

bull Manuel Goumlttsching (Ashra) synth-heavy ambient music

48

bull Killing Joke industrial metal

bull John Foxx minimalist and dark electronic music

bull Fad Gadget house and industrial music

While these conclusions were formed mainly from the data used in the MSD and the

resulting clusters I checked outside sources and biographies of these artists to see

whether they were groundbreaking contributors to electronic music Some research

revealed that these artists were indeed groundbreaking for their time so my findings

are consistent with existing literature The difference however between existing

accounts and mine is that from a quantitatively computed perspective I found some

new connections (like observing that one of Jarrersquos works when sped up sounded

very similar to trance music)

For larger values of α it is not only worth looking at interesting phenomena in

the clusters formed for that specific value but also comparing the clusters formed

to other values of α Since we are increasing the value of α more clusters will

be formed and the distinctions between each cluster will be more nuanced With

With α = 0.1, the Dirichlet Process formed 16 clusters. 2 of these clusters consisted of only one song each, and upon listening, neither of these songs sounded particularly unique, so I threw those two clusters out and analyzed the remaining 14. Comparing these clusters to the ones formed with α = 0.05, I found that some of the clusters mapped over nicely, while others were more difficult to interpret. For example, cluster 3_0.1 (cluster 3 when α = 0.1) contained a similar number of songs, and a similar distribution of release years, to cluster 9_0.05. Both contain virtually no songs before the 1990s and then steadily rise in popularity through the 2000s. Both clusters also contain similar types of music: house beats and ethereal synths reminiscent of ambient or trance music. However, when I looked at the earliest artists in cluster 3_0.1, they were different from the earliest artists in cluster 9_0.05. One particular artist, Bill Nelson, stood out for having a particularly novel song, "Birds of Tin," for the year it was released (1980). This song features a sharp and twangy synth beat that, when sped up, sounded like minimalist acid house music. While the α = 0.05 group differentiated mostly on general moods and classes of instruments (like rock vs. non-electronic vs. electronic), the α = 0.1 group picked up more nuanced instrumentation and mood differences. For example, cluster 16_0.1 contained songs that featured orchestral string instruments, especially violin. The songs themselves varied significantly according to traditional genres, from Brian Eno arrangements with classical orchestra to a remix of a song by Linkin Park, a nu-metal band, which contained violin interludes. This clustering raises an interesting point: music that sounds very different based on traditional genres could be grouped together on certain instruments or sounds. Another cluster, 28_0.1, features 90s sounds characteristic of the Roland TR-909 drum machine (which would explain why the cluster's songs increase drastically starting in the 1990s and steadily decline through the 2000s). Yet another cluster, 6_0.1, contains a particularly heavy left tail, indicating a style more popular in the 1980s, and its characteristic sound, hi-hat cymbals, is also a specialized one. This specialization does not match up particularly strongly with the clusters formed when α = 0.05. That is, a single cluster with α = 0.05 does not easily map to one or more clusters in the α = 0.1 run, although many of the clusters appear to share characteristics based on the qualitative descriptions in the tables.
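One way to make this cross-α comparison less subjective, sketched here as a suggestion rather than a step I performed, is to build a contingency table between the two runs' cluster labels (assumed available as arrays labels_005 and labels_01): a row with one dominant entry means an α = 0.05 cluster maps cleanly onto a single α = 0.1 cluster, while a flat row means it was split apart.

import numpy as np
from sklearn.metrics.cluster import contingency_matrix

def cluster_overlap(labels_005, labels_01):
    # rows: clusters from the alpha = 0.05 run; columns: alpha = 0.1 run
    C = contingency_matrix(labels_005, labels_01)
    # fraction of each alpha = 0.05 cluster captured by its best-matching
    # alpha = 0.1 cluster (1.0 = a clean one-to-one mapping)
    return C.max(axis=1) / C.sum(axis=1).astype(float)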

The timbre/chord change charts for each cluster appeared to at least somewhat corroborate the general characteristics I attached to each cluster. For example, the last timbre category is significantly pronounced for clusters 5 and 18, and especially so for 18. Cluster 18 was vocal-free, ethereal, space-synth sounds, so it would make sense that cluster 5, which was mainly calm New Age, also contained vocal-free, ethereal, space-y sounds. It was also interesting to note that certain clusters, like 28, contained one timbre category that completely dominated all the others. Many songs in this cluster were marked by strong and repetitive beats reminiscent of the Roland TR synth drum machines, which matches the graph. Likewise, clusters 3, 7, 9, and 20, which appear to contain the same peak timbre category, were noted for containing strong and repetitive beats. From this run, I added the following artists and their contributions to the general list of novel artists:

• Bill Nelson: minimalist house music

• Vangelis: orchestral compositions with electronic notes

• Rick Wakeman: rock compositions with spacey-sounding synths

• Kraftwerk: synth-based pop music

Finally, we look at α = 0.2. With this parameter value, the Dirichlet Process resulted in 22 clusters. 3 of these clusters contained only one song each, and upon listening to each of these songs, I determined they were not particularly unique and discarded them, for a total of 19 remaining clusters. Unlike the previous two values of α, where the clusters were relatively easy to subjectively differentiate, this one was quite difficult. Slightly more than half of the clusters, 10 out of 19, contained under 1,000 songs, and 3 contained under 100. Some of the clusters were easily mapped to clusters formed under the other two α values, like cluster 17_0.2, which contains Roland TR drum machine sounds and is comparable to cluster 28_0.1. However, many of the other classifications seemed more dubious. Not only did the songs within each cluster often vary significantly, but the differences between many clusters appeared nearly indistinguishable. The chord change and timbre charts also reflect this difficulty: the y-axis values are quite small, implying that many of the timbre values averaged out because the songs in each cluster were quite different. Essentially, the observations are quite noisy and do not have features that stand out as saliently as those of cluster 28_0.1, for example. The only exceptions were clusters 30 and 34, but there were so few songs in each of these clusters that they represent only a small amount of the dataset. Therefore, I concluded that the Dirichlet Process with α = 0.2 did an inadequate job of clustering the songs. Overall, the clusters formed when α = 0.1 were the most meaningful in terms of picking up nuanced moods and instruments without splitting hairs and producing clusters that a minimally trained ear could not differentiate. From this analysis, the most appropriate genre classifications of the electronic music in the MSD are the clusters described in the table for α = 0.1, and the most novel artists, along with their contributions, are summarized in the findings for α = 0.05 and α = 0.1.

Chapter 4

Conclusion

In this chapter, I first address weaknesses in my experiment and strategies to address those weaknesses; then I offer potential paths for researchers to build upon my experiment and offer closing words regarding this thesis.

4.1 Design Flaws in Experiment

While I made every effort possible to ensure the integrity of this experiment, there were various limiting factors, some beyond my control and others within my control but unrealistic to address given the time and resources I had. The largest issue was the dataset I was working with. While the MSD contained roughly 23,000 electronic music songs according to my classifications, these songs did not come close to covering all of the electronic music available. From looking through the tracks, I did see many important artists, meaning that there was some credibility to the dataset. However, there were several other artists I was surprised to see missing, and the artists included contained only a limited number of popular songs. Some traditionally defined genres, like dubstep, were missing entirely from the dataset, and the most recent songs came from the year 2010, which meant that the past 5 years of rapid expansion in EM were not accounted for. Building a sufficient corpus of EM data is very difficult, arguably more so than for other genres, because songs may be remixed by multiple artists, further blurring the line between original content and modifications. For this reason, I considered my thesis to be a proof of concept. Although the data I used may not be ideal, I was able to show that the Dirichlet Process could be used with some amount of success to cluster songs based on their metadata.

With respect to how I implemented the Dirichlet Process and constructed the features, my methodology could have been more extensive with additional time and resources. Interpreting the sounds in each song and establishing common threads is a difficult task, and unlike Pandora, which used trained music theory experts to analyze each song, I relied on my own ears and anecdotal knowledge of EM. Given the lack of formal literature quantitatively analyzing EM and the resources I had, this was my best realistic option, but it was also not ideal. The second notable weakness, which was more controllable, was determining what exactly constitutes an EM song. My criteria involved iterating through every song and selecting those whose artist carried a tag that fell inside a list of predetermined EM genres. However, this strategy is not always effective, since some artists have only a small selection of EM songs and have produced much more music involving rock or other non-EM genres. To prevent these songs from appearing in the dataset, I would need to load another dataset, from Last.fm, which contains user-generated tags at the song level.
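A sketch of what that song-level filter might look like, assuming the per-track JSON files distributed with the MSD's Last.fm dataset and assuming each file carries a 'tags' list of [tag, weight] pairs (the exact schema should be checked against the MSD website):

import json

def is_em_track(lastfm_json_path, target_genres):
    # keep a song only if its own Last.fm tags, not its artist's tags,
    # match the predetermined list of EM genres
    with open(lastfm_json_path) as f:
        track = json.load(f)
    return any(tag.lower() in target_genres for tag, weight in track.get('tags', []))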

Another, more addressable weakness in my experiment was graphically analyzing the timbre categories. While the average chord changes were easy to interpret on the graphs for each cluster and had easy semantic interpretations, the timbre categories were never formally defined. That is, while I knew the Bayes Information Criterion was lowest when there were 46 categories, I did not associate each timbre category with a sound. Mauch's study addressed this issue by randomly selecting songs with sounds that fell in each timbre category and asking users to listen to the sounds and classify what they heard. Implementing this system would be an additional way of ensuring that the clusters formed for each song were nontrivial: I could not only eyeball the measurements on each graph for timbre, like I did in this thesis, but also use them to confirm the sounds I observed for each cluster.
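The first step of such a listening test could look like the sketch below, which assumes the smoothed 12-element timbre frames for all songs are stacked in an array frames (with track_ids recording the song each frame came from) and uses the 46 timbre cluster centers from Appendix A.4; the tracks nearest each center would then be played for listeners to label.

import numpy as np

def exemplar_tracks(frames, track_ids, centers, n=5):
    # for each timbre category, return the songs whose frames lie closest
    # to that category's cluster center
    exemplars = {}
    for cat in range(len(centers)):
        dists = np.linalg.norm(np.asarray(frames) - np.asarray(centers[cat]), axis=1)
        nearest = np.argsort(dists)[:n]
        exemplars[cat] = [track_ids[i] for i in nearest]
    return exemplars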

Finally, while my feature selection involved careful preprocessing, based on other studies, that normalized measurements between all songs, there are additional ways I could have improved the feature set. For example, one study looks at more advanced ways to isolate specific timbre segments in a song, identify repeating patterns, and compare songs to each other in terms of the similarity of their timbres [15]. More advanced methods like these would allow me to more quantitatively analyze how successful the Dirichlet Process is at effectively clustering songs into distinct categories.
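As a simple first step in that direction, not implemented in this thesis, the duration-normalized 46-element timbre histograms already computed in Appendix A.2 could be compared pairwise, for example with cosine similarity, to check that songs within a cluster sound more alike than songs across clusters:

import numpy as np

def timbre_similarity(hist_a, hist_b):
    # cosine similarity between two 46-element timbre category histograms;
    # values near 1 indicate songs dominated by the same timbre categories
    a = np.asarray(hist_a, dtype=float)
    b = np.asarray(hist_b, dtype=float)
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12)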

4.2 Future Work

Future work in this area, quantitatively analyzing EM metadata to determine what constitutes different genres and novel artists, would involve tighter definitions and procedures, evaluations of whether the clustering was effective, and closer musical scrutiny. All of the weaknesses mentioned in the previous section, barring perhaps the songs available in the Million Song Dataset, can be addressed with extensions and modifications to the code base I created. The greater issue of building an effective corpus of music data for the MSD and constantly updating it might be addressed by soliciting such data from an organization like Spotify, but such an endeavor is very ambitious and beyond the scope of any individual or small-group research project without extensive funding and influence. Once these problems are resolved, and the way songs are accessed from the dataset and the methods for comparing songs to each other are improved, the next steps would be to further analyze the results. How do the most unique artists for their time compare to the most popular artists? Is there considerable overlap? How long does it take for a style to grow in popularity, if it even does? And lastly, how can these findings be used to compose new genres of music and envision who and what will become popular in the future? All of these questions may require supplementary information sources, with respect to the popularity of songs and artists for example, and many of these additional pieces of information can be found on the website of the MSD.
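For the evaluation piece in particular, one label-free starting point, offered here as a suggestion rather than something run in this thesis, is the silhouette score computed on the same 238-dimensional feature matrix used for clustering:

from sklearn.metrics import silhouette_score

def clustering_quality(X, labels):
    # scores near 1 indicate tight, well-separated clusters; scores near 0
    # indicate heavily overlapping ones, as observed for alpha = 0.2
    return silhouette_score(X, labels, metric='euclidean')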

4.3 Closing Remarks

While this thesis is an ambitious endeavor and can be improved in many respects, it does show that the methods implemented yield nontrivial results and could serve as a foundation for future quantitative analysis of electronic music. As data analytics grows and groups such as Spotify amass greater amounts of information, and deeper insights into that information, this relatively new field of study will hopefully grow. EM is a dynamic, energizing, and incredibly expressive type of music, and understanding it from a quantitative perspective pays respect to what has, up until now, been analyzed mostly from a curious outsider's perspective: qualitatively described, but not examined as thoroughly from a mathematical angle.

Appendix A

Code

A.1 Pulling Data from the Million Song Dataset

from __future__ import division
import os
import sys
import re
import time
import glob
import numpy as np
import hdf5_getters  # not on adroit

# prevents output from showing ellipses when printed
np.set_printoptions(threshold=np.nan)

'''This script finds every electronic music song in the Million Song Dataset,
pulls its relevant metadata, and writes it to disk sorted by year.'''

basedir = '/scratch/network/mssilver/mssilver/msd_data_full/data/' + str(sys.argv[1])
ext = '.h5'

target_genres = ['house', 'techno', 'drum and bass', 'drum n bass', "drum'n'bass",
                 'drumnbass', "drum 'n' bass", 'jungle', 'breakbeat', 'trance',
                 'dubstep', 'trap', 'downtempo', 'industrial', 'synthpop', 'idm',
                 'idm - intelligent dance music', '8-bit', 'ambient',
                 'dance and electronica', 'electronic']

# relevant metadata for all EM songs found in the MSD
all_song_data = {}
pitch_segs_data = []
count = 0
start_time = time.time()

for root, dirs, files in os.walk(basedir):
    files = glob.glob(os.path.join(root, '*' + ext))
    for f in files:
        h5 = hdf5_getters.open_h5_file_read(f)
        # if year unknown, throw out sample
        if hdf5_getters.get_year(h5) == 0:
            h5.close()
            continue
        if any(tag in str(hdf5_getters.get_artist_mbtags(h5)) for tag in target_genres):
            print 'found electronic music song at {0} seconds'.format(time.time() - start_time)
            count += 1
            print('song count: {0}'.format(count))
            h5_subdict = dict()
            h5_subdict['title'] = hdf5_getters.get_title(h5).item()
            h5_subdict['artist_name'] = hdf5_getters.get_artist_name(h5).item()
            h5_subdict['year'] = hdf5_getters.get_year(h5).item()
            h5_subdict['duration'] = hdf5_getters.get_duration(h5).item()
            h5_subdict['timbre'] = hdf5_getters.get_segments_timbre(h5).tolist()
            h5_subdict['pitches'] = hdf5_getters.get_segments_pitches(h5).tolist()
            track_id = hdf5_getters.get_track_id(h5).item()
            all_song_data[track_id] = h5_subdict
            print('Song {0} finished processing. Total time elapsed: {1} seconds'.format(count, str(time.time() - start_time)))
        h5.close()

# sort the songs chronologically and write them out for the next stage
all_song_data_sorted = dict(sorted(all_song_data.items(), key=lambda k: k[1]['year']))
sortedpitchdata = '/scratch/network/mssilver/mssilver/msd_data/raw_' + re.sub('/', '', sys.argv[1]) + '.txt'
with open(sortedpitchdata, 'w') as text_file:
    text_file.write(str(all_song_data_sorted))

A.2 Calculating Most Likely Chords and Timbre Categories

from __future__ import division
import os
import sys
import re
import time
import json
import glob
import hdf5_getters  # not on adroit
import sklearn.mixture
import msd_utils  # not on adroit
import math
import numpy as np
import collections
import ast

# prevents output from showing ellipses when printed
np.set_printoptions(threshold=np.nan)

# column-wise mean of a list of lists
def mean(a):
    return sum(a) / len(a)

'''This script computes the frequency of chord changes and of timbre
categories in each electronic song.'''

basedir = '/scratch/network/mssilver/mssilver/'
input_file = basedir + 'msd_data/raw_' + str(sys.argv[1]) + '.txt'
output_file = basedir + 'msd_data/preprocessed_' + str(sys.argv[1]) + '.txt'

json_contents = open(input_file, 'r').read()

all_song_data = []
time_start = time.time()
count = 0
# each match is one song's metadata sub-dict, as written by the A.1 script
for json_object_str in re.finditer(r"\{'title'.*?\}", json_contents):
    json_object_str = str(json_object_str.group(0))
    json_object = ast.literal_eval(json_object_str)
    json_object_new = {}

    json_object_new['title'] = json_object['title']
    json_object_new['artist_name'] = json_object['artist_name']
    json_object_new['year'] = json_object['year']
    json_object_new['duration'] = json_object['duration']

    segments_pitches_old = json_object['pitches']
    segments_timbre_old = json_object['timbre']
    segments_pitches_old_smoothed = []
    segments_timbre_old_smoothed = []
    chord_changes = [0 for i in range(0, 192)]
    smoothing_factor = 5
    for i in range(0, int(math.floor(len(segments_pitches_old) / smoothing_factor))):
        segments = segments_pitches_old[(smoothing_factor * i):(smoothing_factor * i + smoothing_factor)]
        # calculate mean frequency of each note over a block of 5 time segments
        segments_mean = map(mean, zip(*segments))
        segments_pitches_old_smoothed.append(segments_mean)
    most_likely_chords = [msd_utils.find_most_likely_chord(seg) for seg in segments_pitches_old_smoothed]
    print 'found most likely chords at {0} seconds'.format(time.time() - time_start)
    # calculate chord changes
    for i in range(0, len(most_likely_chords) - 1):
        c1 = most_likely_chords[i]
        c2 = most_likely_chords[i + 1]
        if c1[1] == c2[1]:
            note_shift = 0
        elif c1[1] < c2[1]:
            note_shift = c2[1] - c1[1]
        else:
            note_shift = 12 - c1[1] + c2[1]
        key_shift = 4 * (c1[0] - 1) + c2[0]
        # convert note_shift (0 through 11) and key_shift (1 to 16)
        # to one of 192 categories for a chord shift
        chord_shift = 12 * (key_shift - 1) + note_shift
        chord_changes[chord_shift] += 1
    json_object_new['chord_changes'] = [c / json_object['duration'] for c in chord_changes]
    print 'calculated chord changes at {0} seconds'.format(time.time() - time_start)

    for i in range(0, int(math.floor(len(segments_timbre_old) / smoothing_factor))):
        segments = segments_timbre_old[(smoothing_factor * i):(smoothing_factor * i + smoothing_factor)]
        # calculate mean of each timbre coefficient over a block of 5 time segments
        segments_mean = map(mean, zip(*segments))
        segments_timbre_old_smoothed.append(segments_mean)
    print 'found most likely timbre categories at {0} seconds'.format(time.time() - time_start)
    timbre_cats = [msd_utils.find_most_likely_timbre_category(seg) for seg in segments_timbre_old_smoothed]
    timbre_cat_counts = [timbre_cats.count(i) for i in xrange(0, 46)]  # 46 timbre categories
    json_object_new['timbre_cat_counts'] = [t / json_object['duration'] for t in timbre_cat_counts]
    all_song_data.append(json_object_new)
    count += 1

print 'preprocessing finished, writing results to file at time {0}'.format(time.time() - time_start)
with open(output_file, 'w') as f:
    f.write(str(all_song_data))

print 'file merging complete at time {0}'.format(time.time() - time_start)
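For concreteness, here is a worked example (added for illustration) of how one transition is encoded by the loop above; find_most_likely_chord in Appendix A.4 labels a chord as a (type, root) pair, with types 1 through 4 for major, minor, dominant 7th, and minor 7th:

# C major is labeled (1, 0) and A minor (2, 9)
c1, c2 = (1, 0), (2, 9)
note_shift = c2[1] - c1[1]                       # = 9 (C up to A)
key_shift = 4 * (c1[0] - 1) + c2[0]              # = 2 (major -> minor)
chord_shift = 12 * (key_shift - 1) + note_shift  # = 21
# so a C major -> A minor change increments chord_changes[21]; the
# "no change" categories 0, 60, 120, and 180 discussed in Chapter 3
# fall out of the same formula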

A.3 Code to Compute Timbre Categories

from __future__ import division
import os
import sys
import re
import time
import json
import glob
import hdf5_getters  # not on adroit
import sklearn.mixture
import msd_utils  # not on adroit
import math
import numpy as np
import collections
from string import ascii_uppercase
import ast
import matplotlib.pyplot as plt
import operator
from collections import defaultdict
import random

# sample timbre frames evenly across years; these frames are later clustered
# into the 46 timbre categories
timbre_all = []
year_counts = {1956: 2, 1965: 4, 1968: 3, 1969: 5, 1970: 23, 1971: 25,
               1972: 26, 1973: 37, 1974: 35, 1975: 29, 1976: 28, 1977: 64, 1978: 77,
               1979: 111, 1980: 131, 1981: 171, 1982: 199, 1983: 272, 1984: 190, 1985: 189,
               1986: 200, 1987: 224, 1988: 205, 1989: 272, 1990: 358, 1991: 348,
               1992: 538, 1993: 610, 1994: 658, 1995: 764, 1996: 809, 1997: 930, 1998: 872,
               1999: 983, 2000: 1031, 2001: 1230, 2002: 1323, 2003: 1563, 2004: 1508,
               2005: 1995, 2006: 1892, 2007: 2175, 2008: 1950, 2009: 1782, 2010: 742}

time_start = time.time()
year_count = defaultdict(int)
orig_dir = '/scratch/network/mssilver/mssilver/'
orig_dir = ''  # overridden to run from the current directory
json_pattern = re.compile(r"\{'title'.*?\}", re.DOTALL)
N = 20  # number of songs to sample from each year
k = 20  # number of frames to select from each song
for l1 in ascii_uppercase:
    for l2 in ascii_uppercase:
        edm_textfile = orig_dir + 'msd_data/raw_' + l1 + l2 + '.txt'
        json_contents = open(edm_textfile, 'r').read()
        for json_object_str in re.findall(json_pattern, json_contents):
            json_object = ast.literal_eval(json_object_str)
            year = int(json_object['year'])
            # keep each song with probability N / (songs in that year)
            prob = 1.0 if 1.0 * N / year_counts[year] > 1.0 else 1.0 * N / year_counts[year]
            if random.random() < prob:
                print 'getting timbre frames for song in directory {0}, {1} seconds after start of program'.format(edm_textfile, time.time() - time_start)
                duration = float(json_object['duration'])
                timbre = [[t / duration for t in l] for l in json_object['timbre']]
                try:
                    indices = random.sample(xrange(0, len(timbre)), k)
                except:
                    indices = xrange(0, len(timbre))
                timbre_frames = [timbre[i] for i in indices]
                appended_timbre = [timbre_all.append(l) for l in timbre_frames]
        print 'finished file {0}, {1} seconds after start of program'.format(edm_textfile, time.time() - time_start)

with open('timbre_frames_all.txt', 'w') as f:
    f.write(str(timbre_all))

A.4 Helper Methods for Calculations

import os
import re
import json
import glob
import hdf5_getters
import time
import numpy as np

'''some static data used in conjunction with the helper methods'''

# each 12-element vector corresponds to the 12 pitches, starting with C
# natural and going up to B natural
CHORD_TEMPLATE_MAJOR = [[1,0,0,0,1,0,0,1,0,0,0,0], [0,1,0,0,0,1,0,0,1,0,0,0],
                        [0,0,1,0,0,0,1,0,0,1,0,0], [0,0,0,1,0,0,0,1,0,0,1,0],
                        [0,0,0,0,1,0,0,0,1,0,0,1], [1,0,0,0,0,1,0,0,0,1,0,0],
                        [0,1,0,0,0,0,1,0,0,0,1,0], [0,0,1,0,0,0,0,1,0,0,0,1],
                        [1,0,0,1,0,0,0,0,1,0,0,0], [0,1,0,0,1,0,0,0,0,1,0,0],
                        [0,0,1,0,0,1,0,0,0,0,1,0], [0,0,0,1,0,0,1,0,0,0,0,1]]
CHORD_TEMPLATE_MINOR = [[1,0,0,1,0,0,0,1,0,0,0,0], [0,1,0,0,1,0,0,0,1,0,0,0],
                        [0,0,1,0,0,1,0,0,0,1,0,0], [0,0,0,1,0,0,1,0,0,0,1,0],
                        [0,0,0,0,1,0,0,1,0,0,0,1], [1,0,0,0,0,1,0,0,1,0,0,0],
                        [0,1,0,0,0,0,1,0,0,1,0,0], [0,0,1,0,0,0,0,1,0,0,1,0],
                        [0,0,0,1,0,0,0,0,1,0,0,1], [1,0,0,0,1,0,0,0,0,1,0,0],
                        [0,1,0,0,0,1,0,0,0,0,1,0], [0,0,1,0,0,0,1,0,0,0,0,1]]
CHORD_TEMPLATE_DOM7 = [[1,0,0,0,1,0,0,1,0,0,1,0], [0,1,0,0,0,1,0,0,1,0,0,1],
                       [1,0,1,0,0,0,1,0,0,1,0,0], [0,1,0,1,0,0,0,1,0,0,1,0],
                       [0,0,1,0,1,0,0,0,1,0,0,1], [1,0,0,1,0,1,0,0,0,1,0,0],
                       [0,1,0,0,1,0,1,0,0,0,1,0], [0,0,1,0,0,1,0,1,0,0,0,1],
                       [1,0,0,1,0,0,1,0,1,0,0,0], [0,1,0,0,1,0,0,1,0,1,0,0],
                       [0,0,1,0,0,1,0,0,1,0,1,0], [0,0,0,1,0,0,1,0,0,1,0,1]]
CHORD_TEMPLATE_MIN7 = [[1,0,0,1,0,0,0,1,0,0,1,0], [0,1,0,0,1,0,0,0,1,0,0,1],
                       [1,0,1,0,0,1,0,0,0,1,0,0], [0,1,0,1,0,0,1,0,0,0,1,0],
                       [0,0,1,0,1,0,0,1,0,0,0,1], [1,0,0,1,0,1,0,0,1,0,0,0],
                       [0,1,0,0,1,0,1,0,0,1,0,0], [0,0,1,0,0,1,0,1,0,0,1,0],
                       [0,0,0,1,0,0,1,0,1,0,0,1], [1,0,0,0,1,0,0,1,0,1,0,0],
                       [0,1,0,0,0,1,0,0,1,0,1,0], [0,0,1,0,0,0,1,0,0,1,0,1]]

CHORD_TEMPLATE_MAJOR_means = [np.mean(chord) for chord in CHORD_TEMPLATE_MAJOR]
CHORD_TEMPLATE_MINOR_means = [np.mean(chord) for chord in CHORD_TEMPLATE_MINOR]
CHORD_TEMPLATE_DOM7_means = [np.mean(chord) for chord in CHORD_TEMPLATE_DOM7]
CHORD_TEMPLATE_MIN7_means = [np.mean(chord) for chord in CHORD_TEMPLATE_MIN7]

CHORD_TEMPLATE_MAJOR_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_MAJOR]
CHORD_TEMPLATE_MINOR_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_MINOR]
CHORD_TEMPLATE_DOM7_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_DOM7]
CHORD_TEMPLATE_MIN7_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_MIN7]

# the 46 timbre cluster centers (12 coefficients each) fit to the sampled
# timbre frames, one row per timbre category
TIMBRE_CLUSTERS = [
    [1.38679881e-01, 3.95702571e-02, 2.65410235e-02, 7.38301998e-03, -1.75014636e-02, -5.51147732e-02, 8.71851698e-03, -1.17595855e-02, 1.07227900e-02, 8.75951680e-03, 5.40391877e-03, 6.17638908e-03],
    [3.14344510e+00, 1.17405599e-01, 4.08053561e+00, -1.77934450e+00, 2.93367968e+00, -1.35597928e+00, -1.55129489e+00, 7.75743158e-01, 6.42796685e-01, 1.40794256e-01, 3.37716831e-01, -3.27103815e-01],
    [3.56548165e-01, 2.73288705e+00, 1.94355982e+00, 1.06892477e+00, 9.89739475e-01, -8.97330631e-02, 8.73234495e-01, -2.00747009e-03, 3.44488367e-01, 9.93117800e-02, -2.43471766e-01, -1.90521726e-01],
    [4.22442037e-01, 4.14115783e-01, 1.43926557e-01, -1.16143322e-01, -5.95186216e-02, -2.36927188e-01, -6.83151409e-02, 9.86816882e-02, 2.43219098e-02, 6.93558977e-02, 6.80121418e-03, 3.97485360e-02],
    [1.94727799e-01, -1.39027782e+00, -2.39875671e-01, -2.84583677e-01, 1.92334219e-01, -2.83421048e-01, 2.15787541e-01, 1.14840341e-01, -2.15631833e-01, -4.09496877e-02, -6.90838017e-03, -7.24394810e-03],
    [1.96565167e-01, 4.98702717e-02, -3.43697282e-01, 2.54170701e-01, 1.12441266e-02, 1.54740401e-01, -4.70447408e-02, 8.10868802e-02, 3.03736697e-03, 1.43974944e-03, -2.75044913e-02, 1.48634678e-02],
    [2.21364497e-01, -2.96205105e-01, 1.57754028e-01, -5.57641279e-02, -9.25625566e-02, -6.15316168e-02, -1.38139882e-01, -5.54936599e-02, 1.66886836e-01, 6.46238260e-02, 1.24093863e-02, -2.09274345e-02],
    [2.12823455e-01, -9.32652720e-02, -4.39611467e-01, -2.02814479e-01, 4.98638770e-02, -1.26572488e-01, -1.11181799e-01, 3.25075635e-02, 2.01416694e-02, -5.69216463e-02, 2.61922912e-02, 8.30817468e-02],
    [1.62304042e-01, -7.34813956e-03, -2.02552550e-01, 1.80106705e-01, -5.72110826e-02, -9.17148244e-02, -6.20429191e-03, -6.08892354e-02, 1.02883628e-02, 3.84878478e-02, -8.72920419e-03, 2.37291230e-02],
    [1.69023095e-01, 6.81311168e-02, -3.71039856e-02, -2.13139780e-02, -4.18752028e-03, 1.36407740e-01, 2.58515825e-02, -4.10328777e-04, 2.93149920e-02, -1.97874734e-02, 2.01177066e-02, 4.29260690e-03],
    [4.16829358e-01, -1.28384095e+00, 8.86081556e-01, 9.13717416e-02, -3.19420208e-01, -1.82003637e-01, -3.19865507e-02, -1.71517045e-02, 3.47472066e-02, -3.53047665e-02, 5.58354602e-02, -5.06222122e-02],
    [3.83948137e-01, 1.06020034e-01, 4.01191058e-01, 1.49470482e-01, -9.58422411e-02, -4.94473336e-02, 2.27589858e-02, -5.67352733e-02, 3.84666644e-02, -2.15828055e-02, -1.67817151e-02, 1.15426241e-01],
    [9.07946444e-01, 3.26120397e+00, 2.98472002e+00, -1.42615404e-01, 1.29886103e+00, -4.53380431e-01, 1.54008478e-01, -3.55297093e-02, -2.95809181e-01, 1.57037690e-01, -7.29692046e-02, 1.15180285e-01],
    [1.60870896e+00, -2.32038235e+00, -7.96211044e-01, 1.55058968e+00, -2.19377663e+00, 5.01030526e-01, -1.71767279e+00, -1.36642470e+00, -2.42837527e-01, -4.14275615e-01, -7.33148530e-01, -4.56676578e-01],
    [6.42870687e-01, 1.34486839e+00, 2.16026845e-01, -2.13180345e-01, 3.10866747e-01, -3.97754955e-01, -3.54439151e-01, -5.95938041e-04, 4.95054274e-03, 4.67013422e-02, -1.80823854e-02, 1.25808320e-01],
    [1.16780496e+00, 2.28141229e+00, -3.29418720e+00, -1.54239912e+00, 2.12372153e-01, 2.51116768e+00, 1.84273560e+00, -4.06183916e-01, 1.19175125e+00, -9.24407446e-01, 6.85444429e-01, -6.38729005e-01],
    [2.39097414e-01, -1.13382447e-02, 3.06327342e-01, 4.68182987e-03, -1.03107607e-01, -3.17661969e-02, 3.46533705e-02, 1.46440386e-02, 6.88291154e-02, 1.72580481e-02, -6.23970238e-03, -6.52822380e-03],
    [1.74850329e-01, -1.86077411e-01, 2.69285838e-01, 5.22452803e-02, -3.71708289e-02, -6.42874319e-02, -5.01920042e-03, -1.14565540e-02, -2.61300268e-03, -6.94872458e-03, 1.20157063e-02, 2.01341977e-02],
    [1.93220674e-01, 1.62738332e-01, 1.72794061e-02, 7.89933755e-02, 1.58494767e-01, 9.04541006e-04, -3.33177052e-02, -1.42411500e-01, -1.90471155e-02, -2.41622739e-02, -2.57382438e-02, 2.84895062e-02],
    [3.31179197e+00, -1.56765268e-01, 4.42446188e+00, 2.05496297e+00, 5.07031622e+00, -3.52663849e-02, -5.68337901e+00, -1.17825301e+00, 5.41756637e-01, -3.15541339e-02, -1.58404846e+00, 7.37887234e-01],
    [2.36033237e-01, -5.01380019e-01, -7.01568834e-02, -2.14474169e-01, 5.58739133e-01, -3.45340886e-01, 2.36469930e-01, -2.51770230e-02, -4.41670143e-01, -1.73364633e-01, 9.92353986e-03, 1.01775476e-01],
    [3.13672832e+00, 1.55128891e+00, 4.60139512e+00, 9.82477544e-01, -3.87108002e-01, -1.34239667e+00, -3.00065797e+00, -4.41556909e-01, -7.77546208e-01, -6.59017029e-01, -1.42596356e-01, -9.78935498e-01],
    [8.50714148e-01, 2.28658856e-01, -3.65260753e+00, 2.70626948e+00, -1.90441544e-01, 5.66625676e+00, 1.77531510e+00, 2.39978921e+00, 1.10965660e+00, 1.58484130e+00, -1.51579214e-02, 8.64324026e-01],
    [1.14302559e+00, 1.18602811e+00, -3.88130412e+00, 8.69833825e-01, -8.23003310e-01, -4.23867795e-01, 8.56022598e-01, -1.08015106e+00, 1.74840192e-01, -1.35493558e-02, -1.17012561e+00, 1.68572940e-01],
    [3.54117814e+00, 6.12714769e-01, 7.67585243e+00, 2.50391333e+00, 1.81374399e+00, -1.46363231e+00, -1.74027236e+00, -5.72924078e-01, -1.20787368e+00, -4.13954661e-01, -4.62561948e-01, 6.78297871e-01],
    [8.31843044e-01, 4.41635485e-01, 7.00724425e-02, -4.72159900e-02, 3.08326493e-01, -4.47009822e-01, 3.27806057e-01, 6.52370380e-01, 3.28490360e-01, 1.28628172e-01, -7.78065861e-02, 6.91343399e-02],
    [4.90082031e-01, -9.53180204e-01, 1.76970476e-01, 1.57256960e-01, -5.26196238e-02, -3.19264458e-01, 3.91808304e-01, 2.19368239e-01, -2.06483291e-01, -6.25044005e-02, -1.05547224e-01, 3.18934196e-01],
    [1.49899454e+00, -4.30708817e-01, 2.43770498e+00, 7.03149621e-01, -2.28827845e+00, 2.70195855e+00, -4.71484280e+00, -1.18700075e+00, -1.77431396e+00, -2.23190236e+00, 8.20855264e-01, -2.35859902e-01],
    [1.20322544e-01, -3.66300816e-01, -1.25699953e-01, -1.21914056e-01, 6.93277338e-02, -1.31034684e-01, -1.54955924e-03, 2.48094288e-02, -3.09576314e-02, -1.66369415e-03, 1.48904987e-04, -1.42151992e-02],
    [6.52394765e-01, -6.81024464e-01, 6.36868117e-01, 3.04950208e-01, 2.62178992e-01, -3.20457080e-01, -1.98576098e-01, -3.02173163e-01, 2.04399765e-01, 4.44513847e-02, -9.50111498e-02, -1.14198739e-02],
    [2.06762180e-01, -2.08101829e-01, 2.61977630e-01, -1.71672300e-01, 5.61794250e-02, 2.13660185e-01, 3.90259585e-02, 4.78176392e-02, 1.72812607e-02, 3.44052067e-02, 6.26899067e-03, 2.48544728e-02],
    [7.39717363e-01, 4.37786285e+00, 2.54995502e+00, 1.13151212e+00, -3.58509503e-01, 2.20806129e-01, -2.20500355e-01, -7.22409824e-02, -2.70534083e-01, 1.07942098e-03, 2.70174668e-01, 1.87279353e-01],
    [1.25593809e+00, 6.71054880e-02, 8.70352571e-01, -4.32607959e+00, 2.30652217e+00, 5.47476105e+00, -6.11052479e-01, 1.07955720e+00, -2.16225471e+00, -7.95770149e-01, -7.31804973e-01, 9.68935954e-01],
    [1.17233757e-01, -1.23897829e-01, -4.88625265e-01, 1.42036530e-01, -7.23286756e-02, -6.99808763e-02, -1.17525019e-02, 5.70221674e-02, -7.67796123e-03, 4.17505873e-02, -2.33375716e-02, 1.94121001e-02],
    [1.67511025e+00, -2.75436700e+00, 1.45345593e+00, 1.32408871e+00, -1.66172505e+00, 1.00560074e+00, -8.82308160e-01, -5.95708043e-01, -7.27283590e-01, -1.03975499e+00, -1.86653334e-02, 1.39449745e+00],
    [3.20587677e+00, -2.84451104e+00, 8.54849957e+00, -4.44001235e-01, 1.04202144e+00, 7.35333682e-01, -2.48763292e+00, 7.38931361e-01, -1.74185596e+00, -1.07581842e+00, 2.05759299e-01, -8.20483513e-01],
    [3.31279737e+00, -5.08655734e-01, 6.61530870e+00, 1.16518280e+00, 4.74499155e+00, -2.31536191e+00, -1.34016130e+00, -7.15381712e-01, 2.78890594e+00, 2.04189275e+00, -3.80003033e-01, 1.16034914e+00],
    [1.79522019e+00, -8.13534697e-02, 4.37167420e-01, 2.26517020e+00, 8.85377295e-01, 1.07481514e+00, -7.25322296e-01, -2.19309506e+00, -7.59468916e-01, -1.37191387e+00, 2.60097913e-01, 9.34596450e-01],
    [3.50400906e-01, 8.17891485e-01, -8.63487084e-01, -7.31760701e-01, 9.70320805e-02, -3.60023996e-01, -2.91753495e-01, -8.03073817e-02, 6.65930095e-02, 1.60093340e-01, -1.29158086e-01, -5.18806100e-02],
    [2.25922929e-01, 2.78461593e-01, 5.39661393e-02, -2.37662670e-02, -2.70343295e-02, -1.23485570e-01, 2.31027499e-03, 5.87465112e-05, 1.86127188e-02, 2.83074747e-02, -1.87198676e-04, 1.24761782e-02],
    [4.53615634e-01, 3.18976020e+00, -8.35029351e-01, 7.84124578e+00, -4.43906795e-01, -1.78945492e+00, -1.14521031e+00, 1.00044304e+00, -4.04084981e-01, -4.86030348e-01, 1.05412721e-01, 5.63666445e-02],
    [3.93714086e-01, -3.07226477e-01, -4.87366619e-01, -4.57481697e-01, -2.91133171e-04, -2.39881719e-01, -2.15591352e-01, -1.21332941e-01, 1.42245002e-01, 5.02984582e-02, -8.05878851e-03, 1.95534173e-01],
    [1.86913010e-01, -1.61000977e-01, 5.95612425e-01, 1.87804293e-01, 2.22064227e-01, -1.09008289e-01, 7.83845058e-02, 5.15228647e-02, -8.18113578e-02, -2.37860551e-02, 3.41013800e-03, 3.64680417e-02],
    [3.32919314e+00, -2.14341251e+00, 7.20913997e+00, 1.76143734e+00, 1.64091808e+00, -2.66887649e+00, -9.26748006e-01, -2.78599285e-01, -7.39434005e-01, -3.87363085e-01, 8.00557250e-01, 1.15628886e+00],
    [4.76496444e-01, -1.19334793e-01, 3.09037235e-01, -3.45545294e-01, 1.30114716e-01, 5.06895559e-01, 2.12176840e-01, -4.14296750e-03, 4.52439064e-02, -1.62163990e-02, 6.93683152e-02, -5.77607592e-03],
    [3.00019324e-01, 5.43432074e-02, -7.72732930e-01, 1.47263806e+00, -2.79012581e-02, -2.47864869e-01, -2.10011388e-01, 2.78202425e-01, 6.16957205e-02, -1.66924986e-01, -1.80102286e-01, -3.78872162e-03]]

TIMBRE_MEANS = [np.mean(t) for t in TIMBRE_CLUSTERS]
TIMBRE_STDEVS = [np.std(t) for t in TIMBRE_CLUSTERS]

'''helper methods to process raw MSD data'''

def normalize_pitches(h5):
    key = int(hdf5_getters.get_key(h5))
    segments_pitches = hdf5_getters.get_segments_pitches(h5)
    segments_pitches_new = [transpose_by_key(pitch_seg, key) for pitch_seg in segments_pitches]
    return segments_pitches_new

def transpose_by_key(pitch_seg, key):
    pitch_seg_new = []
    for i in range(0, 12):
        idx = (i + key) % 12
        pitch_seg_new.append(pitch_seg[idx])
    return pitch_seg_new

'''given a time segment with distributions of the 12 pitches, find the most likely chord played'''
def find_most_likely_chord(pitch_vector):
    rho_max = 0.0
    # chords are indexed as (type, root) pairs: type 1 = major, 2 = minor,
    # 3 = dominant 7th, 4 = minor 7th; roots 0-11 run from C up to B
    most_likely_chord = (1, 1)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_MAJOR, CHORD_TEMPLATE_MAJOR_means, CHORD_TEMPLATE_MAJOR_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean) * (pitch_vector[i] - np.mean(pitch_vector)) / ((stdev + 0.01) * (np.std(pitch_vector) + 0.01))
        if abs(rho) > abs(rho_max):
            rho_max = rho
            most_likely_chord = (1, idx)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_MINOR, CHORD_TEMPLATE_MINOR_means, CHORD_TEMPLATE_MINOR_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean) * (pitch_vector[i] - np.mean(pitch_vector)) / ((stdev + 0.01) * (np.std(pitch_vector) + 0.01))
        if abs(rho) > abs(rho_max):
            rho_max = rho
            most_likely_chord = (2, idx)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_DOM7, CHORD_TEMPLATE_DOM7_means, CHORD_TEMPLATE_DOM7_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean) * (pitch_vector[i] - np.mean(pitch_vector)) / ((stdev + 0.01) * (np.std(pitch_vector) + 0.01))
        if abs(rho) > abs(rho_max):
            rho_max = rho
            most_likely_chord = (3, idx)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_MIN7, CHORD_TEMPLATE_MIN7_means, CHORD_TEMPLATE_MIN7_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean) * (pitch_vector[i] - np.mean(pitch_vector)) / ((stdev + 0.01) * (np.std(pitch_vector) + 0.01))
        if abs(rho) > abs(rho_max):
            rho_max = rho
            most_likely_chord = (4, idx)
    return most_likely_chord

def find_most_likely_timbre_category(timbre_vector):
    most_likely_timbre_cat = 0
    rho_max = 0.0
    for idx, (seg, mean, stdev) in enumerate(zip(TIMBRE_CLUSTERS, TIMBRE_MEANS, TIMBRE_STDEVS)):
        rho = 0.0
        for i in range(0, 12):
            rho += (seg[i] - mean) * (timbre_vector[i] - np.mean(seg)) / ((stdev + 0.01) * (np.std(timbre_vector) + 0.01))
        if abs(rho) > abs(rho_max):
            rho_max = rho
            most_likely_timbre_cat = idx
    return most_likely_timbre_cat
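As a quick usage illustration with a made-up input: a smoothed pitch vector dominated by the C, E, and G pitch classes should correlate most strongly with the first major template.

import msd_utils

# hypothetical smoothed pitch vector: strong C (index 0), E (4), and G (7)
example_pitch_vector = [0.9, 0.1, 0.1, 0.1, 0.8, 0.1, 0.1, 0.85, 0.1, 0.1, 0.1, 0.1]
print msd_utils.find_most_likely_chord(example_pitch_vector)  # expected: (1, 0), a C major chord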


Bibliography

[1] Marck Bailey. Mark J. Butler publishes scholarly work on dance music. http://www.music.northwestern.edu/about/news/2012/mark-j-butler-publishes-scholarly-work-on-dance-music.html, March 2012.

[2] Kenneth Taylor. Ishkur's guide to EDM. http://techno.org/electronic-music-guide/.

[3] Deal further strengthens Spotify's music discovery expertise. http://the.echonest.com/pressreleases/spotify-acquires-echo-nest/, March 2014.

[4] Josh Constine. Inside the Spotify-Echo Nest skunkworks. http://techcrunch.com/2014/10/19/the-sonic-mad-scientists/, October 2014.

[5] The future of music genres is here. http://blog.echonest.com/post/73516217273/the-future-of-music-genres-is-here, January 2014.

[6] About the Music Genome Project. http://www.pandora.com/about/mgp.

[7] Joan Serrà, Álvaro Corral, Marián Boguñá, Martín Haro, and Josep Ll. Arcos. Measuring the evolution of contemporary western popular music. Scientific Reports, 2, July 2012.

[8] Matthias Mauch, Robert M. MacCallum, Mark Levy, and Armand M. Leroi. The evolution of popular music: USA 1960-2010. Royal Society Open Science, 2(5), 2015.

[9] Thierry Bertin-Mahieux, Daniel P.W. Ellis, Brian Whitman, and Paul Lamere. The Million Song Dataset. In Proceedings of the 12th International Conference on Music Information Retrieval (ISMIR 2011), 2011.

[10] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay. Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12:2825-2830, 2011.

[11] Lars Buitinck, Gilles Louppe, Mathieu Blondel, Fabian Pedregosa, Andreas Mueller, Olivier Grisel, Vlad Niculae, Peter Prettenhofer, Alexandre Gramfort, Jaques Grobler, Robert Layton, Jake VanderPlas, Arnaud Joly, Brian Holt, and Gaël Varoquaux. API design for machine learning software: experiences from the scikit-learn project. In ECML PKDD Workshop: Languages for Data Mining and Machine Learning, pages 108-122, 2013.

[12] Edwin Chen. Infinite mixture models with nonparametric Bayes and the Dirichlet Process. http://blog.echen.me/2012/03/20/infinite-mixture-models-with-nonparametric-bayes-and-the-dirichlet-process/, March 2012.

[13] Graham Massey. Roland TR-808: The drum machine that changed music. http://www.bbc.com/news/entertainment-arts-26682781, March 2014.

[14] Jean-Michel Jarre and Agencja Artystyczna MTJ. Jean Michel Jarre. Disques Dreyfus, 1999.

[15] Francois Pachet, Jean-Julien Aucouturier, and Mark Sandler. "The way it sounds": Timbre models for analysis and retrieval of music signals. IEEE Transactions on Multimedia, 7(6):1028-1035, December 2005.

Page 35: Silver,Matthew final thesis

Figure 22 Number of Electronic Music Songs in Million Song Dataset from EachYear

a frequency count of all of the possible timbre clusters observed in a song Finally

as with the pitch data I divided all observed counts by the duration of the song in

order to normalize each songrsquos timbre counts

26

Chapter 3

Results

31 Methodology

After the pitch and timbre data was processed I ran the Dirichlet Process on the

data For each song I concatenated the 192-element chord change frequency list

and the 46-element timbre category frequency list giving each song a total of 238

features However there is a problem with this setup The pitch data will inherently

dominate the clustering process since it contains almost 3 times as many features

as timbre While there is no built-in function in scikit-learnrsquos DPGMM process to

give different weights to each feature I considered another possibility to remedy

this discrepancy duplicating the timbre vector a certain number of times and

concatenating that to the feature set of each song While this strategy runs the risk

of corrupting the feature set and turning it into something that does not accurately

represent each song it is important to keep in mind that even without duplicating

the timbre vector the feature set consists of two separate feature sets concatenated

to each other Therefore timbre duplication appears to be a reasonable strategy to

weigh pitch and timbre more evenly

27

After this modification I tweaked a few more parameters before obtaining my

final results Dividing the pitch and timbre frequencies by the duration of the song

normalized every song to frequency per second but it also had the undesired effect

of making the data too small Timbre and pitch frequencies per second were almost

always less than 10 and many times hovered as low as 0002 for nonzero values

Because all of the values were very close to each other using common values of α

in the range of 01 to 1000-2000 was insufficient to push the songs into different

clusters As a result every song fell into the same cluster Increasing the value

of α by several orders of magnitude to well over 10 million fixed the problem but

this solution presented two problems First tuning α to experiment with different

ways to cluster the music would be problematic since I would have to work with

an enormous range of possible values for α Second pushing α to such high values

is not appropriate for the Dirichlet Process Extremely high values of α indicate a

Dirichlet Process that will try to disperse the data into different clusters but a value

of α that high is in principle always assigning each new song to a new cluster On

the other hand varying α between 01 and 1000 for example presents a much wider

range of flexibility when assigning clusters While this may be possible by varying

the values of α an extreme amount with the data as it currently is we are using

the Dirichlet Process in a way it should mathematically not be used Therefore

multiplying all of the data by a constant value so that we can work in the appropriate

range of α is the ideal approach After some experimentation I found that k=10 was

an appropriate scaling factor After initial runs of the Dirichlet Process I found out

that there was a slight issue with some of the earlier songs Since I had only artist

genre tags not specific song tags for each song I chose songs based on whether any

of the tags associated with the artist fell under any electronic music genre including

the generic term rsquoelectronicrsquo There were some bands mostly older ones from the

1960s and 1970s like Electric Light Orchestra which had some electronic music but

28

mostly featured rock funk disco or another genre Given that these artists featured

mostly non-electronic songs I decided to exclude them from my study and generate

a blacklist indicating these music artists While it was infeasible to look through

every single song and determine whether it was electronic or not I was able to look

over the earliest songs in each cluster These songs were the most important to verify

as electronic because early non-electronic songs could end up forming new clusters

and inadvertently create clusters with non-electronic sounds that I was not looking for

The goal of this thesis is to identify different groups in which EM songs are

clustered and identify the most unique artists and genres While the second task is

very simple because it requires looking at the earliest songs in each cluster the first

is difficult to gauge the effectiveness of While I can look at the average chord change

and timbre category frequencies in each category as well as other metadata putting

more semantic interpretations to what the music actually sounds like and determining

whether the music is clustered properly is a very subjective process For this reason

I ran the Dirichlet Process on the feature set with values of α = (005 01 02) and

compared the clustering in each category examining similarities and differences in

the clusters formed in each scenario in the Discussion section For each value α I

set the upper limit of components or clusters allowed to 50 The ranges of α I used

resulted in 9 14 and 19 clusters formed

32 Findings

321 α=005

When I set α to 005 the Dirichlet Process split the songs into 9 clusters Below are

the distribution of years of the songs in each cluster (note that the Dirichlet Process

29

does not number the clusters exactly sequentially so cluster numbers 5 7 and 10 are

skipped)

30

Figure 31 Song year distributions for α = 005

For each value of α I also calculated the average frequency of each chord change

category and timbre category for each cluster and plotted the results The green

lines correspond to timbre and the blue lines to pitch

31

Figure 32 Timbre and pitch distributions for α = 005

A table of each cluster formed the number of songs in that cluster and descriptions

of pitch timbre and rhythmic qualities characteristic of songs in that cluster are

shown below

32

Cluster Song Count Characteristic Sounds

0 6481 Minimalist industrial space sounds dissonant chords

1 5482 Soft New Age ethereal

2 2405 Defined sounds electronic and non-electronic instru-

ments played in standard rock rhythms

3 360 Very dense and complex synths slightly darker tone

4 4550 Heavily distorted rock and synthesizer

6 2854 Faster paced 80s synth rock acid house

8 798 Aggressive beats dense house music

9 1464 Ambient house trancelike strong beats mysterious

tone

11 1597 Melancholy tones New wave rock in 80s then starting

in 90s downtempo trip-hop nu-metal

Table 31 Song cluster descriptions for α = 005

322 α=01

A total of 14 clusters were formed (16 were formed but 2 clusters contained only

one song each I listened to both of these songs and they did not sound unique so

I discarded them from the clusters) Again the song distributions timbre and pitch

distributions and cluster descriptions are shown below

33

34

Figure 33 Song year distributions for α = 01

35

36

Figure 34 Timbre and pitch distributions for α = 01

37

Cluster Song Count Characteristic Sounds

0 1339 Instrumental and disco with 80s synth

1 2109 Simultaneous quarter-note and sixteenth note rhythms

2 4048 Upbeat chill simultaneous quarter-note and eighth

note rhythms

3 1353 Strong repetitive beats ambient

4 2446 Strong simultaneous beat and synths synths defined but

echo

5 2672 Calm New Age

6 542 Hi-hat cymbals dissonant chord progressions

7 2725 Aggressive punk and alternative rock

9 1647 Latin rhythmic emphasis on first and third beats

11 835 Standard medium-fast rock instrumentschords

16 1152 Orchestral especially violins

18 40 ldquoMartian alienrdquo sounds no vocals

20 1590 Alternating strong kick and strong high-pitched clap

28 528 Roland TR-like beats kick and clap stand out but fuzzy

Table 32 Song cluster descriptions for α = 01

323 α=02

With α set to 02 there were a total of 22 clusters formed 3 of the clusters consisted

of 1 song each none of which were particularly unique-sounding so I discarded them

for a total of 19 significant clusters Again the song distributions timbre and pitch

distributions and cluster descriptions are shown below

38

39

40

Figure 35 Song year distributions for α = 02

41

42

43

Figure 36 Timbre and pitch distributions for α = 02

44

Cluster Song Count Characteristic Sounds

0 4075 Nostalgic and sad-sounding synths and string instru-

ments

1 2068 Intense sad cavernous (mix of industrial metal and am-

bient)

2 1546 Jazzfunk tones

3 1691 Orchestral with heavy 80s synths atmospheric

4 343 Arpeggios

5 304 Electro ambient

6 2405 Alien synths eery

7 1264 Punchy kicks and claps 80s90s tilt

8 1561 Medium tempo 44 time signature synths with intense

guitar

9 1796 Disco rhythms and instruments

10 2158 Standard rock with few (if any) synths added on

12 791 Cavernous minimalist ambient (non-electronic instru-

ments)

14 765 Downtempo classic guitar riffs fewer synths

16 865 Classic acid house sounds and beats

17 682 Heavy Roland TR sounds

22 14 Fast ambient classic orchestral

23 578 Acid house with funk tones

30 31 Very repetitive rhythms one or two tones

34 88 Very dense sound (strong vocals and synths)

Table 33 Song cluster descriptions for α = 02

45

33 Analysis

Each of the different values of α revealed different insights about the Dirichlet Process

applied to the Million Song Dataset mainly which artists and songs were unique

and how traditional groupings of EM genres could be thought of differently Not

surprisingly the distributions of the years of songs in most of the clusters were skewed

to the left because the distribution of all of the EM songs in the Million Song Dataset

is left-skewed (see Figure 22) However some of the distributions vary significantly

for individual clusters and these differences provide important insights into which

types of music styles were popular at certain points in time and how unique the

earliest artists and songs in the clusters were For example for α = 01 Cluster 28rsquos

musical style (with sounds characteristic of the Roland TR-808 and TR-909 a two

programmable drum machines and synthesizers that became extremely popular in

1980s and 90s dance tracks) [13] coincides with the when the instruments were first

manufactured in 1980 Not surprisingly this cluster contained mostly songs from

the 80s and 90s and declined slightly in the 2000s However there were a few songs

in that cluster that came out before 1980 While these songs did not clearly use the

Roland TR machines they may have contained similar sounds that predated the

machines and were truly novel

First looking at α = 005 we see that all of the clusters contain a significant

number of songs although cluster 3 and 8 are notably smaller Cluster 3 contains

a heavier left tail indicating a larger number of songs from the 70s 80s and 90s

Inside the cluster the genres of music varied significantly from a traditional music

lens That is the cluster contained some songs with nearly all traditional rock

instruments others with purely synths and others somewhere in between all which

would normally be classified as different EM genres However under the Dirichlet

Process these songs were lumped together with the common themes of dense melodic

46

melodies (as opposed to minimalistic repetitive or dissonant sounds) The most

prominent artists from the earlier songs are Ashra and John Hassell who composed

several melodic songs combining traditional instruments with synthesizers for a

modern feel The other small cluster number 8 contains a more normal year

distribution relative to the entire MSD distribution and also consists of denser beats

Another artist Cabaret Voltaire leads this cluster Cluster 9 sticks out significantly

because it contains virtually no songs before 1990 but increases rapidly in popularity

This cluster contains songs with hypnotically repetitive rhythm strong and ethereal

synths and an equally strong drum-like beat Given the emergence of trance in

the 1990s and the fact that house music in the 1980s contained more minimalistic

synths than house music in the 1990s this distribution of years makes sense Looking

at the earliest artists in this cluster one that accurately predates the later music

in the cluster is Jean-Michel Jarre A French composer pioneering in ambient and

electronic music [14] one of his songs Les Chants Magneacutetiques IV contains very

sharp and modulated synths along with a repetitive hi-hat cymbal rhythm and more

drawn-out and ethereal synths While the song sounds ambient at its normal speed

playing the song 15 times the normal speed resulted in a thumping fast-paced 16th

note rhythm that combined with the ethereal synths that contain certain chord

progressions sounded very similar to trance music In fact I found that stylistically

trance music was comparable to house and ambient music increased in speed Trance

music was a term not used extensively until the early 1990s but ambient and house

music were already mainstream by the 1980s so it would make sense that trance

evolved in this manner However this insight could serve as an argument that trance

is not an innovative genre in and of itself but is rather a clever combination of two

older genres Lastly we look at the timbre category and chord change distributions

for each cluster In theory these clusters should have significantly different peaks

of chord changes and timbre categories reflecting different pitch arrangements and

47

instruments in each cluster The type 0 chord change corresponds to major rarr

major with no note change type 60 minor rarr minor with no note change type

120 dominant 7th major rarr dominant 7th major with no note change and type 180

dominant 7th minor rarr dominant 7th minor with no note change It makes sense that

type 0 60 120 and 180 chord changes are frequently observed because it implies

that chords in the song occurring next to each other are remaining in the same key

for the majority of the song The timbre categories on the other hand are more

difficult to intuitively interpret Mauchrsquos study addresses this issue by sampling

songs and sounds that are the closest to each timbre category and then playing the

sounds and attaching user-based interpretations based on several listeners [8] While

this strategy worked in Mauchrsquos study given the time and resources at my disposal

this strategy was not practical in my study I ended up comparing my subjective

summaries of each cluster and comparing the charts to see whether certain peaks

in the timbre categories corresponded to specific tones Strangely for α = 005 the

timbre and chord change data is very similar for each cluster This problem does not

occur for when α = 01 or 02 where the graphs vary significantly and correspond

to some of the observed differences in the music In summary below are the most

influential artists I found in the clusters formed and the types of music they created

that were novel for their time

bull Jean-Michel Jarre ambient and house music complicated synthesizer arrange-

ments

bull Cabaret Voltaire orchestral electronic music

bull Paul Horn new age

bull Brian Eno ambient music

bull Manuel Goumlttsching (Ashra) synth-heavy ambient music

48

bull Killing Joke industrial metal

bull John Foxx minimalist and dark electronic music

bull Fad Gadget house and industrial music

While these conclusions were formed mainly from the data used in the MSD and the

resulting clusters I checked outside sources and biographies of these artists to see

whether they were groundbreaking contributors to electronic music Some research

revealed that these artists were indeed groundbreaking for their time so my findings

are consistent with existing literature The difference however between existing

accounts and mine is that from a quantitatively computed perspective I found some

new connections (like observing that one of Jarrersquos works when sped up sounded

very similar to trance music)

For larger values of α it is not only worth looking at interesting phenomena in

the clusters formed for that specific value but also comparing the clusters formed

to other values of α Since we are increasing the value of α more clusters will

be formed and the distinctions between each cluster will be more nuanced With

α = 01 the Dirichlet Process formed 16 clusters 2 of these clusters consisted of

only one song each and upon listening neither of these songs sounded particularly

unique so I threw those two clusters out and analyzed the remaining 14 Comparing

these clusters to the ones formed with α = 005 I found that some of the clusters

mapped over nicely while others were more difficult to interpret For example cluster

301 (cluster 3 when α = 01) contained a similar number of songs and a similar

distribution of the years the songs were released to cluster 9005 Both contain vitually

no songs before the 1990s and then steadily rise in popularity through the 2000s

Both clusters also contain similar types of music house beats and ethreal synths

reminiscent of ambient or trance music However when I looked at the earliest

49

artists in cluster 301 they were different from the earliest artists in cluster 9005

One particular artist Bill Nelson stood out for having a particularly novel song

ldquoBirds of Tinrdquo for the year it was released (1980) This song features a sharp and

twangy synth beat that when sped up sounded like minimalist acid house music

While the α = 005 group differentiated mostly on general moods and classes of

instruments (like rock vs non-electronic vs electronic) the α = 01 group picked

up more nuanced instrumentation and mood differences For example cluster 1601

contained songs that featured orchestral string instruments especially violin The

songs themselves varied significantly according to traditional genres from Brian Eno

arrangements with classical orchestra to a remix of a song by Linkin Park a nu-metal

band which contained violin interludes This clustering raises an interesting point

that music that sounds very different based on traditional genres could be grouped

together on certain instruments or sounds Another cluster 2801 features 90s sounds

characteristic of the Roland TR-909 drum machine (which would explain why the

clusterrsquos songs increase dratistically starting in the 1990s and steadily decline through

the 2000s) Yet another cluster 601 contains a particularly heavy left tail indicating

a style more popular in the 1980s and the characteristic sound high-hat cymbals

is also a specialized instrument This specialization does not match up particularly

strongly with the clusters when α = 005 That is a single cluster with α = 005

does not easily map to one or more clusters in the α = 01 run although many of

the clusters appear to share characteristics based on the qualitative descriptions in

the tables The timbrechord change charts for each cluster appeared to at least

somewhat corroborate the general characteristics I attached to each cluster For

example the last timbre category is significantly pronounced for clusters 5 and 18

and especially so for 18 Cluster 18 was vocal-free ethereal space-synth sounds

so it would make sense that cluster 5 which was mainly calm New World also

contained vocal-free ethereal and space-y sounds It was also interesting to note

50

that certain clusters like 28 contained one timbre category that completely dominated

all the others Many songs in this cluster were marked by strong and repetitive

beats reminiscent of the Roland TR synth drum machine which matches the graph

Likewise clusters 3 7 9 and 20 which appear to contain the same peak timbre

category were noted for containing strong and repetitive beats For this cluster I

added the following artists and their contributions to the general list of novel artists

bull Bill Nelson Minimalist house music

bull Vangelis Orchestral compositions with electronic notes

bull Rick Wakeman Rock compositions with spacy-sounding synths

bull Kraftwerk synth-based pop music

Finally, we look at α = 0.2. With this parameter value, the Dirichlet Process resulted in 22 clusters; 3 of these clusters contained only one song each, and upon listening to each of these songs I determined they were not particularly unique and discarded them, for a total of 19 remaining clusters. Unlike the previous two values of α, where the clusters were relatively easy to subjectively differentiate, this one was quite difficult. Slightly more than half of the clusters, 10 out of 19, contained under 1,000 songs, and 3 contained under 100. Some of the clusters were easily mapped to clusters in the other two α values, like cluster 17 (α = 0.2), which contains Roland TR drum machine sounds and is comparable to cluster 28 (α = 0.1). However, many of the other classifications seemed more dubious. Not only did the songs within each cluster often vary significantly, but the differences between many clusters appeared nearly indistinguishable. The chord change and timbre charts also reflect the difficulty in distinguishing different clusters: the y-axis scales on these charts are quite small, implying that many of the timbre values averaged out because the songs in each cluster were quite different. Essentially, the observations are quite noisy and do not have features that stand out as saliently as those of cluster 28 (α = 0.1), for example. The only exceptions were clusters 30 and 34, but there were so few songs in each of these clusters that they represent only a small amount of the dataset. Therefore, I concluded that the Dirichlet Process with α = 0.2 did an inadequate job of clustering the songs. Overall, the clusters formed when α = 0.1 were the most meaningful in terms of picking up nuanced moods and instruments without splitting hairs and producing clusters that a minimally trained ear could not differentiate. From this analysis, the most appropriate genre classifications of the electronic music from the MSD are the clusters described in the table where α = 0.1, and the most novel artists, along with their contributions, are summarized in the findings where α = 0.05 and α = 0.1.

Chapter 4

Conclusion

In this chapter I first address weaknesses in my experiment and strategies for addressing them; I then offer potential paths for researchers to build upon my experiment, and close with final words regarding this thesis.

4.1 Design Flaws in Experiment

While I made every effort possible to ensure the integrity of this experiment, there were various complicating factors, some beyond my control and others within my control but unrealistic to address given the time and resources I had. The largest issue was the dataset I was working with. While the MSD contained roughly 23,000 electronic music songs according to my classifications, these songs did not come close to all of the electronic music that was available. From looking through the tracks, I did see many important artists, meaning that there was some credibility to the dataset. However, there were several other artists I was surprised to see missing, and the artists included were represented by only a limited number of popular songs. Some traditionally defined genres, like dubstep, were missing entirely from the dataset, and the most recent songs came from the year 2010, which meant that the past 5 years of rapid expansion in EM were not accounted for. Building a sufficient corpus of EM data is very difficult, arguably more so than for other genres, because songs may be remixed by multiple artists, further blurring the line between original content and modifications. For this reason, I considered my thesis to be a proof of concept: although the data I used may not be ideal, I was able to show that the Dirichlet Process could be used with some amount of success to cluster songs based on their metadata.

With respect to how I implemented the Dirichlet Process and constructed the features, my methodology could have been more extensive with additional time and resources. Interpreting the sounds in each song and establishing common threads is a difficult task, and unlike Pandora, which used trained music theory experts to analyze each song, I relied on my own ears and anecdotal knowledge of EM. Given the lack of formal literature quantitatively analyzing EM and the resources I had, this was my best realistic option, but it was also not ideal. The second notable weakness, which was more controllable, was determining what exactly constitutes an EM song. My criteria involved iterating through every song and selecting those whose artist carried a tag that fell inside a list of predetermined EM genres. However, this strategy is not always effective, since some artists have only a small selection of EM songs and have produced much more music involving rock or other non-EM genres. To prevent these songs from appearing in the dataset, I would need to load another dataset, from a group called Last.fm, which contains user-generated tags at the song level.
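As a sketch of how that song-level filter could look, assuming the lastfm_tags.db SQLite file distributed alongside the MSD (the tids/tags/tid_tag schema below is my reading of that dataset's documentation, not something verified in this thesis):

import sqlite3

# a small illustrative subset of the EM tag list from Appendix A.1
EM_TAGS = set(['house', 'techno', 'trance', 'ambient', 'electronic'])

def em_track_ids(db_path='lastfm_tags.db'):
    # return the set of MSD track IDs carrying at least one song-level EM
    # tag; assumes tables tids(tid), tags(tag), and tid_tag(tid, tag, val),
    # with tid_tag referencing the other two tables by ROWID
    conn = sqlite3.connect(db_path)
    rows = conn.execute(
        'SELECT tids.tid, lower(tags.tag) FROM tid_tag '
        'JOIN tids ON tid_tag.tid = tids.ROWID '
        'JOIN tags ON tid_tag.tag = tags.ROWID')
    keep = set(tid for tid, tag in rows if tag in EM_TAGS)
    conn.close()
    return keep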

Another, more addressable, weakness in my experiment was graphically analyzing the timbre categories. While the average chord changes were easy to interpret on the graphs for each cluster and had easy semantic interpretations, the timbre categories were never formally defined. That is, while I knew the Bayesian Information Criterion was lowest when there were 46 categories, I did not associate each timbre category with a sound. Mauch's study addressed this issue by randomly selecting songs with sounds that fell in each timbre category and asking users to listen to the sounds and classify what they heard. Implementing this system would be an additional way of ensuring that the clusters formed for each song were nontrivial: I could not only eyeball the measurements on each graph for timbre, like I did in this thesis, but also use them to confirm the sounds I observed for each cluster. A small sampler along these lines is sketched below.
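A minimal version of that listening test could sample, for each timbre category, the smoothed segments lying closest to its centroid and report their source tracks for a human to audition; in this sketch, TIMBRE_CLUSTERS is the centroid list from Appendix A.4, while all_segments and the function name are hypothetical:

import numpy as np

def nearest_segments(all_segments, centroid, n=5):
    # all_segments: list of (track_id, 12-dim smoothed timbre vector) pairs;
    # return the track IDs of the n segments closest (Euclidean) to the centroid
    dists = [(np.linalg.norm(np.array(vec) - np.array(centroid)), tid)
             for tid, vec in all_segments]
    return [tid for d, tid in sorted(dists)[:n]]

# hypothetical usage: audition five tracks per timbre category
# for cat, centroid in enumerate(TIMBRE_CLUSTERS):
#     print cat, nearest_segments(all_segments, centroid)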

Finally, while my feature selection involved careful preprocessing, based on other studies, that normalized measurements between all songs, there are additional ways I could have improved the feature set. For example, one study looks at more advanced ways to isolate specific timbre segments in a song, identify repeating patterns, and compare songs to each other in terms of the similarity of their timbres [15]. More advanced methods like these would allow me to more quantitatively analyze how successfully the Dirichlet Process clusters songs into distinct categories.
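One readily available quantitative check along these lines is a silhouette score over the 238-dimensional feature vectors; a sketch, where features and labels (hypothetical names) are the feature matrix and the DPGMM cluster assignments:

import numpy as np
from sklearn.metrics import silhouette_score

def clustering_quality(features, labels):
    # mean silhouette over all songs: values near +1 indicate tight, well
    # separated clusters, while values near 0 indicate the kind of overlap
    # observed in the alpha = 0.2 run
    return silhouette_score(np.asarray(features), np.asarray(labels))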

4.2 Future Work

Future work in this area, quantitatively analyzing EM metadata to determine what constitutes different genres and novel artists, would involve tighter definitions and procedures, evaluations of whether the clustering was effective, and closer musical scrutiny. All of the weaknesses mentioned in the previous section, barring perhaps the set of songs available in the Million Song Dataset, can be addressed with extensions and modifications to the code base I created. The greater issue of building an effective corpus of music data for the MSD and constantly updating it might be addressed by soliciting such data from an organization like Spotify, but such an endeavor is very ambitious and beyond the scope of any individual or small-group research project without extensive funding and influence. Once these problems are resolved, namely the dataset itself, access to the songs in it, and methods for comparing songs to each other, the next steps would be to further analyze the results. How do the most unique artists for their time compare to the most popular artists? Is there considerable overlap? How long does it take for a style to grow in popularity, if it even does? And lastly, how can these findings be used to compose new genres of music and envision who and what will become popular in the future? All of these questions may require supplementary information sources, with respect to the popularity of songs and artists for example, and many of these additional pieces of information can be found on the website of the MSD.

4.3 Closing Remarks

While this thesis is an ambitious endeavor and can be improved in many respects, it does show that the methods implemented yield nontrivial results and could serve as a foundation for future quantitative analysis of electronic music. As data analytics grows even further, and groups such as Spotify amass greater amounts of information and deeper insights into that information, this relatively new field of study will hopefully grow. EM is a dynamic, energizing, and incredibly expressive type of music, and understanding it from a quantitative perspective pays respect to what has, up until now, mostly been analyzed from a curious outsider's perspective: qualitatively described, but not examined as thoroughly from a mathematical angle.

Appendix A

Code

A.1 Pulling Data from the Million Song Dataset

from __future__ import division
import os
import re
import sys
import time
import glob
import collections
import numpy as np
import hdf5_getters  # not on adroit

# prevents output from showing ellipses when printed
np.set_printoptions(threshold=np.nan)

'''This code pulls the relevant metadata (title, artist, year, duration,
timbre and pitch segments) of every electronic song out of the raw MSD
files, sorts the songs chronologically, and writes them to text files.'''

basedir = '/scratch/network/mssilver/mssilver/msd_data_full/data/' + str(sys.argv[1])
ext = '.h5'

target_genres = ['house', 'techno', 'drum and bass', 'drum n bass',
                 "drum'n'bass", 'drumnbass', "drum 'n' bass", 'jungle',
                 'breakbeat', 'trance', 'dubstep', 'trap', 'downtempo',
                 'industrial', 'synthpop', 'idm',
                 'idm - intelligent dance music', '8-bit', 'ambient',
                 'dance and electronica', 'electronic']

# relevant metadata for all EM songs found in the MSD
all_song_data = {}
pitch_segs_data = []
count = 0
start_time = time.time()

for root, dirs, files in os.walk(basedir):
    files = glob.glob(os.path.join(root, '*' + ext))
    for f in files:
        h5 = hdf5_getters.open_h5_file_read(f)
        # if year unknown, throw out sample
        if hdf5_getters.get_year(h5) == 0:
            h5.close()
            continue
        if any(tag in str(hdf5_getters.get_artist_mbtags(h5)) for tag in target_genres):
            print 'found electronic music song at {0} seconds'.format(time.time() - start_time)
            count += 1
            print 'song count: {0}'.format(count)
            h5_subdict = dict()
            h5_subdict['title'] = hdf5_getters.get_title(h5).item()
            h5_subdict['artist_name'] = hdf5_getters.get_artist_name(h5).item()
            h5_subdict['year'] = hdf5_getters.get_year(h5).item()
            h5_subdict['duration'] = hdf5_getters.get_duration(h5).item()
            h5_subdict['timbre'] = hdf5_getters.get_segments_timbre(h5).tolist()
            h5_subdict['pitches'] = hdf5_getters.get_segments_pitches(h5).tolist()
            track_id = hdf5_getters.get_track_id(h5).item()
            all_song_data[track_id] = h5_subdict
            print 'Song {0} finished processing. Total time elapsed: {1} seconds'.format(count, str(time.time() - start_time))
        h5.close()

# sort the songs chronologically; an OrderedDict preserves the sorted
# order (a plain dict would not)
all_song_data_sorted = collections.OrderedDict(
    sorted(all_song_data.items(), key=lambda k: k[1]['year']))
sortedpitchdata = ('/scratch/network/mssilver/mssilver/msd_data/raw_'
                   + re.sub('/', '', sys.argv[1]) + '.txt')
with open(sortedpitchdata, 'w') as text_file:
    text_file.write(str(all_song_data_sorted))

A.2 Calculating Most Likely Chords and Timbre Categories

from __future__ import division
import os
import sys
import re
import time
import json
import glob
import hdf5_getters  # not on adroit
import sklearn.mixture
import msd_utils  # not on adroit
import math
import numpy as np
import collections
import ast

# prevents output from showing ellipses when printed
np.set_printoptions(threshold=np.nan)

# mean of a list; applied column-wise below via zip(*...)
def mean(a):
    return sum(a) / len(a)

'''This code computes the frequency of chord changes and timbre categories
in each electronic song.'''

basedir = '/scratch/network/mssilver/mssilver/'
input_file = basedir + 'msd_data/raw_' + str(sys.argv[1]) + '.txt'
output_file = basedir + 'msd_data/preprocessed_' + str(sys.argv[1]) + '.txt'

json_contents = open(input_file, 'r').read()

all_song_data = []
time_start = time.time()
count = 0
# the regex below is reconstructed: each song's sub-dict in the raw file
# starts with {'title' and ends with a list of lists
for match in re.finditer(r"\{'title'.*?\]\]\}", json_contents, re.DOTALL):
    json_object_str = str(match.group(0))
    json_object = ast.literal_eval(json_object_str)
    json_object_new = {}

    json_object_new['title'] = json_object['title']
    json_object_new['artist_name'] = json_object['artist_name']
    json_object_new['year'] = json_object['year']
    json_object_new['duration'] = json_object['duration']

    segments_pitches_old = json_object['pitches']
    segments_timbre_old = json_object['timbre']
    segments_pitches_old_smoothed = []
    segments_timbre_old_smoothed = []
    chord_changes = [0 for i in range(0, 192)]
    smoothing_factor = 5
    for i in range(0, int(math.floor(len(segments_pitches_old) / smoothing_factor))):
        segments = segments_pitches_old[(smoothing_factor*i):(smoothing_factor*i + smoothing_factor)]
        # calculate mean frequency of each note over a block of 5 time segments
        segments_mean = map(mean, zip(*segments))
        segments_pitches_old_smoothed.append(segments_mean)
    most_likely_chords = [msd_utils.find_most_likely_chord(seg) for seg in segments_pitches_old_smoothed]
    print 'found most likely chords at {0} seconds'.format(time.time() - time_start)
    # calculate chord changes
    for i in range(0, len(most_likely_chords) - 1):
        c1 = most_likely_chords[i]
        c2 = most_likely_chords[i+1]
        if c1[1] == c2[1]:
            note_shift = 0
        elif c1[1] < c2[1]:
            note_shift = c2[1] - c1[1]
        else:
            note_shift = 12 - c1[1] + c2[1]
        key_shift = 4*(c1[0] - 1) + c2[0]
        # convert note_shift (0 through 11) and key_shift (1 to 16)
        # to one of 192 categories for a chord shift
        chord_shift = 12*(key_shift - 1) + note_shift
        chord_changes[chord_shift] += 1
    json_object_new['chord_changes'] = [c / json_object['duration'] for c in chord_changes]
    print 'calculated chord changes at {0} seconds'.format(time.time() - time_start)

    for i in range(0, int(math.floor(len(segments_timbre_old) / smoothing_factor))):
        segments = segments_timbre_old[(smoothing_factor*i):(smoothing_factor*i + smoothing_factor)]
        # calculate mean timbre value over a block of 5 time segments
        segments_mean = map(mean, zip(*segments))
        segments_timbre_old_smoothed.append(segments_mean)
    print 'found most likely timbre categories at {0} seconds'.format(time.time() - time_start)
    timbre_cats = [msd_utils.find_most_likely_timbre_category(seg) for seg in segments_timbre_old_smoothed]
    # 46 timbre categories, the count selected by the Bayesian Information Criterion
    timbre_cat_counts = [timbre_cats.count(i) for i in xrange(0, 46)]
    json_object_new['timbre_cat_counts'] = [t / json_object['duration'] for t in timbre_cat_counts]
    all_song_data.append(json_object_new)
    count += 1

print 'preprocessing finished, writing results to file at time {0}'.format(time.time() - time_start)
with open(output_file, 'w') as f:
    f.write(str(all_song_data))

print 'file merging complete at time {0}'.format(time.time() - time_start)

A.3 Code to Compute Timbre Categories

from __future__ import division
import os
import sys
import re
import time
import json
import glob
import hdf5_getters  # not on adroit
import sklearn.mixture
import msd_utils  # not on adroit
import math
import numpy as np
import collections
from string import ascii_uppercase
import ast
import matplotlib.pyplot as plt
import operator
from collections import defaultdict
import random

timbre_all = []
N = 20  # number of samples to get from each year
year_counts = {1956: 2, 1965: 4, 1968: 3, 1969: 5, 1970: 23, 1971: 25,
               1972: 26, 1973: 37, 1974: 35, 1975: 29, 1976: 28, 1977: 64,
               1978: 77, 1979: 111, 1980: 131, 1981: 171, 1982: 199,
               1983: 272, 1984: 190, 1985: 189, 1986: 200, 1987: 224,
               1988: 205, 1989: 272, 1990: 358, 1991: 348, 1992: 538,
               1993: 610, 1994: 658, 1995: 764, 1996: 809, 1997: 930,
               1998: 872, 1999: 983, 2000: 1031, 2001: 1230, 2002: 1323,
               2003: 1563, 2004: 1508, 2005: 1995, 2006: 1892, 2007: 2175,
               2008: 1950, 2009: 1782, 2010: 742}

time_start = time.time()
year_count = defaultdict(int)
orig_dir = '/scratch/network/mssilver/mssilver/'
orig_dir = ''
# reconstructed pattern: each song's sub-dict starts with {'title' and
# ends with a list of lists
json_pattern = re.compile(r"\{'title'.*?\]\]\}", re.DOTALL)
N = 20  # number of songs to sample from each year
k = 20  # number of frames to select from each song
for l1 in ascii_uppercase:
    for l2 in ascii_uppercase:
        edm_textfile = orig_dir + 'msd_data/raw_' + l1 + l2 + '.txt'
        json_contents = open(edm_textfile, 'r').read()
        for json_object_str in re.findall(json_pattern, json_contents):
            json_object = ast.literal_eval(json_object_str)
            year = int(json_object['year'])
            # keep roughly N songs per year by sampling each song with
            # probability N / (songs in that year), capped at 1
            prob = 1.0 if 1.0*N/year_counts[year] > 1.0 else 1.0*N/year_counts[year]
            if random.random() < prob:
                print 'getting timbre frames for song in directory {0}, {1} seconds after start of program'.format(edm_textfile, time.time() - time_start)
                duration = float(json_object['duration'])
                timbre = [[t/duration for t in l] for l in json_object['timbre']]
                try:
                    indices = random.sample(xrange(0, len(timbre)), k)
                except:
                    # fewer than k frames: keep them all
                    indices = xrange(0, len(timbre))
                timbre_frames = [timbre[i] for i in indices]
                appended_timbre = [timbre_all.append(l) for l in timbre_frames]
        print 'finished file {0}, {1} seconds after start of program'.format(edm_textfile, time.time() - time_start)

with open('timbre_frames_all.txt', 'w') as f:
    f.write(str(timbre_all))

A.4 Helper Methods for Calculations

import os
import re
import json
import glob
import hdf5_getters
import time
import numpy as np

''' some static data used in conjunction with the helper methods '''

# each 12-element vector corresponds to the 12 pitches, starting with C
# natural and going up to B natural

CHORD_TEMPLATE_MAJOR = [[1,0,0,0,1,0,0,1,0,0,0,0], [0,1,0,0,0,1,0,0,1,0,0,0],
                        [0,0,1,0,0,0,1,0,0,1,0,0], [0,0,0,1,0,0,0,1,0,0,1,0],
                        [0,0,0,0,1,0,0,0,1,0,0,1], [1,0,0,0,0,1,0,0,0,1,0,0],
                        [0,1,0,0,0,0,1,0,0,0,1,0], [0,0,1,0,0,0,0,1,0,0,0,1],
                        [1,0,0,1,0,0,0,0,1,0,0,0], [0,1,0,0,1,0,0,0,0,1,0,0],
                        [0,0,1,0,0,1,0,0,0,0,1,0], [0,0,0,1,0,0,1,0,0,0,0,1]]
CHORD_TEMPLATE_MINOR = [[1,0,0,1,0,0,0,1,0,0,0,0], [0,1,0,0,1,0,0,0,1,0,0,0],
                        [0,0,1,0,0,1,0,0,0,1,0,0], [0,0,0,1,0,0,1,0,0,0,1,0],
                        [0,0,0,0,1,0,0,1,0,0,0,1], [1,0,0,0,0,1,0,0,1,0,0,0],
                        [0,1,0,0,0,0,1,0,0,1,0,0], [0,0,1,0,0,0,0,1,0,0,1,0],
                        [0,0,0,1,0,0,0,0,1,0,0,1], [1,0,0,0,1,0,0,0,0,1,0,0],
                        [0,1,0,0,0,1,0,0,0,0,1,0], [0,0,1,0,0,0,1,0,0,0,0,1]]
CHORD_TEMPLATE_DOM7 = [[1,0,0,0,1,0,0,1,0,0,1,0], [0,1,0,0,0,1,0,0,1,0,0,1],
                       [1,0,1,0,0,0,1,0,0,1,0,0], [0,1,0,1,0,0,0,1,0,0,1,0],
                       [0,0,1,0,1,0,0,0,1,0,0,1], [1,0,0,1,0,1,0,0,0,1,0,0],
                       [0,1,0,0,1,0,1,0,0,0,1,0], [0,0,1,0,0,1,0,1,0,0,0,1],
                       [1,0,0,1,0,0,1,0,1,0,0,0], [0,1,0,0,1,0,0,1,0,1,0,0],
                       [0,0,1,0,0,1,0,0,1,0,1,0], [0,0,0,1,0,0,1,0,0,1,0,1]]
CHORD_TEMPLATE_MIN7 = [[1,0,0,1,0,0,0,1,0,0,1,0], [0,1,0,0,1,0,0,0,1,0,0,1],
                       [1,0,1,0,0,1,0,0,0,1,0,0], [0,1,0,1,0,0,1,0,0,0,1,0],
                       [0,0,1,0,1,0,0,1,0,0,0,1], [1,0,0,1,0,1,0,0,1,0,0,0],
                       [0,1,0,0,1,0,1,0,0,1,0,0], [0,0,1,0,0,1,0,1,0,0,1,0],
                       [0,0,0,1,0,0,1,0,1,0,0,1], [1,0,0,0,1,0,0,1,0,1,0,0],
                       [0,1,0,0,0,1,0,0,1,0,1,0], [0,0,1,0,0,0,1,0,0,1,0,1]]

CHORD_TEMPLATE_MAJOR_means = [np.mean(chord) for chord in CHORD_TEMPLATE_MAJOR]
CHORD_TEMPLATE_MINOR_means = [np.mean(chord) for chord in CHORD_TEMPLATE_MINOR]
CHORD_TEMPLATE_DOM7_means = [np.mean(chord) for chord in CHORD_TEMPLATE_DOM7]
CHORD_TEMPLATE_MIN7_means = [np.mean(chord) for chord in CHORD_TEMPLATE_MIN7]

CHORD_TEMPLATE_MAJOR_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_MAJOR]
CHORD_TEMPLATE_MINOR_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_MINOR]
CHORD_TEMPLATE_DOM7_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_DOM7]
CHORD_TEMPLATE_MIN7_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_MIN7]

# the 46 timbre cluster centroids (12-dimensional), fit offline by a GMM
TIMBRE_CLUSTERS = [
    [1.38679881e-01, 3.95702571e-02, 2.65410235e-02, 7.38301998e-03,
     -1.75014636e-02, -5.51147732e-02, 8.71851698e-03, -1.17595855e-02,
     1.07227900e-02, 8.75951680e-03, 5.40391877e-03, 6.17638908e-03],
    [3.14344510e+00, 1.17405599e-01, 4.08053561e+00, -1.77934450e+00,
     2.93367968e+00, -1.35597928e+00, -1.55129489e+00, 7.75743158e-01,
     6.42796685e-01, 1.40794256e-01, 3.37716831e-01, -3.27103815e-01],
    [3.56548165e-01, 2.73288705e+00, 1.94355982e+00, 1.06892477e+00,
     9.89739475e-01, -8.97330631e-02, 8.73234495e-01, -2.00747009e-03,
     3.44488367e-01, 9.93117800e-02, -2.43471766e-01, -1.90521726e-01],
    [4.22442037e-01, 4.14115783e-01, 1.43926557e-01, -1.16143322e-01,
     -5.95186216e-02, -2.36927188e-01, -6.83151409e-02, 9.86816882e-02,
     2.43219098e-02, 6.93558977e-02, 6.80121418e-03, 3.97485360e-02],
    [1.94727799e-01, -1.39027782e+00, -2.39875671e-01, -2.84583677e-01,
     1.92334219e-01, -2.83421048e-01, 2.15787541e-01, 1.14840341e-01,
     -2.15631833e-01, -4.09496877e-02, -6.90838017e-03, -7.24394810e-03],
    [1.96565167e-01, 4.98702717e-02, -3.43697282e-01, 2.54170701e-01,
     1.12441266e-02, 1.54740401e-01, -4.70447408e-02, 8.10868802e-02,
     3.03736697e-03, 1.43974944e-03, -2.75044913e-02, 1.48634678e-02],
    [2.21364497e-01, -2.96205105e-01, 1.57754028e-01, -5.57641279e-02,
     -9.25625566e-02, -6.15316168e-02, -1.38139882e-01, -5.54936599e-02,
     1.66886836e-01, 6.46238260e-02, 1.24093863e-02, -2.09274345e-02],
    [2.12823455e-01, -9.32652720e-02, -4.39611467e-01, -2.02814479e-01,
     4.98638770e-02, -1.26572488e-01, -1.11181799e-01, 3.25075635e-02,
     2.01416694e-02, -5.69216463e-02, 2.61922912e-02, 8.30817468e-02],
    [1.62304042e-01, -7.34813956e-03, -2.02552550e-01, 1.80106705e-01,
     -5.72110826e-02, -9.17148244e-02, -6.20429191e-03, -6.08892354e-02,
     1.02883628e-02, 3.84878478e-02, -8.72920419e-03, 2.37291230e-02],
    [1.69023095e-01, 6.81311168e-02, -3.71039856e-02, -2.13139780e-02,
     -4.18752028e-03, 1.36407740e-01, 2.58515825e-02, -4.10328777e-04,
     2.93149920e-02, -1.97874734e-02, 2.01177066e-02, 4.29260690e-03],
    [4.16829358e-01, -1.28384095e+00, 8.86081556e-01, 9.13717416e-02,
     -3.19420208e-01, -1.82003637e-01, -3.19865507e-02, -1.71517045e-02,
     3.47472066e-02, -3.53047665e-02, 5.58354602e-02, -5.06222122e-02],
    [3.83948137e-01, 1.06020034e-01, 4.01191058e-01, 1.49470482e-01,
     -9.58422411e-02, -4.94473336e-02, 2.27589858e-02, -5.67352733e-02,
     3.84666644e-02, -2.15828055e-02, -1.67817151e-02, 1.15426241e-01],
    [9.07946444e-01, 3.26120397e+00, 2.98472002e+00, -1.42615404e-01,
     1.29886103e+00, -4.53380431e-01, 1.54008478e-01, -3.55297093e-02,
     -2.95809181e-01, 1.57037690e-01, -7.29692046e-02, 1.15180285e-01],
    [1.60870896e+00, -2.32038235e+00, -7.96211044e-01, 1.55058968e+00,
     -2.19377663e+00, 5.01030526e-01, -1.71767279e+00, -1.36642470e+00,
     -2.42837527e-01, -4.14275615e-01, -7.33148530e-01, -4.56676578e-01],
    [6.42870687e-01, 1.34486839e+00, 2.16026845e-01, -2.13180345e-01,
     3.10866747e-01, -3.97754955e-01, -3.54439151e-01, -5.95938041e-04,
     4.95054274e-03, 4.67013422e-02, -1.80823854e-02, 1.25808320e-01],
    [1.16780496e+00, 2.28141229e+00, -3.29418720e+00, -1.54239912e+00,
     2.12372153e-01, 2.51116768e+00, 1.84273560e+00, -4.06183916e-01,
     1.19175125e+00, -9.24407446e-01, 6.85444429e-01, -6.38729005e-01],
    [2.39097414e-01, -1.13382447e-02, 3.06327342e-01, 4.68182987e-03,
     -1.03107607e-01, -3.17661969e-02, 3.46533705e-02, 1.46440386e-02,
     6.88291154e-02, 1.72580481e-02, -6.23970238e-03, -6.52822380e-03],
    [1.74850329e-01, -1.86077411e-01, 2.69285838e-01, 5.22452803e-02,
     -3.71708289e-02, -6.42874319e-02, -5.01920042e-03, -1.14565540e-02,
     -2.61300268e-03, -6.94872458e-03, 1.20157063e-02, 2.01341977e-02],
    [1.93220674e-01, 1.62738332e-01, 1.72794061e-02, 7.89933755e-02,
     1.58494767e-01, 9.04541006e-04, -3.33177052e-02, -1.42411500e-01,
     -1.90471155e-02, -2.41622739e-02, -2.57382438e-02, 2.84895062e-02],
    [3.31179197e+00, -1.56765268e-01, 4.42446188e+00, 2.05496297e+00,
     5.07031622e+00, -3.52663849e-02, -5.68337901e+00, -1.17825301e+00,
     5.41756637e-01, -3.15541339e-02, -1.58404846e+00, 7.37887234e-01],
    [2.36033237e-01, -5.01380019e-01, -7.01568834e-02, -2.14474169e-01,
     5.58739133e-01, -3.45340886e-01, 2.36469930e-01, -2.51770230e-02,
     -4.41670143e-01, -1.73364633e-01, 9.92353986e-03, 1.01775476e-01],
    [3.13672832e+00, 1.55128891e+00, 4.60139512e+00, 9.82477544e-01,
     -3.87108002e-01, -1.34239667e+00, -3.00065797e+00, -4.41556909e-01,
     -7.77546208e-01, -6.59017029e-01, -1.42596356e-01, -9.78935498e-01],
    [8.50714148e-01, 2.28658856e-01, -3.65260753e+00, 2.70626948e+00,
     -1.90441544e-01, 5.66625676e+00, 1.77531510e+00, 2.39978921e+00,
     1.10965660e+00, 1.58484130e+00, -1.51579214e-02, 8.64324026e-01],
    [1.14302559e+00, 1.18602811e+00, -3.88130412e+00, 8.69833825e-01,
     -8.23003310e-01, -4.23867795e-01, 8.56022598e-01, -1.08015106e+00,
     1.74840192e-01, -1.35493558e-02, -1.17012561e+00, 1.68572940e-01],
    [3.54117814e+00, 6.12714769e-01, 7.67585243e+00, 2.50391333e+00,
     1.81374399e+00, -1.46363231e+00, -1.74027236e+00, -5.72924078e-01,
     -1.20787368e+00, -4.13954661e-01, -4.62561948e-01, 6.78297871e-01],
    [8.31843044e-01, 4.41635485e-01, 7.00724425e-02, -4.72159900e-02,
     3.08326493e-01, -4.47009822e-01, 3.27806057e-01, 6.52370380e-01,
     3.28490360e-01, 1.28628172e-01, -7.78065861e-02, 6.91343399e-02],
    [4.90082031e-01, -9.53180204e-01, 1.76970476e-01, 1.57256960e-01,
     -5.26196238e-02, -3.19264458e-01, 3.91808304e-01, 2.19368239e-01,
     -2.06483291e-01, -6.25044005e-02, -1.05547224e-01, 3.18934196e-01],
    [1.49899454e+00, -4.30708817e-01, 2.43770498e+00, 7.03149621e-01,
     -2.28827845e+00, 2.70195855e+00, -4.71484280e+00, -1.18700075e+00,
     -1.77431396e+00, -2.23190236e+00, 8.20855264e-01, -2.35859902e-01],
    [1.20322544e-01, -3.66300816e-01, -1.25699953e-01, -1.21914056e-01,
     6.93277338e-02, -1.31034684e-01, -1.54955924e-03, 2.48094288e-02,
     -3.09576314e-02, -1.66369415e-03, 1.48904987e-04, -1.42151992e-02],
    [6.52394765e-01, -6.81024464e-01, 6.36868117e-01, 3.04950208e-01,
     2.62178992e-01, -3.20457080e-01, -1.98576098e-01, -3.02173163e-01,
     2.04399765e-01, 4.44513847e-02, -9.50111498e-02, -1.14198739e-02],
    [2.06762180e-01, -2.08101829e-01, 2.61977630e-01, -1.71672300e-01,
     5.61794250e-02, 2.13660185e-01, 3.90259585e-02, 4.78176392e-02,
     1.72812607e-02, 3.44052067e-02, 6.26899067e-03, 2.48544728e-02],
    [7.39717363e-01, 4.37786285e+00, 2.54995502e+00, 1.13151212e+00,
     -3.58509503e-01, 2.20806129e-01, -2.20500355e-01, -7.22409824e-03,
     -2.70534083e-01, 1.07942098e-03, 2.70174668e-01, 1.87279353e-01],
    [1.25593809e+00, 6.71054880e-02, 8.70352571e-01, -4.32607959e+00,
     2.30652217e+00, 5.47476105e+00, -6.11052479e-01, 1.07955720e+00,
     -2.16225471e+00, -7.95770149e-01, -7.31804973e-01, 9.68935954e-01],
    [1.17233757e-01, -1.23897829e-01, -4.88625265e-01, 1.42036530e-01,
     -7.23286756e-02, -6.99808763e-02, -1.17525019e-02, 5.70221674e-02,
     -7.67796123e-03, 4.17505873e-02, -2.33375716e-02, 1.94121001e-02],
    [1.67511025e+00, -2.75436700e+00, 1.45345593e+00, 1.32408871e+00,
     -1.66172505e+00, 1.00560074e+00, -8.82308160e-01, -5.95708043e-01,
     -7.27283590e-01, -1.03975499e+00, -1.86653334e-02, 1.39449745e+00],
    [3.20587677e+00, -2.84451104e+00, 8.54849957e+00, -4.44001235e-01,
     1.04202144e+00, 7.35333682e-01, -2.48763292e+00, 7.38931361e-01,
     -1.74185596e+00, -1.07581842e+00, 2.05759299e-01, -8.20483513e-01],
    [3.31279737e+00, -5.08655734e-01, 6.61530870e+00, 1.16518280e+00,
     4.74499155e+00, -2.31536191e+00, -1.34016130e+00, -7.15381712e-01,
     2.78890594e+00, 2.04189275e+00, -3.80003033e-01, 1.16034914e+00],
    [1.79522019e+00, -8.13534697e-02, 4.37167420e-01, 2.26517020e+00,
     8.85377295e-01, 1.07481514e+00, -7.25322296e-01, -2.19309506e+00,
     -7.59468916e-01, -1.37191387e+00, 2.60097913e-01, 9.34596450e-01],
    [3.50400906e-01, 8.17891485e-01, -8.63487084e-01, -7.31760701e-01,
     9.70320805e-02, -3.60023996e-01, -2.91753495e-01, -8.03073817e-02,
     6.65930095e-02, 1.60093340e-01, -1.29158086e-01, -5.18806100e-02],
    [2.25922929e-01, 2.78461593e-01, 5.39661393e-02, -2.37662670e-02,
     -2.70343295e-02, -1.23485570e-01, 2.31027499e-03, 5.87465112e-05,
     1.86127188e-02, 2.83074747e-02, -1.87198676e-04, 1.24761782e-02],
    [4.53615634e-01, 3.18976020e+00, -8.35029351e-01, 7.84124578e+00,
     -4.43906795e-01, -1.78945492e+00, -1.14521031e+00, 1.00044304e+00,
     -4.04084981e-01, -4.86030348e-01, 1.05412721e-01, 5.63666445e-02],
    [3.93714086e-01, -3.07226477e-01, -4.87366619e-01, -4.57481697e-01,
     -2.91133171e-04, -2.39881719e-01, -2.15591352e-01, -1.21332941e-01,
     1.42245002e-01, 5.02984582e-02, -8.05878851e-03, 1.95534173e-01],
    [1.86913010e-01, -1.61000977e-01, 5.95612425e-01, 1.87804293e-01,
     2.22064227e-01, -1.09008289e-01, 7.83845058e-02, 5.15228647e-02,
     -8.18113578e-02, -2.37860551e-02, 3.41013800e-03, 3.64680417e-02],
    [3.32919314e+00, -2.14341251e+00, 7.20913997e+00, 1.76143734e+00,
     1.64091808e+00, -2.66887649e+00, -9.26748006e-01, -2.78599285e-01,
     -7.39434005e-01, -3.87363085e-01, 8.00557250e-01, 1.15628886e+00],
    [4.76496444e-01, -1.19334793e-01, 3.09037235e-01, -3.45545294e-01,
     1.30114716e-01, 5.06895559e-01, 2.12176840e-01, -4.14296750e-03,
     4.52439064e-02, -1.62163990e-02, 6.93683152e-03, -5.77607592e-03],
    [3.00019324e-01, 5.43432074e-02, -7.72732930e-01, 1.47263806e+00,
     -2.79012581e-02, -2.47864869e-01, -2.10011388e-01, 2.78202425e-01,
     6.16957205e-02, -1.66924986e-01, -1.80102286e-01, -3.78872162e-03]]

TIMBRE_MEANS = [np.mean(t) for t in TIMBRE_CLUSTERS]
TIMBRE_STDEVS = [np.std(t) for t in TIMBRE_CLUSTERS]

'''helper methods to process raw msd data'''

def normalize_pitches(h5):
    key = int(hdf5_getters.get_key(h5))
    segments_pitches = hdf5_getters.get_segments_pitches(h5)
    segments_pitches_new = [transpose_by_key(pitch_seg, key) for pitch_seg in segments_pitches]
    return segments_pitches_new

def transpose_by_key(pitch_seg, key):
    pitch_seg_new = []
    for i in range(0, 12):
        idx = (i + key) % 12
        pitch_seg_new.append(pitch_seg[idx])
    return pitch_seg_new

''' given a time segment with distributions of the 12 pitches, find the
most likely chord played '''
def find_most_likely_chord(pitch_vector):
    rho_max = 0.0
    # index each chord as a (quality, root) pair; qualities are
    # 1 = major, 2 = minor, 3 = dominant 7th, 4 = minor 7th
    most_likely_chord = (1, 1)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_MAJOR,
            CHORD_TEMPLATE_MAJOR_means, CHORD_TEMPLATE_MAJOR_stdevs)):
        # correlation-style score between the pitch vector and the template
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean)*(pitch_vector[i] - np.mean(pitch_vector))/((stdev + 0.01)*(np.std(pitch_vector) + 0.01))
        if abs(rho) > abs(rho_max):
            rho_max = rho
            most_likely_chord = (1, idx)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_MINOR,
            CHORD_TEMPLATE_MINOR_means, CHORD_TEMPLATE_MINOR_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean)*(pitch_vector[i] - np.mean(pitch_vector))/((stdev + 0.01)*(np.std(pitch_vector) + 0.01))
        if abs(rho) > abs(rho_max):
            rho_max = rho
            most_likely_chord = (2, idx)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_DOM7,
            CHORD_TEMPLATE_DOM7_means, CHORD_TEMPLATE_DOM7_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean)*(pitch_vector[i] - np.mean(pitch_vector))/((stdev + 0.01)*(np.std(pitch_vector) + 0.01))
        if abs(rho) > abs(rho_max):
            rho_max = rho
            most_likely_chord = (3, idx)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_MIN7,
            CHORD_TEMPLATE_MIN7_means, CHORD_TEMPLATE_MIN7_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean)*(pitch_vector[i] - np.mean(pitch_vector))/((stdev + 0.01)*(np.std(pitch_vector) + 0.01))
        if abs(rho) > abs(rho_max):
            rho_max = rho
            most_likely_chord = (4, idx)
    return most_likely_chord

def find_most_likely_timbre_category(timbre_vector):
    most_likely_timbre_cat = 0
    rho_max = 0.0
    # correlation-style score between the timbre frame and each centroid
    for idx, (seg, mean, stdev) in enumerate(zip(TIMBRE_CLUSTERS,
            TIMBRE_MEANS, TIMBRE_STDEVS)):
        rho = 0.0
        for i in range(0, 12):
            rho += (seg[i] - mean)*(timbre_vector[i] - np.mean(seg))/((stdev + 0.01)*(np.std(timbre_vector) + 0.01))
        if abs(rho) > abs(rho_max):
            rho_max = rho
            most_likely_timbre_cat = idx
    return most_likely_timbre_cat


Bibliography

[1] Marck Bailey. Mark J. Butler publishes scholarly work on dance music. http://www.music.northwestern.edu/about/news/2012/mark-j-butler-publishes-scholarly-work-on-dance-music.html, Mar 2012.

[2] Kenneth Taylor. Ishkur's guide to EDM. http://techno.org/electronic-music-guide/.

[3] Deal further strengthens Spotify's music discovery expertise. http://the.echonest.com/pressreleases/spotify-acquires-echo-nest/, Mar 2014.

[4] Josh Constine. Inside the Spotify - Echo Nest skunkworks. http://techcrunch.com/2014/10/19/the-sonic-mad-scientists/, Oct 2014.

[5] The future of music genres is here. http://blog.echonest.com/post/73516217273/the-future-of-music-genres-is-here, Jan 2014.

[6] About the Music Genome Project. http://www.pandora.com/about/mgp.

[7] Joan Serrà, Álvaro Corral, Marián Boguñá, Martín Haro, and Josep Ll. Arcos. Measuring the evolution of contemporary western popular music. Sci. Rep., 2, Jul 2012.

[8] Matthias Mauch, Robert M. MacCallum, Mark Levy, and Armand M. Leroi. The evolution of popular music: USA 1960–2010. Royal Society Open Science, 2(5), 2015.

[9] Thierry Bertin-Mahieux, Daniel P.W. Ellis, Brian Whitman, and Paul Lamere. The Million Song Dataset. In Proceedings of the 12th International Conference on Music Information Retrieval (ISMIR 2011), 2011.

[10] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay. Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12:2825–2830, 2011.

[11] Lars Buitinck, Gilles Louppe, Mathieu Blondel, Fabian Pedregosa, Andreas Mueller, Olivier Grisel, Vlad Niculae, Peter Prettenhofer, Alexandre Gramfort, Jaques Grobler, Robert Layton, Jake VanderPlas, Arnaud Joly, Brian Holt, and Gaël Varoquaux. API design for machine learning software: experiences from the scikit-learn project. In ECML PKDD Workshop: Languages for Data Mining and Machine Learning, pages 108–122, 2013.

[12] Edwin Chen. Infinite mixture models with nonparametric Bayes and the Dirichlet process. http://blog.echen.me/2012/03/20/infinite-mixture-models-with-nonparametric-bayes-and-the-dirichlet-process/, Mar 2012.

[13] Graham Massey. Roland TR-808: The drum machine that changed music. http://www.bbc.com/news/entertainment-arts-26682781, Mar 2014.

[14] Jean-Michel Jarre and Agencja Artystyczna MTJ. Jean Michel Jarre. Disques Dreyfus, 1999.

[15] Francois Pachet, Jean-Julien Aucouturier, and Mark Sandler. The way it sounds: Timbre models for analysis and retrieval of music signals. IEEE Transactions on Multimedia, 7(6):1028–1035, Dec 2005.

artists in cluster 301 they were different from the earliest artists in cluster 9005

One particular artist Bill Nelson stood out for having a particularly novel song

ldquoBirds of Tinrdquo for the year it was released (1980) This song features a sharp and

twangy synth beat that when sped up sounded like minimalist acid house music

While the α = 005 group differentiated mostly on general moods and classes of

instruments (like rock vs non-electronic vs electronic) the α = 01 group picked

up more nuanced instrumentation and mood differences For example cluster 1601

contained songs that featured orchestral string instruments especially violin The

songs themselves varied significantly according to traditional genres from Brian Eno

arrangements with classical orchestra to a remix of a song by Linkin Park a nu-metal

band which contained violin interludes This clustering raises an interesting point

that music that sounds very different based on traditional genres could be grouped

together on certain instruments or sounds Another cluster 2801 features 90s sounds

characteristic of the Roland TR-909 drum machine (which would explain why the

clusterrsquos songs increase dratistically starting in the 1990s and steadily decline through

the 2000s) Yet another cluster 601 contains a particularly heavy left tail indicating

a style more popular in the 1980s and the characteristic sound high-hat cymbals

is also a specialized instrument This specialization does not match up particularly

strongly with the clusters when α = 005 That is a single cluster with α = 005

does not easily map to one or more clusters in the α = 01 run although many of

the clusters appear to share characteristics based on the qualitative descriptions in

the tables The timbrechord change charts for each cluster appeared to at least

somewhat corroborate the general characteristics I attached to each cluster For

example the last timbre category is significantly pronounced for clusters 5 and 18

and especially so for 18 Cluster 18 was vocal-free ethereal space-synth sounds

so it would make sense that cluster 5 which was mainly calm New World also

contained vocal-free ethereal and space-y sounds It was also interesting to note

50

that certain clusters like 28 contained one timbre category that completely dominated

all the others Many songs in this cluster were marked by strong and repetitive

beats reminiscent of the Roland TR synth drum machine which matches the graph

Likewise clusters 3 7 9 and 20 which appear to contain the same peak timbre

category were noted for containing strong and repetitive beats For this cluster I

added the following artists and their contributions to the general list of novel artists

bull Bill Nelson Minimalist house music

bull Vangelis Orchestral compositions with electronic notes

bull Rick Wakeman Rock compositions with spacy-sounding synths

bull Kraftwerk synth-based pop music

Finally we look at α = 02 With this parameter value the Dirichlet Process resulted

in 22 clusters formed 3 of these clusters contained only one song each and upon

listening to each of these songs I determined they were not particularly unique and

discarded them for a total of 19 remaining clusters Unlike the previous two values of

α where the clusters were relatively easy to subjectively differentiate this one was

quite difficult Slightly more than half of the clusters 10 out of 19 contained under

1000 songs and 3 contained under 100 Some of the clusters were easily mapped

to clusters in the other two α values like cluster 1702 which contains Roland TR

drum machine sounds and is comparable to cluster 2801 However many of the other

classifications seemed more dubious Not only did the songs within each cluster seem

to often vary significantly but the differences between many clusters appeared nearly

indistinguishable The chord change and timbre charts also support the difficulty

in distinguishing different clusters The y-axes for all of the years are quite small

implying that many of the timbre values averaged out because the songs were quite

different in each cluster Essentially the observations are quite noisy and do not have

51

features that stand out as saliently as cluster 2801 for example The only exceptions

to these numbers were clusters 30 and 34 but there were so few songs in each of

these clusters that they represent only a small amount of the dataset Therefore I

concluded that the Dirichlet Process with α = 02 performed an insufficient job of

adequately clustering the songs Overall the clusters formed when α = 01 were

the most meaningful in terms of picking up nuanced moods and instruments without

splitting hairs and resulting in clusters a minimally trained ear could not differentiate

From this analysis the most appropriate genre classifications of the electronic music

from the MSD are the clusters described in the table where α = 01 and the most

novel artists along with their contributions are summarized in the finidings where

α = 005 and α = 01

52

Chapter 4

Conclusion

In this chapter I first address weakness in my experiment and strategies to address

those weaknesses then I offer potential paths for researchers to build upon my ex-

periment and offer closing words regarding this thesis

4.1 Design Flaws in Experiment

While I made every effort possible to ensure the integrity of this experiment, there were various limiting factors, some beyond my control and others within my control but unrealistic to address given the time and resources I had. The largest issue was the dataset I was working with. While the MSD contained roughly 23,000 electronic music songs according to my classifications, these songs did not come close to covering all of the electronic music that was available. From looking through the tracks, I did see many important artists, meaning that the dataset had some credibility. However, there were several other artists I was surprised to see missing, and the artists included were represented by only a limited number of popular songs. Some traditionally defined genres, like dubstep, were missing entirely from the dataset, and the most recent songs came from the year 2010, which meant that the past 5 years of rapid expansion in EM were not accounted for. Building a sufficient corpus of EM data is very difficult, arguably more so than for other genres, because songs may be remixed by multiple artists, further blurring the line between original content and modifications. For this reason, I considered my thesis to be a proof of concept: although the data I used may not be ideal, I was able to show that the Dirichlet Process could be used with some amount of success to cluster songs based on their metadata.

With respect to how I implemented the Dirichlet Process and constructed the features, my methodology could have been more extensive with additional time and resources. Interpreting the sounds in each song and establishing common threads is a difficult task, and unlike Pandora, which used trained music theory experts to analyze each song, I relied on my own ears and anecdotal knowledge of EM. Given the lack of formal literature quantitatively analyzing EM and the resources I had, this was my best realistic option, but it was also not ideal. The second notable weakness, which was more controllable, was determining what exactly constitutes an EM song. My criteria involved iterating through every song and selecting those whose artist contained a tag that fell inside a list of predetermined EM genres. However, this strategy is not always effective, since some artists have only a small selection of EM songs and have produced much more music involving rock or other non-EM genres. To prevent these songs from appearing in the dataset, I would need to load another dataset from a group called Last.fm, which contains user-generated tags at the song level.
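As an illustration of that fix, the MSD site distributes the Last.fm tags as, among other formats, a SQLite database. The sketch below assumes the database file lastfm_tags.db and the three-table layout (tids, tags, tid_tag) described in the MSD documentation, along with an abbreviated genre list; none of this is code used in this thesis.

import sqlite3

EM_TAGS = ('house', 'techno', 'trance', 'ambient', 'industrial')  # abbreviated

def em_track_ids(db_path='lastfm_tags.db'):
    """Return the set of MSD track IDs carrying a song-level EM tag."""
    conn = sqlite3.connect(db_path)
    placeholders = ','.join('?' for _ in EM_TAGS)
    query = ("SELECT DISTINCT tids.tid "
             "FROM tid_tag JOIN tids ON tid_tag.tid = tids.ROWID "
             "JOIN tags ON tid_tag.tag = tags.ROWID "
             "WHERE lower(tags.tag) IN (%s)" % placeholders)
    rows = conn.execute(query, [t.lower() for t in EM_TAGS]).fetchall()
    conn.close()
    return set(r[0] for r in rows)

Filtering the corpus on song-level tags this way, instead of on artist-level tags, would keep an artist's rock songs out of the EM dataset while retaining their EM songs.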

Another, more addressable, weakness in my experiment was graphically analyzing the timbre categories. While the average chord changes were easy to interpret on the graphs for each cluster and had easy semantic interpretations, the timbre categories were never formally defined. That is, while I knew the Bayesian Information Criterion was lowest when there were 46 categories, I did not associate each timbre category with a sound. Mauch's study addressed this issue by randomly selecting songs with sounds that fell in each timbre category and asking users to listen to the sounds and classify what they heard [8]. Implementing this system would be an additional way of ensuring that the clusters formed for each song were nontrivial: I could not only eyeball the measurements on each graph for timbre, like I did in this thesis, but also use them to confirm the sounds I observed for each cluster.
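A lightweight version of that listening test can be driven directly from the fitted timbre categories. The sketch below, an illustration rather than code from this project, assumes frames is a list of (track_id, timbre_vector) pairs such as those gathered in Appendix A.3, and uses the TIMBRE_CLUSTERS centroids from Appendix A.4 to pull the frames closest to a chosen category so that listeners could play them back and name the sound.

import numpy as np

def frames_nearest_category(frames, centroids, cat, k=5):
    """Return the k (track_id, frame) pairs whose 12-dimensional timbre
    vectors lie closest, in Euclidean distance, to category cat."""
    center = np.asarray(centroids[cat])
    dists = [np.linalg.norm(np.asarray(vec) - center) for _, vec in frames]
    nearest = np.argsort(dists)[:k]
    return [frames[i] for i in nearest]

# e.g. frames_nearest_category(frames, TIMBRE_CLUSTERS, cat=0)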

Finally, while my feature selection involved careful preprocessing, based on other studies, that normalized measurements between all songs, there are additional ways I could have improved the feature set. For example, one study looks at more advanced ways to isolate specific timbre segments in a song, identify repeating patterns, and compare songs to each other in terms of the similarity of their timbres [15]. More advanced methods like these would allow me to more quantitatively analyze how successful the Dirichlet Process is at effectively clustering songs into distinct categories.
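Even without the full timbre models of [15], a first step in that direction would be a pairwise song-similarity measure over the features already computed here. The sketch below uses cosine similarity between the per-song timbre-category frequency vectors from Appendix A.2; this is a simple stand-in for illustration, not the method of [15].

import numpy as np

def timbre_similarity(song_a, song_b):
    """Cosine similarity between two songs' timbre-category frequency
    vectors (1.0 means an identical distribution of timbres)."""
    a = np.asarray(song_a['timbre_cat_counts'], dtype=float)
    b = np.asarray(song_b['timbre_cat_counts'], dtype=float)
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(np.dot(a, b) / denom) if denom > 0 else 0.0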

4.2 Future Work

Future work in this area, quantitatively analyzing EM metadata to determine what constitutes different genres and novel artists, would involve tighter definitions, procedures, evaluations of whether clustering was effective, and closer musical scrutiny. All of the weaknesses mentioned in the previous section, barring perhaps the songs available in the Million Song Dataset, can be addressed with extensions and modifications to the code base I created. The greater issue of building an effective corpus of music data for the MSD and constantly updating it might be addressed by soliciting such data from an organization like Spotify, but such an endeavor is very ambitious and beyond the scope of any individual or small-group research project without extensive funding and influence. Once these problems are resolved, and the songs accessed from the dataset and the methods for comparing songs to each other are improved, the next steps would be to further analyze the results. How do the most unique artists for their time compare to the most popular artists? Is there considerable overlap? How long does it take for a style to grow in popularity, if it ever does? And lastly, how can these findings be used to compose new genres of music and envision who and what will become popular in the future? All of these questions may require supplementary information sources, with respect to the popularity of songs and artists for example, and many of these additional pieces of information can be found on the website of the MSD.

4.3 Closing Remarks

While this thesis is an ambitious endeavor and can be improved in many respects, it does show that the methods implemented yield nontrivial results and could serve as a foundation for future quantitative analysis of electronic music. As data analytics grows even more, and groups such as Spotify amass greater amounts of information and draw deeper insights from that information, this relatively new field of study will hopefully grow. EM is a dynamic, energizing, and incredibly expressive type of music, and understanding it from a quantitative perspective pays respect to what has, up until now, been mostly analyzed from a curious outsider's perspective: qualitatively described, but not examined as thoroughly from a mathematical angle.

Appendix A

Code

A.1 Pulling Data from the Million Song Dataset

from __future__ import division
import os
import sys
import re
import time
import glob
import numpy as np
import hdf5_getters  # not on adroit

# prevents output from showing ellipses when printed
np.set_printoptions(threshold=np.nan)

'''This code pulls the relevant metadata of every electronic song out of
the raw Million Song Dataset .h5 files.'''

basedir = '/scratch/network/mssilver/mssilver/msd_data_full/data/' + str(sys.argv[1])
ext = '.h5'

target_genres = ['house', 'techno', 'drum and bass', 'drum n bass', "drum'n'bass",
    'drumnbass', "drum 'n' bass", 'jungle', 'breakbeat', 'trance', 'dubstep',
    'trap', 'downtempo', 'industrial', 'synthpop', 'idm',
    'idm - intelligent dance music', '8-bit', 'ambient',
    'dance and electronica', 'electronic']

# relevant metadata for all EM songs found in the MSD
all_song_data = {}
pitch_segs_data = []
count = 0
start_time = time.time()

for root, dirs, files in os.walk(basedir):
    files = glob.glob(os.path.join(root, '*' + ext))
    for f in files:
        h5 = hdf5_getters.open_h5_file_read(f)
        # if year unknown, throw out sample
        if hdf5_getters.get_year(h5) == 0:
            h5.close()
            continue
        if any(tag in str(hdf5_getters.get_artist_mbtags(h5)) for tag in target_genres):
            print 'found electronic music song at {0} seconds'.format(time.time() - start_time)
            count += 1
            print('song count: {0}'.format(count))
            h5_subdict = dict()
            h5_subdict['title'] = hdf5_getters.get_title(h5).item()
            h5_subdict['artist_name'] = hdf5_getters.get_artist_name(h5).item()
            h5_subdict['year'] = hdf5_getters.get_year(h5).item()
            h5_subdict['duration'] = hdf5_getters.get_duration(h5).item()
            h5_subdict['timbre'] = hdf5_getters.get_segments_timbre(h5).tolist()
            h5_subdict['pitches'] = hdf5_getters.get_segments_pitches(h5).tolist()
            track_id = hdf5_getters.get_track_id(h5).item()
            all_song_data[track_id] = h5_subdict
            print('Song {0} finished processing. Total time elapsed: {1} seconds'.format(count, str(time.time() - start_time)))
        h5.close()

# sort songs chronologically (note: a plain dict does not preserve this
# sort order; collections.OrderedDict would)
all_song_data_sorted = dict(sorted(all_song_data.items(), key=lambda k: k[1]['year']))
sortedpitchdata = '/scratch/network/mssilver/mssilver/msd_data/raw_' + re.sub('/', '', sys.argv[1]) + '.txt'
with open(sortedpitchdata, 'w') as text_file:
    text_file.write(str(all_song_data_sorted))

A.2 Calculating Most Likely Chords and Timbre Categories

from __future__ import division
import os
import sys
import re
import time
import json
import glob
import hdf5_getters  # not on adroit
import sklearn.mixture
import msd_utils  # not on adroit
import math
import numpy as np
import collections
import ast

# prevents output from showing ellipses when printed
np.set_printoptions(threshold=np.nan)

# column-wise mean of a list of lists
def mean(a):
    return sum(a) / len(a)

'''This code computes the frequency of chord changes and the timbre
category counts for each electronic song.'''

basedir = '/scratch/network/mssilver/mssilver/'
input_file = basedir + 'msd_data/raw_' + str(sys.argv[1]) + '.txt'
output_file = basedir + 'msd_data/preprocessed_' + str(sys.argv[1]) + '.txt'

json_contents = open(input_file, 'r').read()

all_song_data = []
time_start = time.time()
count = 0
# each match is the string form of one song's metadata dict
for json_object_str in re.finditer(r"{'title'.*?}", json_contents):
    json_object_str = str(json_object_str.group(0))
    json_object = ast.literal_eval(json_object_str)
    json_object_new = {}

    json_object_new['title'] = json_object['title']
    json_object_new['artist_name'] = json_object['artist_name']
    json_object_new['year'] = json_object['year']
    json_object_new['duration'] = json_object['duration']

    segments_pitches_old = json_object['pitches']
    segments_timbre_old = json_object['timbre']
    segments_pitches_old_smoothed = []
    segments_timbre_old_smoothed = []
    chord_changes = [0 for i in range(0, 192)]
    smoothing_factor = 5
    for i in range(0, int(math.floor(len(segments_pitches_old) / smoothing_factor))):
        segments = segments_pitches_old[(smoothing_factor*i):(smoothing_factor*i + smoothing_factor)]
        # calculate mean frequency of each note over a block of 5 time segments
        segments_mean = map(mean, zip(*segments))
        segments_pitches_old_smoothed.append(segments_mean)
    most_likely_chords = [msd_utils.find_most_likely_chord(seg) for seg in segments_pitches_old_smoothed]
    print 'found most likely chords at {0} seconds'.format(time.time() - time_start)
    # calculate chord changes
    for i in range(0, len(most_likely_chords) - 1):
        c1 = most_likely_chords[i]
        c2 = most_likely_chords[i+1]
        if (c1[1] == c2[1]):
            note_shift = 0
        elif (c1[1] < c2[1]):
            note_shift = c2[1] - c1[1]
        else:
            note_shift = 12 - c1[1] + c2[1]
        key_shift = 4*(c1[0] - 1) + c2[0]
        # convert note_shift (0 through 11) and key_shift (1 to 16)
        # to one of 192 categories for a chord shift
        chord_shift = 12*(key_shift - 1) + note_shift
        chord_changes[chord_shift] += 1
    json_object_new['chord_changes'] = [c / json_object['duration'] for c in chord_changes]
    print 'calculated chord changes at {0} seconds'.format(time.time() - time_start)

    for i in range(0, int(math.floor(len(segments_timbre_old) / smoothing_factor))):
        segments = segments_timbre_old[(smoothing_factor*i):(smoothing_factor*i + smoothing_factor)]
        # calculate mean timbre value over a block of 5 time segments
        segments_mean = map(mean, zip(*segments))
        segments_timbre_old_smoothed.append(segments_mean)
    print 'found most likely timbre categories at {0} seconds'.format(time.time() - time_start)
    timbre_cats = [msd_utils.find_most_likely_timbre_category(seg) for seg in segments_timbre_old_smoothed]
    # one count per timbre category
    timbre_cat_counts = [timbre_cats.count(i) for i in xrange(0, len(msd_utils.TIMBRE_CLUSTERS))]
    json_object_new['timbre_cat_counts'] = [t / json_object['duration'] for t in timbre_cat_counts]
    all_song_data.append(json_object_new)
    count += 1

print 'preprocessing finished, writing results to file at time {0}'.format(time.time() - time_start)
with open(output_file, 'w') as f:
    f.write(str(all_song_data))

print 'file merging complete at time {0}'.format(time.time() - time_start)

A.3 Code to Compute Timbre Categories

from __future__ import division
import os
import sys
import re
import time
import json
import glob
import hdf5_getters  # not on adroit
import sklearn.mixture
import msd_utils  # not on adroit
import math
import numpy as np
import collections
from string import ascii_uppercase
import ast
import matplotlib.pyplot as plt
import operator
from collections import defaultdict
import random

timbre_all = []
N = 20  # number of samples to get from each year
# number of EM songs in the dataset for each year
year_counts = {1956: 2, 1965: 4, 1968: 3, 1969: 5, 1970: 23, 1971: 25,
    1972: 26, 1973: 37, 1974: 35, 1975: 29, 1976: 28, 1977: 64, 1978: 77,
    1979: 111, 1980: 131, 1981: 171, 1982: 199, 1983: 272, 1984: 190,
    1985: 189, 1986: 200, 1987: 224, 1988: 205, 1989: 272, 1990: 358,
    1991: 348, 1992: 538, 1993: 610, 1994: 658, 1995: 764, 1996: 809,
    1997: 930, 1998: 872, 1999: 983, 2000: 1031, 2001: 1230, 2002: 1323,
    2003: 1563, 2004: 1508, 2005: 1995, 2006: 1892, 2007: 2175,
    2008: 1950, 2009: 1782, 2010: 742}

time_start = time.time()
year_count = defaultdict(int)
orig_dir = '/scratch/network/mssilver/mssilver/'
orig_dir = ''  # overridden: read the raw files relative to the current directory
json_pattern = re.compile(r"{'title'.*?}", re.DOTALL)
N = 20  # number of songs to sample from each year
k = 20  # number of frames to select from each song
for l1 in ascii_uppercase:
    for l2 in ascii_uppercase:
        edm_textfile = orig_dir + 'msd_data/raw_' + l1 + l2 + '.txt'
        json_contents = open(edm_textfile, 'r').read()
        for json_object_str in re.findall(json_pattern, json_contents):
            json_object = ast.literal_eval(json_object_str)
            year = int(json_object['year'])
            # sample each song with probability N / (songs in its year), capped at 1
            prob = 1.0 if 1.0*N/year_counts[year] > 1.0 else 1.0*N/year_counts[year]
            if random.random() < prob:
                print 'getting timbre frames for song in directory {0}, {1} seconds after start of program'.format(edm_textfile, time.time() - time_start)
                duration = float(json_object['duration'])
                timbre = [[t / duration for t in l] for l in json_object['timbre']]
                try:
                    indices = random.sample(xrange(0, len(timbre)), k)
                except:
                    indices = xrange(0, len(timbre))
                timbre_frames = [timbre[i] for i in indices]
                appended_timbre = [timbre_all.append(l) for l in timbre_frames]
        print 'finished file {0}, {1} seconds after start of program'.format(edm_textfile, time.time() - time_start)

with open('timbre_frames_all.txt', 'w') as f:
    f.write(str(timbre_all))

A.4 Helper Methods for Calculations

import os
import re
import json
import glob
import hdf5_getters
import time
import numpy as np

''' some static data used in conjunction with the helper methods '''

# each 12-element vector corresponds to the 12 pitches, starting with C
# natural and going up to B natural; one row per transposition of the chord
CHORD_TEMPLATE_MAJOR = [[1,0,0,0,1,0,0,1,0,0,0,0], [0,1,0,0,0,1,0,0,1,0,0,0],
                        [0,0,1,0,0,0,1,0,0,1,0,0], [0,0,0,1,0,0,0,1,0,0,1,0],
                        [0,0,0,0,1,0,0,0,1,0,0,1], [1,0,0,0,0,1,0,0,0,1,0,0],
                        [0,1,0,0,0,0,1,0,0,0,1,0], [0,0,1,0,0,0,0,1,0,0,0,1],
                        [1,0,0,1,0,0,0,0,1,0,0,0], [0,1,0,0,1,0,0,0,0,1,0,0],
                        [0,0,1,0,0,1,0,0,0,0,1,0], [0,0,0,1,0,0,1,0,0,0,0,1]]
CHORD_TEMPLATE_MINOR = [[1,0,0,1,0,0,0,1,0,0,0,0], [0,1,0,0,1,0,0,0,1,0,0,0],
                        [0,0,1,0,0,1,0,0,0,1,0,0], [0,0,0,1,0,0,1,0,0,0,1,0],
                        [0,0,0,0,1,0,0,1,0,0,0,1], [1,0,0,0,0,1,0,0,1,0,0,0],
                        [0,1,0,0,0,0,1,0,0,1,0,0], [0,0,1,0,0,0,0,1,0,0,1,0],
                        [0,0,0,1,0,0,0,0,1,0,0,1], [1,0,0,0,1,0,0,0,0,1,0,0],
                        [0,1,0,0,0,1,0,0,0,0,1,0], [0,0,1,0,0,0,1,0,0,0,0,1]]
CHORD_TEMPLATE_DOM7 = [[1,0,0,0,1,0,0,1,0,0,1,0], [0,1,0,0,0,1,0,0,1,0,0,1],
                       [1,0,1,0,0,0,1,0,0,1,0,0], [0,1,0,1,0,0,0,1,0,0,1,0],
                       [0,0,1,0,1,0,0,0,1,0,0,1], [1,0,0,1,0,1,0,0,0,1,0,0],
                       [0,1,0,0,1,0,1,0,0,0,1,0], [0,0,1,0,0,1,0,1,0,0,0,1],
                       [1,0,0,1,0,0,1,0,1,0,0,0], [0,1,0,0,1,0,0,1,0,1,0,0],
                       [0,0,1,0,0,1,0,0,1,0,1,0], [0,0,0,1,0,0,1,0,0,1,0,1]]
CHORD_TEMPLATE_MIN7 = [[1,0,0,1,0,0,0,1,0,0,1,0], [0,1,0,0,1,0,0,0,1,0,0,1],
                       [1,0,1,0,0,1,0,0,0,1,0,0], [0,1,0,1,0,0,1,0,0,0,1,0],
                       [0,0,1,0,1,0,0,1,0,0,0,1], [1,0,0,1,0,1,0,0,1,0,0,0],
                       [0,1,0,0,1,0,1,0,0,1,0,0], [0,0,1,0,0,1,0,1,0,0,1,0],
                       [0,0,0,1,0,0,1,0,1,0,0,1], [1,0,0,0,1,0,0,1,0,1,0,0],
                       [0,1,0,0,0,1,0,0,1,0,1,0], [0,0,1,0,0,0,1,0,0,1,0,1]]

CHORD_TEMPLATE_MAJOR_means = [np.mean(chord) for chord in CHORD_TEMPLATE_MAJOR]
CHORD_TEMPLATE_MINOR_means = [np.mean(chord) for chord in CHORD_TEMPLATE_MINOR]
CHORD_TEMPLATE_DOM7_means = [np.mean(chord) for chord in CHORD_TEMPLATE_DOM7]
CHORD_TEMPLATE_MIN7_means = [np.mean(chord) for chord in CHORD_TEMPLATE_MIN7]

CHORD_TEMPLATE_MAJOR_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_MAJOR]
CHORD_TEMPLATE_MINOR_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_MINOR]
CHORD_TEMPLATE_DOM7_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_DOM7]
CHORD_TEMPLATE_MIN7_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_MIN7]

# centroids of the 46 timbre categories (12-dimensional timbre vectors)
TIMBRE_CLUSTERS = [
    [1.38679881e-01, 3.95702571e-02, 2.65410235e-02, 7.38301998e-03, -1.75014636e-02, -5.51147732e-02,
     8.71851698e-03, -1.17595855e-02, 1.07227900e-02, 8.75951680e-03, 5.40391877e-03, 6.17638908e-03],
    [3.14344510e+00, 1.17405599e-01, 4.08053561e+00, -1.77934450e+00, 2.93367968e+00, -1.35597928e+00,
     -1.55129489e+00, 7.75743158e-01, 6.42796685e-01, 1.40794256e-01, 3.37716831e-01, -3.27103815e-01],
    [3.56548165e-01, 2.73288705e+00, 1.94355982e+00, 1.06892477e+00, 9.89739475e-01, -8.97330631e-02,
     8.73234495e-01, -2.00747009e-03, 3.44488367e-01, 9.93117800e-02, -2.43471766e-01, -1.90521726e-01],
    [4.22442037e-01, 4.14115783e-01, 1.43926557e-01, -1.16143322e-01, -5.95186216e-02, -2.36927188e-01,
     -6.83151409e-02, 9.86816882e-02, 2.43219098e-02, 6.93558977e-02, 6.80121418e-03, 3.97485360e-02],
    [1.94727799e-01, -1.39027782e+00, -2.39875671e-01, -2.84583677e-01, 1.92334219e-01, -2.83421048e-01,
     2.15787541e-01, 1.14840341e-01, -2.15631833e-01, -4.09496877e-02, -6.90838017e-03, -7.24394810e-03],
    [1.96565167e-01, 4.98702717e-02, -3.43697282e-01, 2.54170701e-01, 1.12441266e-02, 1.54740401e-01,
     -4.70447408e-02, 8.10868802e-02, 3.03736697e-03, 1.43974944e-03, -2.75044913e-02, 1.48634678e-02],
    [2.21364497e-01, -2.96205105e-01, 1.57754028e-01, -5.57641279e-02, -9.25625566e-02, -6.15316168e-02,
     -1.38139882e-01, -5.54936599e-02, 1.66886836e-01, 6.46238260e-02, 1.24093863e-02, -2.09274345e-02],
    [2.12823455e-01, -9.32652720e-02, -4.39611467e-01, -2.02814479e-01, 4.98638770e-02, -1.26572488e-01,
     -1.11181799e-01, 3.25075635e-02, 2.01416694e-02, -5.69216463e-02, 2.61922912e-02, 8.30817468e-02],
    [1.62304042e-01, -7.34813956e-03, -2.02552550e-01, 1.80106705e-01, -5.72110826e-02, -9.17148244e-02,
     -6.20429191e-03, -6.08892354e-02, 1.02883628e-02, 3.84878478e-02, -8.72920419e-03, 2.37291230e-02],
    [1.69023095e-01, 6.81311168e-02, -3.71039856e-02, -2.13139780e-02, -4.18752028e-03, 1.36407740e-01,
     2.58515825e-02, -4.10328777e-04, 2.93149920e-02, -1.97874734e-02, 2.01177066e-02, 4.29260690e-03],
    [4.16829358e-01, -1.28384095e+00, 8.86081556e-01, 9.13717416e-02, -3.19420208e-01, -1.82003637e-01,
     -3.19865507e-02, -1.71517045e-02, 3.47472066e-02, -3.53047665e-02, 5.58354602e-02, -5.06222122e-02],
    [3.83948137e-01, 1.06020034e-01, 4.01191058e-01, 1.49470482e-01, -9.58422411e-02, -4.94473336e-02,
     2.27589858e-02, -5.67352733e-02, 3.84666644e-02, -2.15828055e-02, -1.67817151e-02, 1.15426241e-01],
    [9.07946444e-01, 3.26120397e+00, 2.98472002e+00, -1.42615404e-01, 1.29886103e+00, -4.53380431e-01,
     1.54008478e-01, -3.55297093e-02, -2.95809181e-01, 1.57037690e-01, -7.29692046e-02, 1.15180285e-01],
    [1.60870896e+00, -2.32038235e+00, -7.96211044e-01, 1.55058968e+00, -2.19377663e+00, 5.01030526e-01,
     -1.71767279e+00, -1.36642470e+00, -2.42837527e-01, -4.14275615e-01, -7.33148530e-01, -4.56676578e-01],
    [6.42870687e-01, 1.34486839e+00, 2.16026845e-01, -2.13180345e-01, 3.10866747e-01, -3.97754955e-01,
     -3.54439151e-01, -5.95938041e-04, 4.95054274e-03, 4.67013422e-02, -1.80823854e-02, 1.25808320e-01],
    [1.16780496e+00, 2.28141229e+00, -3.29418720e+00, -1.54239912e+00, 2.12372153e-01, 2.51116768e+00,
     1.84273560e+00, -4.06183916e-01, 1.19175125e+00, -9.24407446e-01, 6.85444429e-01, -6.38729005e-01],
    [2.39097414e-01, -1.13382447e-02, 3.06327342e-01, 4.68182987e-03, -1.03107607e-01, -3.17661969e-02,
     3.46533705e-02, 1.46440386e-02, 6.88291154e-02, 1.72580481e-02, -6.23970238e-03, -6.52822380e-03],
    [1.74850329e-01, -1.86077411e-01, 2.69285838e-01, 5.22452803e-02, -3.71708289e-02, -6.42874319e-02,
     -5.01920042e-03, -1.14565540e-02, -2.61300268e-03, -6.94872458e-03, 1.20157063e-02, 2.01341977e-02],
    [1.93220674e-01, 1.62738332e-01, 1.72794061e-02, 7.89933755e-02, 1.58494767e-01, 9.04541006e-04,
     -3.33177052e-02, -1.42411500e-01, -1.90471155e-02, -2.41622739e-02, -2.57382438e-02, 2.84895062e-02],
    [3.31179197e+00, -1.56765268e-01, 4.42446188e+00, 2.05496297e+00, 5.07031622e+00, -3.52663849e-02,
     -5.68337901e+00, -1.17825301e+00, 5.41756637e-01, -3.15541339e-02, -1.58404846e+00, 7.37887234e-01],
    [2.36033237e-01, -5.01380019e-01, -7.01568834e-02, -2.14474169e-01, 5.58739133e-01, -3.45340886e-01,
     2.36469930e-01, -2.51770230e-02, -4.41670143e-01, -1.73364633e-01, 9.92353986e-03, 1.01775476e-01],
    [3.13672832e+00, 1.55128891e+00, 4.60139512e+00, 9.82477544e-01, -3.87108002e-01, -1.34239667e+00,
     -3.00065797e+00, -4.41556909e-01, -7.77546208e-01, -6.59017029e-01, -1.42596356e-01, -9.78935498e-01],
    [8.50714148e-01, 2.28658856e-01, -3.65260753e+00, 2.70626948e+00, -1.90441544e-01, 5.66625676e+00,
     1.77531510e+00, 2.39978921e+00, 1.10965660e+00, 1.58484130e+00, -1.51579214e-02, 8.64324026e-01],
    [1.14302559e+00, 1.18602811e+00, -3.88130412e+00, 8.69833825e-01, -8.23003310e-01, -4.23867795e-01,
     8.56022598e-01, -1.08015106e+00, 1.74840192e-01, -1.35493558e-02, -1.17012561e+00, 1.68572940e-01],
    [3.54117814e+00, 6.12714769e-01, 7.67585243e+00, 2.50391333e+00, 1.81374399e+00, -1.46363231e+00,
     -1.74027236e+00, -5.72924078e-01, -1.20787368e+00, -4.13954661e-01, -4.62561948e-01, 6.78297871e-01],
    [8.31843044e-01, 4.41635485e-01, 7.00724425e-02, -4.72159900e-02, 3.08326493e-01, -4.47009822e-01,
     3.27806057e-01, 6.52370380e-01, 3.28490360e-01, 1.28628172e-01, -7.78065861e-02, 6.91343399e-02],
    [4.90082031e-01, -9.53180204e-01, 1.76970476e-01, 1.57256960e-01, -5.26196238e-02, -3.19264458e-01,
     3.91808304e-01, 2.19368239e-01, -2.06483291e-01, -6.25044005e-02, -1.05547224e-01, 3.18934196e-01],
    [1.49899454e+00, -4.30708817e-01, 2.43770498e+00, 7.03149621e-01, -2.28827845e+00, 2.70195855e+00,
     -4.71484280e+00, -1.18700075e+00, -1.77431396e+00, -2.23190236e+00, 8.20855264e-01, -2.35859902e-01],
    [1.20322544e-01, -3.66300816e-01, -1.25699953e-01, -1.21914056e-01, 6.93277338e-02, -1.31034684e-01,
     -1.54955924e-03, 2.48094288e-02, -3.09576314e-02, -1.66369415e-03, 1.48904987e-04, -1.42151992e-02],
    [6.52394765e-01, -6.81024464e-01, 6.36868117e-01, 3.04950208e-01, 2.62178992e-01, -3.20457080e-01,
     -1.98576098e-01, -3.02173163e-01, 2.04399765e-01, 4.44513847e-02, -9.50111498e-03, -1.14198739e-02],
    [2.06762180e-01, -2.08101829e-01, 2.61977630e-01, -1.71672300e-01, 5.61794250e-02, 2.13660185e-01,
     3.90259585e-02, 4.78176392e-02, 1.72812607e-02, 3.44052067e-02, 6.26899067e-03, 2.48544728e-02],
    [7.39717363e-01, 4.37786285e+00, 2.54995502e+00, 1.13151212e+00, -3.58509503e-01, 2.20806129e-01,
     -2.20500355e-01, -7.22409824e-02, -2.70534083e-01, 1.07942098e-03, 2.70174668e-01, 1.87279353e-01],
    [1.25593809e+00, 6.71054880e-02, 8.70352571e-01, -4.32607959e+00, 2.30652217e+00, 5.47476105e+00,
     -6.11052479e-01, 1.07955720e+00, -2.16225471e+00, -7.95770149e-01, -7.31804973e-01, 9.68935954e-01],
    [1.17233757e-01, -1.23897829e-01, -4.88625265e-01, 1.42036530e-01, -7.23286756e-02, -6.99808763e-02,
     -1.17525019e-02, 5.70221674e-02, -7.67796123e-03, 4.17505873e-02, -2.33375716e-02, 1.94121001e-02],
    [1.67511025e+00, -2.75436700e+00, 1.45345593e+00, 1.32408871e+00, -1.66172505e+00, 1.00560074e+00,
     -8.82308160e-01, -5.95708043e-01, -7.27283590e-01, -1.03975499e+00, -1.86653334e-02, 1.39449745e+00],
    [3.20587677e+00, -2.84451104e+00, 8.54849957e+00, -4.44001235e-01, 1.04202144e+00, 7.35333682e-01,
     -2.48763292e+00, 7.38931361e-01, -1.74185596e+00, -1.07581842e+00, 2.05759299e-01, -8.20483513e-01],
    [3.31279737e+00, -5.08655734e-01, 6.61530870e+00, 1.16518280e+00, 4.74499155e+00, -2.31536191e+00,
     -1.34016130e+00, -7.15381712e-01, 2.78890594e+00, 2.04189275e+00, -3.80003033e-01, 1.16034914e+00],
    [1.79522019e+00, -8.13534697e-02, 4.37167420e-01, 2.26517020e+00, 8.85377295e-01, 1.07481514e+00,
     -7.25322296e-01, -2.19309506e+00, -7.59468916e-01, -1.37191387e+00, 2.60097913e-01, 9.34596450e-01],
    [3.50400906e-01, 8.17891485e-01, -8.63487084e-01, -7.31760701e-01, 9.70320805e-02, -3.60023996e-01,
     -2.91753495e-01, -8.03073817e-02, 6.65930095e-02, 1.60093340e-01, -1.29158086e-01, -5.18806100e-02],
    [2.25922929e-01, 2.78461593e-01, 5.39661393e-02, -2.37662670e-02, -2.70343295e-02, -1.23485570e-01,
     2.31027499e-03, 5.87465112e-05, 1.86127188e-02, 2.83074747e-02, -1.87198676e-04, 1.24761782e-02],
    [4.53615634e-01, 3.18976020e+00, -8.35029351e-01, 7.84124578e+00, -4.43906795e-01, -1.78945492e+00,
     -1.14521031e+00, 1.00044304e+00, -4.04084981e-01, -4.86030348e-01, 1.05412721e-01, 5.63666445e-02],
    [3.93714086e-01, -3.07226477e-01, -4.87366619e-01, -4.57481697e-01, -2.91133171e-04, -2.39881719e-01,
     -2.15591352e-01, -1.21332941e-01, 1.42245002e-01, 5.02984582e-02, -8.05878851e-03, 1.95534173e-01],
    [1.86913010e-01, -1.61000977e-01, 5.95612425e-01, 1.87804293e-01, 2.22064227e-01, -1.09008289e-01,
     7.83845058e-02, 5.15228647e-02, -8.18113578e-02, -2.37860551e-02, 3.41013800e-03, 3.64680417e-02],
    [3.32919314e+00, -2.14341251e+00, 7.20913997e+00, 1.76143734e+00, 1.64091808e+00, -2.66887649e+00,
     -9.26748006e-01, -2.78599285e-01, -7.39434005e-01, -3.87363085e-01, 8.00557250e-01, 1.15628886e+00],
    [4.76496444e-01, -1.19334793e-01, 3.09037235e-01, -3.45545294e-01, 1.30114716e-01, 5.06895559e-01,
     2.12176840e-01, -4.14296750e-03, 4.52439064e-02, -1.62163990e-02, 6.93683152e-02, -5.77607592e-03],
    [3.00019324e-01, 5.43432074e-02, -7.72732930e-01, 1.47263806e+00, -2.79012581e-02, -2.47864869e-01,
     -2.10011388e-01, 2.78202425e-01, 6.16957205e-02, -1.66924986e-01, -1.80102286e-01, -3.78872162e-03]]

TIMBRE_MEANS = [np.mean(t) for t in TIMBRE_CLUSTERS]
TIMBRE_STDEVS = [np.std(t) for t in TIMBRE_CLUSTERS]

'''helper methods to process raw msd data'''

def normalize_pitches(h5):
    key = int(hdf5_getters.get_key(h5))
    segments_pitches = hdf5_getters.get_segments_pitches(h5)
    segments_pitches_new = [transpose_by_key(pitch_seg, key) for pitch_seg in segments_pitches]
    return segments_pitches_new

def transpose_by_key(pitch_seg, key):
    pitch_seg_new = []
    for i in range(0, 12):
        idx = (i + key) % 12
        pitch_seg_new.append(pitch_seg[idx])
    return pitch_seg_new

''' given a time segment with distributions of the 12 pitches, find the most
likely chord played '''
def find_most_likely_chord(pitch_vector):
    rho_max = 0.0
    # chords are indexed by (type, root): type 1 = major, 2 = minor,
    # 3 = dominant 7th, 4 = minor 7th
    most_likely_chord = (1, 1)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_MAJOR, CHORD_TEMPLATE_MAJOR_means, CHORD_TEMPLATE_MAJOR_stdevs)):
        # correlation-style score between the chord template and the pitch vector
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean)*(pitch_vector[i] - np.mean(pitch_vector))/((stdev + 0.01)*(np.std(pitch_vector) + 0.01))
        if (abs(rho) > abs(rho_max)):
            rho_max = rho
            most_likely_chord = (1, idx)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_MINOR, CHORD_TEMPLATE_MINOR_means, CHORD_TEMPLATE_MINOR_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean)*(pitch_vector[i] - np.mean(pitch_vector))/((stdev + 0.01)*(np.std(pitch_vector) + 0.01))
        if (abs(rho) > abs(rho_max)):
            rho_max = rho
            most_likely_chord = (2, idx)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_DOM7, CHORD_TEMPLATE_DOM7_means, CHORD_TEMPLATE_DOM7_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean)*(pitch_vector[i] - np.mean(pitch_vector))/((stdev + 0.01)*(np.std(pitch_vector) + 0.01))
        if (abs(rho) > abs(rho_max)):
            rho_max = rho
            most_likely_chord = (3, idx)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_MIN7, CHORD_TEMPLATE_MIN7_means, CHORD_TEMPLATE_MIN7_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean)*(pitch_vector[i] - np.mean(pitch_vector))/((stdev + 0.01)*(np.std(pitch_vector) + 0.01))
        if (abs(rho) > abs(rho_max)):
            rho_max = rho
            most_likely_chord = (4, idx)
    return most_likely_chord

def find_most_likely_timbre_category(timbre_vector):
    most_likely_timbre_cat = 0
    rho_max = 0.0
    for idx, (seg, mean, stdev) in enumerate(zip(TIMBRE_CLUSTERS, TIMBRE_MEANS, TIMBRE_STDEVS)):
        rho = 0.0
        for i in range(0, 12):
            rho += (seg[i] - mean)*(timbre_vector[i] - np.mean(seg))/((stdev + 0.01)*(np.std(timbre_vector) + 0.01))
        if (abs(rho) > abs(rho_max)):
            rho_max = rho
            most_likely_timbre_cat = idx
    return most_likely_timbre_cat

Bibliography

[1] Marck Bailey. Mark J. Butler publishes scholarly work on dance music. http://www.music.northwestern.edu/about/news/2012/mark-j-butler-publishes-scholarly-work-on-dance-music.html, Mar 2012.

[2] Kenneth Taylor. Ishkur's guide to EDM. http://techno.org/electronic-music-guide/.

[3] Deal further strengthens Spotify's music discovery expertise. http://the.echonest.com/pressreleases/spotify-acquires-echo-nest/, Mar 2014.

[4] Josh Constine. Inside the Spotify - Echo Nest skunkworks. http://techcrunch.com/2014/10/19/the-sonic-mad-scientists/, Oct 2014.

[5] The future of music genres is here. http://blog.echonest.com/post/73516217273/the-future-of-music-genres-is-here, Jan 2014.

[6] About the Music Genome Project. http://www.pandora.com/about/mgp.

[7] Joan Serrà, Álvaro Corral, Marián Boguñá, Martín Haro, and Josep Ll. Arcos. Measuring the evolution of contemporary western popular music. Scientific Reports, 2, Jul 2012.

[8] Matthias Mauch, Robert M. MacCallum, Mark Levy, and Armand M. Leroi. The evolution of popular music: USA 1960–2010. Royal Society Open Science, 2(5), 2015.

[9] Thierry Bertin-Mahieux, Daniel P. W. Ellis, Brian Whitman, and Paul Lamere. The Million Song Dataset. In Proceedings of the 12th International Conference on Music Information Retrieval (ISMIR 2011), 2011.

[10] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay. Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12:2825–2830, 2011.

[11] Lars Buitinck, Gilles Louppe, Mathieu Blondel, Fabian Pedregosa, Andreas Mueller, Olivier Grisel, Vlad Niculae, Peter Prettenhofer, Alexandre Gramfort, Jaques Grobler, Robert Layton, Jake VanderPlas, Arnaud Joly, Brian Holt, and Gaël Varoquaux. API design for machine learning software: experiences from the scikit-learn project. In ECML PKDD Workshop: Languages for Data Mining and Machine Learning, pages 108–122, 2013.

[12] Edwin Chen. Infinite mixture models with nonparametric Bayes and the Dirichlet process. http://blog.echen.me/2012/03/20/infinite-mixture-models-with-nonparametric-bayes-and-the-dirichlet-process/, Mar 2012.

[13] Graham Massey. Roland TR-808: the drum machine that changed music. http://www.bbc.com/news/entertainment-arts-26682781, Mar 2014.

[14] Jean-Michel Jarre and Agencja Artystyczna MTJ. Jean Michel Jarre. Disques Dreyfus, 1999.

[15] François Pachet, Jean-Julien Aucouturier, and Mark Sandler. The way it sounds: timbre models for analysis and retrieval of music signals. IEEE Transactions on Multimedia, 7(6):1028–1035, Dec 2005.

Page 37: Silver,Matthew final thesis

After this modification I tweaked a few more parameters before obtaining my

final results Dividing the pitch and timbre frequencies by the duration of the song

normalized every song to frequency per second but it also had the undesired effect

of making the data too small Timbre and pitch frequencies per second were almost

always less than 10 and many times hovered as low as 0002 for nonzero values

Because all of the values were very close to each other using common values of α

in the range of 01 to 1000-2000 was insufficient to push the songs into different

clusters As a result every song fell into the same cluster Increasing the value

of α by several orders of magnitude to well over 10 million fixed the problem but

this solution presented two problems First tuning α to experiment with different

ways to cluster the music would be problematic since I would have to work with

an enormous range of possible values for α Second pushing α to such high values

is not appropriate for the Dirichlet Process Extremely high values of α indicate a

Dirichlet Process that will try to disperse the data into different clusters but a value

of α that high is in principle always assigning each new song to a new cluster On

the other hand varying α between 01 and 1000 for example presents a much wider

range of flexibility when assigning clusters While this may be possible by varying

the values of α an extreme amount with the data as it currently is we are using

the Dirichlet Process in a way it should mathematically not be used Therefore

multiplying all of the data by a constant value so that we can work in the appropriate

range of α is the ideal approach After some experimentation I found that k=10 was

an appropriate scaling factor After initial runs of the Dirichlet Process I found out

that there was a slight issue with some of the earlier songs Since I had only artist

genre tags not specific song tags for each song I chose songs based on whether any

of the tags associated with the artist fell under any electronic music genre including

the generic term rsquoelectronicrsquo There were some bands mostly older ones from the

1960s and 1970s like Electric Light Orchestra which had some electronic music but

28

mostly featured rock funk disco or another genre Given that these artists featured

mostly non-electronic songs I decided to exclude them from my study and generate

a blacklist indicating these music artists While it was infeasible to look through

every single song and determine whether it was electronic or not I was able to look

over the earliest songs in each cluster These songs were the most important to verify

as electronic because early non-electronic songs could end up forming new clusters

and inadvertently create clusters with non-electronic sounds that I was not looking for

The goal of this thesis is to identify different groups in which EM songs are

clustered and identify the most unique artists and genres While the second task is

very simple because it requires looking at the earliest songs in each cluster the first

is difficult to gauge the effectiveness of While I can look at the average chord change

and timbre category frequencies in each category as well as other metadata putting

more semantic interpretations to what the music actually sounds like and determining

whether the music is clustered properly is a very subjective process For this reason

I ran the Dirichlet Process on the feature set with values of α = (005 01 02) and

compared the clustering in each category examining similarities and differences in

the clusters formed in each scenario in the Discussion section For each value α I

set the upper limit of components or clusters allowed to 50 The ranges of α I used

resulted in 9 14 and 19 clusters formed

32 Findings

321 α=005

When I set α to 005 the Dirichlet Process split the songs into 9 clusters Below are

the distribution of years of the songs in each cluster (note that the Dirichlet Process

29

does not number the clusters exactly sequentially so cluster numbers 5 7 and 10 are

skipped)

30

Figure 31 Song year distributions for α = 005

For each value of α I also calculated the average frequency of each chord change

category and timbre category for each cluster and plotted the results The green

lines correspond to timbre and the blue lines to pitch

31

Figure 32 Timbre and pitch distributions for α = 005

A table of each cluster formed the number of songs in that cluster and descriptions

of pitch timbre and rhythmic qualities characteristic of songs in that cluster are

shown below

32

Cluster Song Count Characteristic Sounds

0 6481 Minimalist industrial space sounds dissonant chords

1 5482 Soft New Age ethereal

2 2405 Defined sounds electronic and non-electronic instru-

ments played in standard rock rhythms

3 360 Very dense and complex synths slightly darker tone

4 4550 Heavily distorted rock and synthesizer

6 2854 Faster paced 80s synth rock acid house

8 798 Aggressive beats dense house music

9 1464 Ambient house trancelike strong beats mysterious

tone

11 1597 Melancholy tones New wave rock in 80s then starting

in 90s downtempo trip-hop nu-metal

Table 31 Song cluster descriptions for α = 005

322 α=01

A total of 14 clusters were formed (16 were formed but 2 clusters contained only

one song each I listened to both of these songs and they did not sound unique so

I discarded them from the clusters) Again the song distributions timbre and pitch

distributions and cluster descriptions are shown below

33

34

Figure 33 Song year distributions for α = 01

35

36

Figure 34 Timbre and pitch distributions for α = 01

37

Cluster Song Count Characteristic Sounds

0 1339 Instrumental and disco with 80s synth

1 2109 Simultaneous quarter-note and sixteenth note rhythms

2 4048 Upbeat chill simultaneous quarter-note and eighth

note rhythms

3 1353 Strong repetitive beats ambient

4 2446 Strong simultaneous beat and synths synths defined but

echo

5 2672 Calm New Age

6 542 Hi-hat cymbals dissonant chord progressions

7 2725 Aggressive punk and alternative rock

9 1647 Latin rhythmic emphasis on first and third beats

11 835 Standard medium-fast rock instrumentschords

16 1152 Orchestral especially violins

18 40 ldquoMartian alienrdquo sounds no vocals

20 1590 Alternating strong kick and strong high-pitched clap

28 528 Roland TR-like beats kick and clap stand out but fuzzy

Table 32 Song cluster descriptions for α = 01

323 α=02

With α set to 02 there were a total of 22 clusters formed 3 of the clusters consisted

of 1 song each none of which were particularly unique-sounding so I discarded them

for a total of 19 significant clusters Again the song distributions timbre and pitch

distributions and cluster descriptions are shown below

38

39

40

Figure 35 Song year distributions for α = 02

41

42

43

Figure 36 Timbre and pitch distributions for α = 02

44

Cluster Song Count Characteristic Sounds

0 4075 Nostalgic and sad-sounding synths and string instru-

ments

1 2068 Intense sad cavernous (mix of industrial metal and am-

bient)

2 1546 Jazzfunk tones

3 1691 Orchestral with heavy 80s synths atmospheric

4 343 Arpeggios

5 304 Electro ambient

6 2405 Alien synths eery

7 1264 Punchy kicks and claps 80s90s tilt

8 1561 Medium tempo 44 time signature synths with intense

guitar

9 1796 Disco rhythms and instruments

10 2158 Standard rock with few (if any) synths added on

12 791 Cavernous minimalist ambient (non-electronic instru-

ments)

14 765 Downtempo classic guitar riffs fewer synths

16 865 Classic acid house sounds and beats

17 682 Heavy Roland TR sounds

22 14 Fast ambient classic orchestral

23 578 Acid house with funk tones

30 31 Very repetitive rhythms one or two tones

34 88 Very dense sound (strong vocals and synths)

Table 33 Song cluster descriptions for α = 02

45

33 Analysis

Each of the different values of α revealed different insights about the Dirichlet Process

applied to the Million Song Dataset mainly which artists and songs were unique

and how traditional groupings of EM genres could be thought of differently Not

surprisingly the distributions of the years of songs in most of the clusters were skewed

to the left because the distribution of all of the EM songs in the Million Song Dataset

is left-skewed (see Figure 22) However some of the distributions vary significantly

for individual clusters and these differences provide important insights into which

types of music styles were popular at certain points in time and how unique the

earliest artists and songs in the clusters were For example for α = 01 Cluster 28rsquos

musical style (with sounds characteristic of the Roland TR-808 and TR-909 a two

programmable drum machines and synthesizers that became extremely popular in

1980s and 90s dance tracks) [13] coincides with the when the instruments were first

manufactured in 1980 Not surprisingly this cluster contained mostly songs from

the 80s and 90s and declined slightly in the 2000s However there were a few songs

in that cluster that came out before 1980 While these songs did not clearly use the

Roland TR machines they may have contained similar sounds that predated the

machines and were truly novel

First looking at α = 005 we see that all of the clusters contain a significant

number of songs although cluster 3 and 8 are notably smaller Cluster 3 contains

a heavier left tail indicating a larger number of songs from the 70s 80s and 90s

Inside the cluster the genres of music varied significantly from a traditional music

lens That is the cluster contained some songs with nearly all traditional rock

instruments others with purely synths and others somewhere in between all which

would normally be classified as different EM genres However under the Dirichlet

Process these songs were lumped together with the common themes of dense melodic

46

melodies (as opposed to minimalistic repetitive or dissonant sounds) The most

prominent artists from the earlier songs are Ashra and John Hassell who composed

several melodic songs combining traditional instruments with synthesizers for a

modern feel The other small cluster number 8 contains a more normal year

distribution relative to the entire MSD distribution and also consists of denser beats

Another artist Cabaret Voltaire leads this cluster Cluster 9 sticks out significantly

because it contains virtually no songs before 1990 but increases rapidly in popularity

This cluster contains songs with hypnotically repetitive rhythm strong and ethereal

synths and an equally strong drum-like beat Given the emergence of trance in

the 1990s and the fact that house music in the 1980s contained more minimalistic

synths than house music in the 1990s this distribution of years makes sense Looking

at the earliest artists in this cluster one that accurately predates the later music

in the cluster is Jean-Michel Jarre A French composer pioneering in ambient and

electronic music [14] one of his songs Les Chants Magneacutetiques IV contains very

sharp and modulated synths along with a repetitive hi-hat cymbal rhythm and more

drawn-out and ethereal synths While the song sounds ambient at its normal speed

playing the song 15 times the normal speed resulted in a thumping fast-paced 16th

note rhythm that combined with the ethereal synths that contain certain chord

progressions sounded very similar to trance music In fact I found that stylistically

trance music was comparable to house and ambient music increased in speed Trance

music was a term not used extensively until the early 1990s but ambient and house

music were already mainstream by the 1980s so it would make sense that trance

evolved in this manner However this insight could serve as an argument that trance

is not an innovative genre in and of itself but is rather a clever combination of two

older genres Lastly we look at the timbre category and chord change distributions

for each cluster In theory these clusters should have significantly different peaks

of chord changes and timbre categories reflecting different pitch arrangements and

47

instruments in each cluster The type 0 chord change corresponds to major rarr

major with no note change type 60 minor rarr minor with no note change type

120 dominant 7th major rarr dominant 7th major with no note change and type 180

dominant 7th minor rarr dominant 7th minor with no note change It makes sense that

type 0 60 120 and 180 chord changes are frequently observed because it implies

that chords in the song occurring next to each other are remaining in the same key

for the majority of the song The timbre categories on the other hand are more

difficult to intuitively interpret Mauchrsquos study addresses this issue by sampling

songs and sounds that are the closest to each timbre category and then playing the

sounds and attaching user-based interpretations based on several listeners [8] While

this strategy worked in Mauchrsquos study given the time and resources at my disposal

this strategy was not practical in my study I ended up comparing my subjective

summaries of each cluster and comparing the charts to see whether certain peaks

in the timbre categories corresponded to specific tones Strangely for α = 005 the

timbre and chord change data is very similar for each cluster This problem does not

occur for when α = 01 or 02 where the graphs vary significantly and correspond

to some of the observed differences in the music In summary below are the most

influential artists I found in the clusters formed and the types of music they created

that were novel for their time

bull Jean-Michel Jarre ambient and house music complicated synthesizer arrange-

ments

bull Cabaret Voltaire orchestral electronic music

bull Paul Horn new age

bull Brian Eno ambient music

bull Manuel Goumlttsching (Ashra) synth-heavy ambient music

48

bull Killing Joke industrial metal

bull John Foxx minimalist and dark electronic music

bull Fad Gadget house and industrial music

While these conclusions were formed mainly from the data used in the MSD and the

resulting clusters I checked outside sources and biographies of these artists to see

whether they were groundbreaking contributors to electronic music Some research

revealed that these artists were indeed groundbreaking for their time so my findings

are consistent with existing literature The difference however between existing

accounts and mine is that from a quantitatively computed perspective I found some

new connections (like observing that one of Jarrersquos works when sped up sounded

very similar to trance music)

For larger values of α it is not only worth looking at interesting phenomena in

the clusters formed for that specific value but also comparing the clusters formed

to other values of α Since we are increasing the value of α more clusters will

be formed and the distinctions between each cluster will be more nuanced With

α = 01 the Dirichlet Process formed 16 clusters 2 of these clusters consisted of

only one song each and upon listening neither of these songs sounded particularly

unique so I threw those two clusters out and analyzed the remaining 14 Comparing

these clusters to the ones formed with α = 005 I found that some of the clusters

mapped over nicely while others were more difficult to interpret For example cluster

301 (cluster 3 when α = 01) contained a similar number of songs and a similar

distribution of the years the songs were released to cluster 9005 Both contain vitually

no songs before the 1990s and then steadily rise in popularity through the 2000s

Both clusters also contain similar types of music house beats and ethreal synths

reminiscent of ambient or trance music However when I looked at the earliest

49

artists in cluster 301 they were different from the earliest artists in cluster 9005

One particular artist Bill Nelson stood out for having a particularly novel song

ldquoBirds of Tinrdquo for the year it was released (1980) This song features a sharp and

twangy synth beat that when sped up sounded like minimalist acid house music

While the α = 005 group differentiated mostly on general moods and classes of

instruments (like rock vs non-electronic vs electronic) the α = 01 group picked

up more nuanced instrumentation and mood differences For example cluster 1601

contained songs that featured orchestral string instruments especially violin The

songs themselves varied significantly according to traditional genres from Brian Eno

arrangements with classical orchestra to a remix of a song by Linkin Park a nu-metal

band which contained violin interludes This clustering raises an interesting point

that music that sounds very different based on traditional genres could be grouped

together on certain instruments or sounds Another cluster 2801 features 90s sounds

characteristic of the Roland TR-909 drum machine (which would explain why the

clusterrsquos songs increase dratistically starting in the 1990s and steadily decline through

the 2000s) Yet another cluster 601 contains a particularly heavy left tail indicating

a style more popular in the 1980s and the characteristic sound high-hat cymbals

is also a specialized instrument This specialization does not match up particularly

strongly with the clusters when α = 005 That is a single cluster with α = 005

does not easily map to one or more clusters in the α = 01 run although many of

the clusters appear to share characteristics based on the qualitative descriptions in

the tables The timbrechord change charts for each cluster appeared to at least

somewhat corroborate the general characteristics I attached to each cluster For

example the last timbre category is significantly pronounced for clusters 5 and 18

and especially so for 18 Cluster 18 was vocal-free ethereal space-synth sounds

so it would make sense that cluster 5 which was mainly calm New World also

contained vocal-free ethereal and space-y sounds It was also interesting to note

50

that certain clusters like 28 contained one timbre category that completely dominated

all the others Many songs in this cluster were marked by strong and repetitive

beats reminiscent of the Roland TR synth drum machine which matches the graph

Likewise clusters 3 7 9 and 20 which appear to contain the same peak timbre

category were noted for containing strong and repetitive beats For this cluster I

added the following artists and their contributions to the general list of novel artists

bull Bill Nelson Minimalist house music

bull Vangelis Orchestral compositions with electronic notes

bull Rick Wakeman Rock compositions with spacy-sounding synths

bull Kraftwerk synth-based pop music

Finally we look at α = 02 With this parameter value the Dirichlet Process resulted

in 22 clusters formed 3 of these clusters contained only one song each and upon

listening to each of these songs I determined they were not particularly unique and

discarded them for a total of 19 remaining clusters Unlike the previous two values of

α where the clusters were relatively easy to subjectively differentiate this one was

quite difficult Slightly more than half of the clusters 10 out of 19 contained under

1000 songs and 3 contained under 100 Some of the clusters were easily mapped

to clusters in the other two α values like cluster 1702 which contains Roland TR

drum machine sounds and is comparable to cluster 2801 However many of the other

classifications seemed more dubious Not only did the songs within each cluster seem

to often vary significantly but the differences between many clusters appeared nearly

indistinguishable The chord change and timbre charts also support the difficulty

in distinguishing different clusters The y-axes for all of the years are quite small

implying that many of the timbre values averaged out because the songs were quite

different in each cluster Essentially the observations are quite noisy and do not have

51

features that stand out as saliently as cluster 2801 for example The only exceptions

to these numbers were clusters 30 and 34 but there were so few songs in each of

these clusters that they represent only a small amount of the dataset Therefore I

concluded that the Dirichlet Process with α = 02 performed an insufficient job of

adequately clustering the songs Overall the clusters formed when α = 01 were

the most meaningful in terms of picking up nuanced moods and instruments without

splitting hairs and resulting in clusters a minimally trained ear could not differentiate

From this analysis the most appropriate genre classifications of the electronic music

from the MSD are the clusters described in the table where α = 01 and the most

novel artists along with their contributions are summarized in the finidings where

α = 005 and α = 01

52

Chapter 4

Conclusion

In this chapter I first address weakness in my experiment and strategies to address

those weaknesses then I offer potential paths for researchers to build upon my ex-

periment and offer closing words regarding this thesis

41 Design Flaws in Experiment

While I made every effort possible to ensure the integrity of this experiment there

were various factors some beyond my control and others within my control but

unrealistic given the time and resources I had The largest issue was the dataset I

was working with While the MSD contained roughly 23000 electronic music songs

according to my classifications these songs did not come close to all of the electronic

music that was available From looking through the tracks I did see many important

artists meaning that there was some credibility to the dataset However there were

several other artists I was surprised to see missing and the artists included contained

only a limited number of popular songs Some traditionally defined genres like

dubstep were missing entirely from the dataset and the most recent songs came

from the year 2010 which meant that the past 5 years of rapid expansions in EM

were not accounted for Building a sufficient corpus of EM data is very difficult

53

arguably more than for other genres because songs may be remixed by multiple

artists further blurring the line between original content and modifications For this

reason I considered my thesis to be a proof of concept Although the data I used

may not be ideal I was able to show that the Dirichlet Process could be used with

some amount of success to cluster songs based on their metadata

With respect to how I implemented the Dirichlet Process and constructed the

features my methodology could have been more extensive with additional time and

resources Interpreting the sounds in each song and establishing common threads is a

difficult task and unlike Pandora which used trained music theory experts to analyze

each song I relied on my own ears and anecdotal knowledge of EM Given the lack of

formal literature quantitatively analyzing EM and the resources I had this was my

best realistic option but was also not ideal The second notable weakness which was

more controllable was determining what exactly constitutes an EM song My criteria

involved iterating through every song and selecting those whose artist contained a

tag that fell inside a list of predetermined EM genres However this strategy is not

always effective since some artists contain only a small selection of EM songs and

have produced much more music involving rock or other non-EM genres To prevent

these songs from appearing in the dataset I would need to load another dataset

from a group called Lastfm which contains user-generated tags at the song level

Another more addressable weakness in my experiment was graphically analyzing the

timbre categories While the average chord changes were easy to interpret on the

graphs for each cluster and had easy semantic interpretations the timbre categories

were never formally defined That is while I knew the Bayes Information Criterion

was lowest when there were 46 categories I did not associate each timbre category

with a sound Mauchrsquos study addressed this issue by randomly selecting songs with

sounds that fell in each timbre category and asked users to listen to the sounds and

54

classify what they heard Implementing this system would be an additional way of

ensuring that the clusters formed for each song were nontrivial I could not only

eyeball the measurements on each graph for timbre like I did in this thesis but also

use them to confirm the sounds I observed for each cluster Finally while my feature

Finally, while my feature selection involved careful preprocessing, based on other studies, that normalized measurements between all songs, there are additional ways I could have improved the feature set. For example, one study looks at more advanced ways to isolate specific timbre segments in a song, identify repeating patterns, and compare songs to each other in terms of the similarity of their timbres [15]. More advanced methods like these would allow me to analyze more quantitatively how successful the Dirichlet Process is at clustering songs into distinct categories; a simple stand-in for such a comparison is sketched below.
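The models in [15] capture timbre far more richly than this, but even a simple stand-in would make the comparison quantitative: the sketch below measures the cosine similarity between two songs' timbre-category histograms, i.e. the timbre_cat_counts vectors produced in Appendix A.2.

import numpy as np

def timbre_similarity(counts_a, counts_b):
    # cosine similarity between two songs' timbre-category histograms;
    # 1.0 means identical direction, 0.0 means no overlap
    a = np.asarray(counts_a, dtype=float)
    b = np.asarray(counts_b, dtype=float)
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(np.dot(a, b) / denom) if denom else 0.0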

4.2 Future Work

Future work in this area, quantitatively analyzing EM metadata to determine what constitutes different genres and novel artists, would involve tighter definitions and procedures, evaluations of whether the clustering was effective, and closer musical scrutiny. All of the weaknesses mentioned in the previous section, barring perhaps the songs available in the Million Song Dataset, can be addressed with extensions and modifications to the code base I created; one such evaluation is sketched below.
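Assuming the per-song feature matrix and the Dirichlet Process cluster assignments are available as arrays, the silhouette score summarizes how tight and well separated the clusters are without any manual listening; this wrapper is a minimal sketch rather than part of my code base.

from sklearn.metrics import silhouette_score

def clustering_quality(features, labels):
    # features: (n_songs, n_features) matrix of chord-change and
    # timbre-category frequencies; labels: one cluster index per song.
    # Values near 1 indicate tight, well-separated clusters; values
    # near 0 indicate heavily overlapping ones.
    return silhouette_score(features, labels)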

Addressing the greater issue of building an effective corpus of music data for the MSD, and constantly updating it, might be accomplished by soliciting such data from an organization like Spotify, but such an endeavor is very ambitious and beyond the scope of any individual or small-group research project without extensive funding and influence. Once these problems are resolved, and once the dataset, the songs accessed from it, and the methods for comparing songs to each other are in place, the next steps would be to further analyze the results. How do the most unique artists for their time compare to the most popular artists? Is there considerable overlap? How long does it take for a style to grow in popularity, if it grows at all? And lastly, how can these findings be used to compose new genres of music and envision who and what will become popular in the future? All of these questions may require supplementary information sources, for example with respect to the popularity of songs and artists, and many of these additional pieces of information can be found on the website of the MSD.
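As one sketch of how the popularity questions could be approached with fields already present in the MSD, the snippet below averages the Echo Nest song hotttnesss value over a set of tracks; comparing that average for a cluster's earliest songs against the cluster as a whole would hint at whether the most novel artists also became popular. The helper is an assumption built on getters that exist in hdf5_getters, and hotttnesss is only a rough proxy for popularity.

import numpy as np
import hdf5_getters

def mean_hotttnesss(h5_paths):
    # average song hotttnesss over a list of MSD .h5 files, skipping
    # tracks where the field is missing (stored as NaN)
    scores = []
    for path in h5_paths:
        h5 = hdf5_getters.open_h5_file_read(path)
        score = hdf5_getters.get_song_hotttnesss(h5)
        h5.close()
        if not np.isnan(score):
            scores.append(score)
    return np.mean(scores) if scores else float('nan')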

4.3 Closing Remarks

While this thesis is an ambitious endeavor and can be improved in many respects, it does show that the methods implemented yield nontrivial results and could serve as a foundation for future quantitative analysis of electronic music. As data analytics grows even more, and groups such as Spotify amass greater amounts of information and deeper insights on that information, this relatively new field of study will hopefully grow. EM is a dynamic, energizing, and incredibly expressive type of music, and understanding it from a quantitative perspective pays respect to what has, up until now, been mostly analyzed from a curious outsider's perspective: qualitatively described, but not examined as thoroughly from a mathematical angle.

Appendix A

Code

A.1 Pulling Data from the Million Song Dataset

from __future__ import division
import os
import re
import sys
import time
import glob
import numpy as np
import hdf5_getters  # not on adroit

# prevents output from showing ellipses when printed
np.set_printoptions(threshold=np.nan)

'''This code walks one shard of the Million Song Dataset and saves the
title, artist, year, duration, timbre, and pitch metadata of every
electronic song it finds, sorted chronologically.'''

basedir = '/scratch/network/mssilver/mssilver/msd_data_full/data/' + str(sys.argv[1])
ext = '.h5'

target_genres = ['house', 'techno', 'drum and bass', 'drum n bass',
                 "drum'n'bass", 'drumnbass', "drum 'n' bass", 'jungle',
                 'breakbeat', 'trance', 'dubstep', 'trap', 'downtempo',
                 'industrial', 'synthpop', 'idm',
                 'idm - intelligent dance music', '8-bit', 'ambient',
                 'dance and electronica', 'electronic']

# relevant metadata for all EM songs found in the MSD
all_song_data = {}
count = 0
start_time = time.time()

for root, dirs, files in os.walk(basedir):
    files = glob.glob(os.path.join(root, '*' + ext))
    for f in files:
        h5 = hdf5_getters.open_h5_file_read(f)
        # if year unknown, throw out sample
        if hdf5_getters.get_year(h5) == 0:
            h5.close()
            continue
        if any(tag in str(hdf5_getters.get_artist_mbtags(h5)) for tag in target_genres):
            print 'found electronic music song at {0} seconds'.format(time.time() - start_time)
            count += 1
            print 'song count: {0}'.format(count)
            h5_subdict = dict()
            h5_subdict['title'] = hdf5_getters.get_title(h5).item()
            h5_subdict['artist_name'] = hdf5_getters.get_artist_name(h5).item()
            h5_subdict['year'] = hdf5_getters.get_year(h5).item()
            h5_subdict['duration'] = hdf5_getters.get_duration(h5).item()
            h5_subdict['timbre'] = hdf5_getters.get_segments_timbre(h5).tolist()
            h5_subdict['pitches'] = hdf5_getters.get_segments_pitches(h5).tolist()
            track_id = hdf5_getters.get_track_id(h5).item()
            all_song_data[track_id] = h5_subdict
            print 'Song {0} finished processing. Total time elapsed: {1} seconds'.format(count, str(time.time() - start_time))
        h5.close()

# note: a plain dict does not preserve the sorted order under Python 2;
# collections.OrderedDict would, at the cost of changing the file format
# that the code in A.2 parses
all_song_data_sorted = dict(sorted(all_song_data.items(), key=lambda k: k[1]['year']))
sortedpitchdata = '/scratch/network/mssilver/mssilver/msd_data/raw_' + re.sub('/', '', sys.argv[1]) + '.txt'
with open(sortedpitchdata, 'w') as text_file:
    text_file.write(str(all_song_data_sorted))
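The script takes one shard of the MSD directory tree as its only command-line argument (a two-level shard such as A/B, which matches the raw_AB.txt naming that the code in A.3 expects); the script name here is hypothetical:

python pull_msd_data.py A/B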

A.2 Calculating Most Likely Chords and Timbre Categories

from __future__ import division
import os
import sys
import re
import time
import json
import glob
import math
import ast
import collections
import numpy as np
import hdf5_getters  # not on adroit
import sklearn.mixture
import msd_utils  # not on adroit

# prevents output from showing ellipses when printed
np.set_printoptions(threshold=np.nan)

# column-wise mean of a list of lists (applied to one column at a time)
def mean(a):
    return sum(a) / len(a)

'''This code computes the frequency of chord changes and of timbre
categories in each electronic song, normalized by song duration.'''

basedir = '/scratch/network/mssilver/mssilver/'
input_file = basedir + 'msd_data/raw_' + str(sys.argv[1]) + '.txt'
output_file = basedir + 'msd_data/preprocessed_' + str(sys.argv[1]) + '.txt'

json_contents = open(input_file, 'r').read()

all_song_data = []
time_start = time.time()
count = 0
# each song was stored as one Python dict literal; the repr is assumed
# to begin with the 'title' key
for json_object_str in re.finditer(r"{'title'.*?}", json_contents):
    json_object_str = str(json_object_str.group(0))
    json_object = ast.literal_eval(json_object_str)
    json_object_new = {}

    json_object_new['title'] = json_object['title']
    json_object_new['artist_name'] = json_object['artist_name']
    json_object_new['year'] = json_object['year']
    json_object_new['duration'] = json_object['duration']

    segments_pitches_old = json_object['pitches']
    segments_timbre_old = json_object['timbre']
    segments_pitches_old_smoothed = []
    segments_timbre_old_smoothed = []
    chord_changes = [0 for i in range(0, 192)]
    smoothing_factor = 5
    for i in range(0, int(math.floor(len(segments_pitches_old) / smoothing_factor))):
        segments = segments_pitches_old[(smoothing_factor * i):(smoothing_factor * i + smoothing_factor)]
        # calculate mean frequency of each note over a block of 5 time segments
        segments_mean = map(mean, zip(*segments))
        segments_pitches_old_smoothed.append(segments_mean)
    most_likely_chords = [msd_utils.find_most_likely_chord(seg) for seg in segments_pitches_old_smoothed]
    print 'found most likely chords at {0} seconds'.format(time.time() - time_start)
    # calculate chord changes
    for i in range(0, len(most_likely_chords) - 1):
        c1 = most_likely_chords[i]
        c2 = most_likely_chords[i + 1]
        if c1[1] == c2[1]:
            note_shift = 0
        elif c1[1] < c2[1]:
            note_shift = c2[1] - c1[1]
        else:
            note_shift = 12 - c1[1] + c2[1]
        key_shift = 4 * (c1[0] - 1) + c2[0]
        # convert note_shift (0 through 11) and key_shift (1 to 16)
        # to one of 192 categories for a chord shift
        chord_shift = 12 * (key_shift - 1) + note_shift
        chord_changes[chord_shift] += 1
    json_object_new['chord_changes'] = [c / json_object['duration'] for c in chord_changes]
    print 'calculated chord changes at {0} seconds'.format(time.time() - time_start)

    for i in range(0, int(math.floor(len(segments_timbre_old) / smoothing_factor))):
        segments = segments_timbre_old[(smoothing_factor * i):(smoothing_factor * i + smoothing_factor)]
        # calculate the mean of each timbre dimension over a block of 5 time segments
        segments_mean = map(mean, zip(*segments))
        segments_timbre_old_smoothed.append(segments_mean)
    print 'found most likely timbre categories at {0} seconds'.format(time.time() - time_start)
    timbre_cats = [msd_utils.find_most_likely_timbre_category(seg) for seg in segments_timbre_old_smoothed]
    # one count per timbre category (msd_utils defines 46 centroids;
    # the original listing counted only the first 30)
    timbre_cat_counts = [timbre_cats.count(i) for i in xrange(0, len(msd_utils.TIMBRE_CLUSTERS))]
    json_object_new['timbre_cat_counts'] = [t / json_object['duration'] for t in timbre_cat_counts]
    all_song_data.append(json_object_new)
    count += 1

print 'preprocessing finished, writing results to file at time {0}'.format(time.time() - time_start)
with open(output_file, 'w') as f:
    f.write(str(all_song_data))

print 'file merging complete at time {0}'.format(time.time() - time_start)

A.3 Code to Compute Timbre Categories

from __future__ import division
import os
import sys
import re
import time
import json
import glob
import math
import ast
import random
import operator
import collections
from collections import defaultdict
from string import ascii_uppercase
import numpy as np
import matplotlib.pyplot as plt
import hdf5_getters  # not on adroit
import sklearn.mixture
import msd_utils  # not on adroit

timbre_all = []
# number of songs in the dataset per year, used to weight the sampling
year_counts = dict({1956: 2, 1965: 4, 1968: 3, 1969: 5, 1970: 23, 1971: 25,
                    1972: 26, 1973: 37, 1974: 35, 1975: 29, 1976: 28,
                    1977: 64, 1978: 77, 1979: 111, 1980: 131, 1981: 171,
                    1982: 199, 1983: 272, 1984: 190, 1985: 189, 1986: 200,
                    1987: 224, 1988: 205, 1989: 272, 1990: 358, 1991: 348,
                    1992: 538, 1993: 610, 1994: 658, 1995: 764, 1996: 809,
                    1997: 930, 1998: 872, 1999: 983, 2000: 1031, 2001: 1230,
                    2002: 1323, 2003: 1563, 2004: 1508, 2005: 1995,
                    2006: 1892, 2007: 2175, 2008: 1950, 2009: 1782, 2010: 742})

time_start = time.time()
year_count = defaultdict(int)
orig_dir = '/scratch/network/mssilver/mssilver/'
orig_dir = ''  # overridden for local runs
json_pattern = re.compile(r"{'title'.*?}", re.DOTALL)
N = 20  # number of songs to sample from each year
k = 20  # number of frames to select from each song
for l1 in ascii_uppercase:
    for l2 in ascii_uppercase:
        edm_textfile = orig_dir + 'msd_data/raw_' + l1 + l2 + '.txt'
        json_contents = open(edm_textfile, 'r').read()
        for json_object_str in re.findall(json_pattern, json_contents):
            json_object = ast.literal_eval(json_object_str)
            year = int(json_object['year'])
            # keep a song with probability N / (songs in that year), capped at 1
            prob = 1.0 if 1.0 * N / year_counts[year] > 1.0 else 1.0 * N / year_counts[year]
            if random.random() < prob:
                print 'getting timbre frames for song in directory {0}, {1} seconds after start of program'.format(edm_textfile, time.time() - time_start)
                duration = float(json_object['duration'])
                timbre = [[t / duration for t in l] for l in json_object['timbre']]
                try:
                    indices = random.sample(xrange(0, len(timbre)), k)
                except ValueError:
                    # fewer than k frames available: keep them all
                    indices = xrange(0, len(timbre))
                timbre_frames = [timbre[i] for i in indices]
                appended_timbre = [timbre_all.append(l) for l in timbre_frames]
        print 'finished file {0}, {1} seconds after start of program'.format(edm_textfile, time.time() - time_start)

with open('timbre_frames_all.txt', 'w') as f:
    f.write(str(timbre_all))

A.4 Helper Methods for Calculations

import os
import re
import json
import glob
import time
import numpy as np
import hdf5_getters

''' some static data used in conjunction with the helper methods '''

# each 12-element vector corresponds to the 12 pitches, starting with
# C natural and going up to B natural
CHORD_TEMPLATE_MAJOR = [[1,0,0,0,1,0,0,1,0,0,0,0], [0,1,0,0,0,1,0,0,1,0,0,0],
                        [0,0,1,0,0,0,1,0,0,1,0,0], [0,0,0,1,0,0,0,1,0,0,1,0],
                        [0,0,0,0,1,0,0,0,1,0,0,1], [1,0,0,0,0,1,0,0,0,1,0,0],
                        [0,1,0,0,0,0,1,0,0,0,1,0], [0,0,1,0,0,0,0,1,0,0,0,1],
                        [1,0,0,1,0,0,0,0,1,0,0,0], [0,1,0,0,1,0,0,0,0,1,0,0],
                        [0,0,1,0,0,1,0,0,0,0,1,0], [0,0,0,1,0,0,1,0,0,0,0,1]]
CHORD_TEMPLATE_MINOR = [[1,0,0,1,0,0,0,1,0,0,0,0], [0,1,0,0,1,0,0,0,1,0,0,0],
                        [0,0,1,0,0,1,0,0,0,1,0,0], [0,0,0,1,0,0,1,0,0,0,1,0],
                        [0,0,0,0,1,0,0,1,0,0,0,1], [1,0,0,0,0,1,0,0,1,0,0,0],
                        [0,1,0,0,0,0,1,0,0,1,0,0], [0,0,1,0,0,0,0,1,0,0,1,0],
                        [0,0,0,1,0,0,0,0,1,0,0,1], [1,0,0,0,1,0,0,0,0,1,0,0],
                        [0,1,0,0,0,1,0,0,0,0,1,0], [0,0,1,0,0,0,1,0,0,0,0,1]]
CHORD_TEMPLATE_DOM7 = [[1,0,0,0,1,0,0,1,0,0,1,0], [0,1,0,0,0,1,0,0,1,0,0,1],
                       [1,0,1,0,0,0,1,0,0,1,0,0], [0,1,0,1,0,0,0,1,0,0,1,0],
                       [0,0,1,0,1,0,0,0,1,0,0,1], [1,0,0,1,0,1,0,0,0,1,0,0],
                       [0,1,0,0,1,0,1,0,0,0,1,0], [0,0,1,0,0,1,0,1,0,0,0,1],
                       [1,0,0,1,0,0,1,0,1,0,0,0], [0,1,0,0,1,0,0,1,0,1,0,0],
                       [0,0,1,0,0,1,0,0,1,0,1,0], [0,0,0,1,0,0,1,0,0,1,0,1]]
CHORD_TEMPLATE_MIN7 = [[1,0,0,1,0,0,0,1,0,0,1,0], [0,1,0,0,1,0,0,0,1,0,0,1],
                       [1,0,1,0,0,1,0,0,0,1,0,0], [0,1,0,1,0,0,1,0,0,0,1,0],
                       [0,0,1,0,1,0,0,1,0,0,0,1], [1,0,0,1,0,1,0,0,1,0,0,0],
                       [0,1,0,0,1,0,1,0,0,1,0,0], [0,0,1,0,0,1,0,1,0,0,1,0],
                       [0,0,0,1,0,0,1,0,1,0,0,1], [1,0,0,0,1,0,0,1,0,1,0,0],
                       [0,1,0,0,0,1,0,0,1,0,1,0], [0,0,1,0,0,0,1,0,0,1,0,1]]

CHORD_TEMPLATE_MAJOR_means = [np.mean(chord) for chord in CHORD_TEMPLATE_MAJOR]
CHORD_TEMPLATE_MINOR_means = [np.mean(chord) for chord in CHORD_TEMPLATE_MINOR]
CHORD_TEMPLATE_DOM7_means = [np.mean(chord) for chord in CHORD_TEMPLATE_DOM7]
CHORD_TEMPLATE_MIN7_means = [np.mean(chord) for chord in CHORD_TEMPLATE_MIN7]

CHORD_TEMPLATE_MAJOR_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_MAJOR]
CHORD_TEMPLATE_MINOR_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_MINOR]
CHORD_TEMPLATE_DOM7_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_DOM7]
CHORD_TEMPLATE_MIN7_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_MIN7]

# the 46 timbre category centroids used by find_most_likely_timbre_category;
# each row is one 12-dimensional centroid
TIMBRE_CLUSTERS = [
    [1.38679881e-01, 3.95702571e-02, 2.65410235e-02, 7.38301998e-03, -1.75014636e-02, -5.51147732e-02,
     8.71851698e-03, -1.17595855e-02, 1.07227900e-02, 8.75951680e-03, 5.40391877e-03, 6.17638908e-03],
    [3.14344510e+00, 1.17405599e-01, 4.08053561e+00, -1.77934450e+00, 2.93367968e+00, -1.35597928e+00,
     -1.55129489e+00, 7.75743158e-01, 6.42796685e-01, 1.40794256e-01, 3.37716831e-01, -3.27103815e-01],
    [3.56548165e-01, 2.73288705e+00, 1.94355982e+00, 1.06892477e+00, 9.89739475e-01, -8.97330631e-02,
     8.73234495e-01, -2.00747009e-03, 3.44488367e-01, 9.93117800e-02, -2.43471766e-01, -1.90521726e-01],
    [4.22442037e-01, 4.14115783e-01, 1.43926557e-01, -1.16143322e-01, -5.95186216e-02, -2.36927188e-01,
     -6.83151409e-02, 9.86816882e-02, 2.43219098e-02, 6.93558977e-02, 6.80121418e-03, 3.97485360e-02],
    [1.94727799e-01, -1.39027782e+00, -2.39875671e-01, -2.84583677e-01, 1.92334219e-01, -2.83421048e-01,
     2.15787541e-01, 1.14840341e-01, -2.15631833e-01, -4.09496877e-02, -6.90838017e-03, -7.24394810e-03],
    [1.96565167e-01, 4.98702717e-02, -3.43697282e-01, 2.54170701e-01, 1.12441266e-02, 1.54740401e-01,
     -4.70447408e-02, 8.10868802e-02, 3.03736697e-03, 1.43974944e-03, -2.75044913e-02, 1.48634678e-02],
    [2.21364497e-01, -2.96205105e-01, 1.57754028e-01, -5.57641279e-02, -9.25625566e-02, -6.15316168e-02,
     -1.38139882e-01, -5.54936599e-02, 1.66886836e-01, 6.46238260e-02, 1.24093863e-02, -2.09274345e-02],
    [2.12823455e-01, -9.32652720e-02, -4.39611467e-01, -2.02814479e-01, 4.98638770e-02, -1.26572488e-01,
     -1.11181799e-01, 3.25075635e-02, 2.01416694e-02, -5.69216463e-02, 2.61922912e-02, 8.30817468e-02],
    [1.62304042e-01, -7.34813956e-03, -2.02552550e-01, 1.80106705e-01, -5.72110826e-02, -9.17148244e-02,
     -6.20429191e-03, -6.08892354e-02, 1.02883628e-02, 3.84878478e-02, -8.72920419e-03, 2.37291230e-02],
    [1.69023095e-01, 6.81311168e-02, -3.71039856e-02, -2.13139780e-02, -4.18752028e-03, 1.36407740e-01,
     2.58515825e-02, -4.10328777e-04, 2.93149920e-02, -1.97874734e-02, 2.01177066e-02, 4.29260690e-03],
    [4.16829358e-01, -1.28384095e+00, 8.86081556e-01, 9.13717416e-02, -3.19420208e-01, -1.82003637e-01,
     -3.19865507e-02, -1.71517045e-02, 3.47472066e-02, -3.53047665e-02, 5.58354602e-02, -5.06222122e-02],
    [3.83948137e-01, 1.06020034e-01, 4.01191058e-01, 1.49470482e-01, -9.58422411e-02, -4.94473336e-02,
     2.27589858e-02, -5.67352733e-02, 3.84666644e-02, -2.15828055e-02, -1.67817151e-02, 1.15426241e-01],
    [9.07946444e-01, 3.26120397e+00, 2.98472002e+00, -1.42615404e-01, 1.29886103e+00, -4.53380431e-01,
     1.54008478e-01, -3.55297093e-02, -2.95809181e-01, 1.57037690e-01, -7.29692046e-02, 1.15180285e-01],
    [1.60870896e+00, -2.32038235e+00, -7.96211044e-01, 1.55058968e+00, -2.19377663e+00, 5.01030526e-01,
     -1.71767279e+00, -1.36642470e+00, -2.42837527e-01, -4.14275615e-01, -7.33148530e-01, -4.56676578e-01],
    [6.42870687e-01, 1.34486839e+00, 2.16026845e-01, -2.13180345e-01, 3.10866747e-01, -3.97754955e-01,
     -3.54439151e-01, -5.95938041e-04, 4.95054274e-03, 4.67013422e-02, -1.80823854e-02, 1.25808320e-01],
    [1.16780496e+00, 2.28141229e+00, -3.29418720e+00, -1.54239912e+00, 2.12372153e-01, 2.51116768e+00,
     1.84273560e+00, -4.06183916e-01, 1.19175125e+00, -9.24407446e-01, 6.85444429e-01, -6.38729005e-01],
    [2.39097414e-01, -1.13382447e-02, 3.06327342e-01, 4.68182987e-03, -1.03107607e-01, -3.17661969e-02,
     3.46533705e-02, 1.46440386e-02, 6.88291154e-02, 1.72580481e-02, -6.23970238e-03, -6.52822380e-03],
    [1.74850329e-01, -1.86077411e-01, 2.69285838e-01, 5.22452803e-02, -3.71708289e-02, -6.42874319e-02,
     -5.01920042e-03, -1.14565540e-02, -2.61300268e-03, -6.94872458e-03, 1.20157063e-02, 2.01341977e-02],
    [1.93220674e-01, 1.62738332e-01, 1.72794061e-02, 7.89933755e-02, 1.58494767e-01, 9.04541006e-04,
     -3.33177052e-02, -1.42411500e-01, -1.90471155e-02, -2.41622739e-02, -2.57382438e-02, 2.84895062e-02],
    [3.31179197e+00, -1.56765268e-01, 4.42446188e+00, 2.05496297e+00, 5.07031622e+00, -3.52663849e-02,
     -5.68337901e+00, -1.17825301e+00, 5.41756637e-01, -3.15541339e-02, -1.58404846e+00, 7.37887234e-01],
    [2.36033237e-01, -5.01380019e-01, -7.01568834e-02, -2.14474169e-01, 5.58739133e-01, -3.45340886e-01,
     2.36469930e-01, -2.51770230e-02, -4.41670143e-01, -1.73364633e-01, 9.92353986e-03, 1.01775476e-01],
    [3.13672832e+00, 1.55128891e+00, 4.60139512e+00, 9.82477544e-01, -3.87108002e-01, -1.34239667e+00,
     -3.00065797e+00, -4.41556909e-01, -7.77546208e-01, -6.59017029e-01, -1.42596356e-01, -9.78935498e-01],
    [8.50714148e-01, 2.28658856e-01, -3.65260753e+00, 2.70626948e+00, -1.90441544e-01, 5.66625676e+00,
     1.77531510e+00, 2.39978921e+00, 1.10965660e+00, 1.58484130e+00, -1.51579214e-02, 8.64324026e-01],
    [1.14302559e+00, 1.18602811e+00, -3.88130412e+00, 8.69833825e-01, -8.23003310e-01, -4.23867795e-01,
     8.56022598e-01, -1.08015106e+00, 1.74840192e-01, -1.35493558e-02, -1.17012561e+00, 1.68572940e-01],
    [3.54117814e+00, 6.12714769e-01, 7.67585243e+00, 2.50391333e+00, 1.81374399e+00, -1.46363231e+00,
     -1.74027236e+00, -5.72924078e-01, -1.20787368e+00, -4.13954661e-01, -4.62561948e-01, 6.78297871e-01],
    [8.31843044e-01, 4.41635485e-01, 7.00724425e-02, -4.72159900e-02, 3.08326493e-01, -4.47009822e-01,
     3.27806057e-01, 6.52370380e-01, 3.28490360e-01, 1.28628172e-01, -7.78065861e-02, 6.91343399e-02],
    [4.90082031e-01, -9.53180204e-01, 1.76970476e-01, 1.57256960e-01, -5.26196238e-02, -3.19264458e-01,
     3.91808304e-01, 2.19368239e-01, -2.06483291e-01, -6.25044005e-02, -1.05547224e-01, 3.18934196e-01],
    [1.49899454e+00, -4.30708817e-01, 2.43770498e+00, 7.03149621e-01, -2.28827845e+00, 2.70195855e+00,
     -4.71484280e+00, -1.18700075e+00, -1.77431396e+00, -2.23190236e+00, 8.20855264e-01, -2.35859902e-01],
    [1.20322544e-01, -3.66300816e-01, -1.25699953e-01, -1.21914056e-01, 6.93277338e-02, -1.31034684e-01,
     -1.54955924e-03, 2.48094288e-02, -3.09576314e-02, -1.66369415e-03, 1.48904987e-04, -1.42151992e-02],
    [6.52394765e-01, -6.81024464e-01, 6.36868117e-01, 3.04950208e-01, 2.62178992e-01, -3.20457080e-01,
     -1.98576098e-01, -3.02173163e-01, 2.04399765e-01, 4.44513847e-02, -9.50111498e-03, -1.14198739e-02],
    [2.06762180e-01, -2.08101829e-01, 2.61977630e-01, -1.71672300e-01, 5.61794250e-02, 2.13660185e-01,
     3.90259585e-02, 4.78176392e-02, 1.72812607e-02, 3.44052067e-02, 6.26899067e-03, 2.48544728e-02],
    [7.39717363e-01, 4.37786285e+00, 2.54995502e+00, 1.13151212e+00, -3.58509503e-01, 2.20806129e-01,
     -2.20500355e-01, -7.22409824e-03, -2.70534083e-01, 1.07942098e-03, 2.70174668e-01, 1.87279353e-01],
    [1.25593809e+00, 6.71054880e-02, 8.70352571e-01, -4.32607959e+00, 2.30652217e+00, 5.47476105e+00,
     -6.11052479e-01, 1.07955720e+00, -2.16225471e+00, -7.95770149e-01, -7.31804973e-01, 9.68935954e-01],
    [1.17233757e-01, -1.23897829e-01, -4.88625265e-01, 1.42036530e-01, -7.23286756e-02, -6.99808763e-02,
     -1.17525019e-02, 5.70221674e-03, -7.67796123e-03, 4.17505873e-02, -2.33375716e-02, 1.94121001e-02],
    [1.67511025e+00, -2.75436700e+00, 1.45345593e+00, 1.32408871e+00, -1.66172505e+00, 1.00560074e+00,
     -8.82308160e-01, -5.95708043e-01, -7.27283590e-01, -1.03975499e+00, -1.86653334e-02, 1.39449745e+00],
    [3.20587677e+00, -2.84451104e+00, 8.54849957e+00, -4.44001235e-01, 1.04202144e+00, 7.35333682e-01,
     -2.48763292e+00, 7.38931361e-01, -1.74185596e+00, -1.07581842e+00, 2.05759299e-01, -8.20483513e-01],
    [3.31279737e+00, -5.08655734e-01, 6.61530870e+00, 1.16518280e+00, 4.74499155e+00, -2.31536191e+00,
     -1.34016130e+00, -7.15381712e-01, 2.78890594e+00, 2.04189275e+00, -3.80003033e-01, 1.16034914e+00],
    [1.79522019e+00, -8.13534697e-03, 4.37167420e-01, 2.26517020e+00, 8.85377295e-01, 1.07481514e+00,
     -7.25322296e-01, -2.19309506e+00, -7.59468916e-01, -1.37191387e+00, 2.60097913e-01, 9.34596450e-01],
    [3.50400906e-01, 8.17891485e-01, -8.63487084e-01, -7.31760701e-01, 9.70320805e-02, -3.60023996e-01,
     -2.91753495e-01, -8.03073817e-03, 6.65930095e-03, 1.60093340e-01, -1.29158086e-01, -5.18806100e-03],
    [2.25922929e-01, 2.78461593e-01, 5.39661393e-03, -2.37662670e-03, -2.70343295e-03, -1.23485570e-01,
     2.31027499e-03, 5.87465112e-05, 1.86127188e-02, 2.83074747e-02, -1.87198676e-04, 1.24761782e-02],
    [4.53615634e-01, 3.18976020e+00, -8.35029351e-01, 7.84124578e+00, -4.43906795e-01, -1.78945492e+00,
     -1.14521031e+00, 1.00044304e+00, -4.04084981e-01, -4.86030348e-01, 1.05412721e-01, 5.63666445e-02],
    [3.93714086e-01, -3.07226477e-01, -4.87366619e-01, -4.57481697e-01, -2.91133171e-04, -2.39881719e-01,
     -2.15591352e-01, -1.21332941e-01, 1.42245002e-01, 5.02984582e-02, -8.05878851e-03, 1.95534173e-01],
    [1.86913010e-01, -1.61000977e-01, 5.95612425e-01, 1.87804293e-01, 2.22064227e-01, -1.09008289e-01,
     7.83845058e-02, 5.15228647e-02, -8.18113578e-02, -2.37860551e-02, 3.41013800e-03, 3.64680417e-02],
    [3.32919314e+00, -2.14341251e+00, 7.20913997e+00, 1.76143734e+00, 1.64091808e+00, -2.66887649e+00,
     -9.26748006e-01, -2.78599285e-01, -7.39434005e-01, -3.87363085e-01, 8.00557250e-01, 1.15628886e+00],
    [4.76496444e-01, -1.19334793e-01, 3.09037235e-01, -3.45545294e-01, 1.30114716e-01, 5.06895559e-01,
     2.12176840e-01, -4.14296750e-03, 4.52439064e-02, -1.62163990e-02, 6.93683152e-03, -5.77607592e-03],
    [3.00019324e-01, 5.43432074e-03, -7.72732930e-01, 1.47263806e+00, -2.79012581e-02, -2.47864869e-01,
     -2.10011388e-01, 2.78202425e-01, 6.16957205e-02, -1.66924986e-01, -1.80102286e-01, -3.78872162e-03]]

TIMBRE_MEANS = [np.mean(t) for t in TIMBRE_CLUSTERS]
TIMBRE_STDEVS = [np.std(t) for t in TIMBRE_CLUSTERS]

'''helper methods to process raw msd data'''

def normalize_pitches(h5):
    # transpose every pitch segment into a common key
    key = int(hdf5_getters.get_key(h5))
    segments_pitches = hdf5_getters.get_segments_pitches(h5)
    segments_pitches_new = [transpose_by_key(pitch_seg, key) for pitch_seg in segments_pitches]
    return segments_pitches_new

def transpose_by_key(pitch_seg, key):
    pitch_seg_new = []
    for i in range(0, 12):
        idx = (i + key) % 12
        pitch_seg_new.append(pitch_seg[idx])
    return pitch_seg_new

''' given a time segment with distributions of the 12 pitches, find the
most likely chord played '''
def find_most_likely_chord(pitch_vector):
    rho_max = 0.0
    # chords are indexed as (type, root): 1 = major, 2 = minor,
    # 3 = dominant 7th, 4 = minor 7th
    most_likely_chord = (1, 1)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_MAJOR, CHORD_TEMPLATE_MAJOR_means, CHORD_TEMPLATE_MAJOR_stdevs)):
        # correlation between the template and the observed pitch vector
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean) * (pitch_vector[i] - np.mean(pitch_vector)) / ((stdev + 0.01) * (np.std(pitch_vector) + 0.01))
        if abs(rho) > abs(rho_max):
            rho_max = rho
            most_likely_chord = (1, idx)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_MINOR, CHORD_TEMPLATE_MINOR_means, CHORD_TEMPLATE_MINOR_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean) * (pitch_vector[i] - np.mean(pitch_vector)) / ((stdev + 0.01) * (np.std(pitch_vector) + 0.01))
        if abs(rho) > abs(rho_max):
            rho_max = rho
            most_likely_chord = (2, idx)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_DOM7, CHORD_TEMPLATE_DOM7_means, CHORD_TEMPLATE_DOM7_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean) * (pitch_vector[i] - np.mean(pitch_vector)) / ((stdev + 0.01) * (np.std(pitch_vector) + 0.01))
        if abs(rho) > abs(rho_max):
            rho_max = rho
            most_likely_chord = (3, idx)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_MIN7, CHORD_TEMPLATE_MIN7_means, CHORD_TEMPLATE_MIN7_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean) * (pitch_vector[i] - np.mean(pitch_vector)) / ((stdev + 0.01) * (np.std(pitch_vector) + 0.01))
        if abs(rho) > abs(rho_max):
            rho_max = rho
            most_likely_chord = (4, idx)
    return most_likely_chord

def find_most_likely_timbre_category(timbre_vector):
    # same correlation test as above, against the timbre centroids
    most_likely_timbre_cat = 0
    rho_max = 0.0
    for idx, (seg, mean, stdev) in enumerate(zip(TIMBRE_CLUSTERS, TIMBRE_MEANS, TIMBRE_STDEVS)):
        rho = 0.0
        for i in range(0, 12):
            rho += (seg[i] - mean) * (timbre_vector[i] - np.mean(seg)) / ((stdev + 0.01) * (np.std(timbre_vector) + 0.01))
        if abs(rho) > abs(rho_max):
            rho_max = rho
            most_likely_timbre_cat = idx
    return most_likely_timbre_cat

Bibliography

[1] Marck Bailey. Mark J. Butler publishes scholarly work on dance music. http://www.music.northwestern.edu/about/news/2012/mark-j-butler-publishes-scholarly-work-on-dance-music.html, Mar 2012.

[2] Kenneth Taylor. Ishkur's guide to EDM. http://techno.org/electronic-music-guide/.

[3] Deal further strengthens Spotify's music discovery expertise. http://the.echonest.com/pressreleases/spotify-acquires-echo-nest/, Mar 2014.

[4] Josh Constine. Inside the Spotify - Echo Nest skunkworks. http://techcrunch.com/2014/10/19/the-sonic-mad-scientists/, Oct 2014.

[5] The future of music genres is here. http://blog.echonest.com/post/73516217273/the-future-of-music-genres-is-here, Jan 2014.

[6] About the Music Genome Project. http://www.pandora.com/about/mgp.

[7] Joan Serrà, Álvaro Corral, Marián Boguñá, Martín Haro, and Josep Ll. Arcos. Measuring the evolution of contemporary western popular music. Scientific Reports, 2, Jul 2012.

[8] Matthias Mauch, Robert M. MacCallum, Mark Levy, and Armand M. Leroi. The evolution of popular music: USA 1960–2010. Royal Society Open Science, 2(5), 2015.

[9] Thierry Bertin-Mahieux, Daniel P. W. Ellis, Brian Whitman, and Paul Lamere. The Million Song Dataset. In Proceedings of the 12th International Conference on Music Information Retrieval (ISMIR 2011), 2011.

[10] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay. Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12:2825–2830, 2011.

[11] Lars Buitinck, Gilles Louppe, Mathieu Blondel, Fabian Pedregosa, Andreas Mueller, Olivier Grisel, Vlad Niculae, Peter Prettenhofer, Alexandre Gramfort, Jaques Grobler, Robert Layton, Jake VanderPlas, Arnaud Joly, Brian Holt, and Gaël Varoquaux. API design for machine learning software: experiences from the scikit-learn project. In ECML PKDD Workshop: Languages for Data Mining and Machine Learning, pages 108–122, 2013.

[12] Edwin Chen. Infinite mixture models with nonparametric Bayes and the Dirichlet process. http://blog.echen.me/2012/03/20/infinite-mixture-models-with-nonparametric-bayes-and-the-dirichlet-process/, Mar 2012.

[13] Graham Massey. Roland TR-808: The drum machine that changed music. http://www.bbc.com/news/entertainment-arts-26682781, Mar 2014.

[14] Jean-Michel Jarre and Agencja Artystyczna MTJ. Jean Michel Jarre. Disques Dreyfus, 1999.

[15] Francois Pachet, Jean-Julien Aucouturier, and Mark Sandler. The way it sounds: Timbre models for analysis and retrieval of music signals. IEEE Transactions on Multimedia, 7(6):1028–1035, Dec 2005.

88 with open(output_filersquowrsquo) as f89 fwrite(str(all_song_data))90

91 print rsquofile merging complete at time 0rsquoformat(timetime()-time_start)

A.3 Code to Compute Timbre Categories

from __future__ import division
import os
import sys
import re
import time
import json
import glob
import math
import ast
import operator
import random
import collections
from collections import defaultdict
from string import ascii_uppercase
import numpy as np
import matplotlib.pyplot as plt
import hdf5_getters  # not on adroit
import sklearn.mixture
import msd_utils  # not on adroit

timbre_all = []
N = 20  # number of samples to get from each year

# number of EM songs in the dataset for each year
year_counts = dict({1956: 2, 1965: 4, 1968: 3, 1969: 5, 1970: 23, 1971: 25,
    1972: 26, 1973: 37, 1974: 35, 1975: 29, 1976: 28, 1977: 64, 1978: 77,
    1979: 111, 1980: 131, 1981: 171, 1982: 199, 1983: 272, 1984: 190,
    1985: 189, 1986: 200, 1987: 224, 1988: 205, 1989: 272, 1990: 358,
    1991: 348, 1992: 538, 1993: 610, 1994: 658, 1995: 764, 1996: 809,
    1997: 930, 1998: 872, 1999: 983, 2000: 1031, 2001: 1230, 2002: 1323,
    2003: 1563, 2004: 1508, 2005: 1995, 2006: 1892, 2007: 2175, 2008: 1950,
    2009: 1782, 2010: 742})

time_start = time.time()
year_count = defaultdict(int)
orig_dir = '/scratch/network/mssilver/mssilver/'
orig_dir = ''  # overridden: read from the working directory instead
# NOTE: the exact regex was lost in extraction; this pattern matches one
# song's sub-dictionary in the str()-dumped input files.
json_pattern = re.compile(r"\{'title'.*?\}", re.DOTALL)
N = 20  # number of songs to sample from each year
k = 20  # number of frames to select from each song
for l1 in ascii_uppercase:
    for l2 in ascii_uppercase:
        edm_textfile = orig_dir + 'msd_data/raw_' + l1 + l2 + '.txt'
        json_contents = open(edm_textfile, 'r').read()
        for json_object_str in re.findall(json_pattern, json_contents):
            json_object = ast.literal_eval(json_object_str)
            year = int(json_object['year'])
            # sample each song with probability min(1, N / (songs that year))
            prob = 1.0 if 1.0 * N / year_counts[year] > 1.0 else 1.0 * N / year_counts[year]
            if random.random() < prob:
                print 'getting timbre frames for song in directory {0}, {1} seconds after start of program'.format(edm_textfile, time.time() - time_start)
                duration = float(json_object['duration'])
                timbre = [[t / duration for t in l] for l in json_object['timbre']]
                try:
                    indices = random.sample(xrange(0, len(timbre)), k)
                except:
                    # fewer than k frames in this song: keep them all
                    indices = xrange(0, len(timbre))
                timbre_frames = [timbre[i] for i in indices]
                appended_timbre = [timbre_all.append(l) for l in timbre_frames]
        print 'finished file {0}, {1} seconds after start of program'.format(edm_textfile, time.time() - time_start)

with(open('timbre_frames_all.txt', 'w')) as f:
    f.write(str(timbre_all))

A.4 Helper Methods for Calculations

import os
import re
import json
import glob
import time
import numpy as np
import hdf5_getters

''' some static data used in conjunction with the helper methods '''

# each 12-element vector corresponds to the 12 pitches, starting with C
# natural and going up to B natural
CHORD_TEMPLATE_MAJOR = [[1,0,0,0,1,0,0,1,0,0,0,0],[0,1,0,0,0,1,0,0,1,0,0,0],
                        [0,0,1,0,0,0,1,0,0,1,0,0],[0,0,0,1,0,0,0,1,0,0,1,0],
                        [0,0,0,0,1,0,0,0,1,0,0,1],[1,0,0,0,0,1,0,0,0,1,0,0],
                        [0,1,0,0,0,0,1,0,0,0,1,0],[0,0,1,0,0,0,0,1,0,0,0,1],
                        [1,0,0,1,0,0,0,0,1,0,0,0],[0,1,0,0,1,0,0,0,0,1,0,0],
                        [0,0,1,0,0,1,0,0,0,0,1,0],[0,0,0,1,0,0,1,0,0,0,0,1]]
CHORD_TEMPLATE_MINOR = [[1,0,0,1,0,0,0,1,0,0,0,0],[0,1,0,0,1,0,0,0,1,0,0,0],
                        [0,0,1,0,0,1,0,0,0,1,0,0],[0,0,0,1,0,0,1,0,0,0,1,0],
                        [0,0,0,0,1,0,0,1,0,0,0,1],[1,0,0,0,0,1,0,0,1,0,0,0],
                        [0,1,0,0,0,0,1,0,0,1,0,0],[0,0,1,0,0,0,0,1,0,0,1,0],
                        [0,0,0,1,0,0,0,0,1,0,0,1],[1,0,0,0,1,0,0,0,0,1,0,0],
                        [0,1,0,0,0,1,0,0,0,0,1,0],[0,0,1,0,0,0,1,0,0,0,0,1]]
CHORD_TEMPLATE_DOM7 = [[1,0,0,0,1,0,0,1,0,0,1,0],[0,1,0,0,0,1,0,0,1,0,0,1],
                       [1,0,1,0,0,0,1,0,0,1,0,0],[0,1,0,1,0,0,0,1,0,0,1,0],
                       [0,0,1,0,1,0,0,0,1,0,0,1],[1,0,0,1,0,1,0,0,0,1,0,0],
                       [0,1,0,0,1,0,1,0,0,0,1,0],[0,0,1,0,0,1,0,1,0,0,0,1],
                       [1,0,0,1,0,0,1,0,1,0,0,0],[0,1,0,0,1,0,0,1,0,1,0,0],
                       [0,0,1,0,0,1,0,0,1,0,1,0],[0,0,0,1,0,0,1,0,0,1,0,1]]
CHORD_TEMPLATE_MIN7 = [[1,0,0,1,0,0,0,1,0,0,1,0],[0,1,0,0,1,0,0,0,1,0,0,1],
                       [1,0,1,0,0,1,0,0,0,1,0,0],[0,1,0,1,0,0,1,0,0,0,1,0],
                       [0,0,1,0,1,0,0,1,0,0,0,1],[1,0,0,1,0,1,0,0,1,0,0,0],
                       [0,1,0,0,1,0,1,0,0,1,0,0],[0,0,1,0,0,1,0,1,0,0,1,0],
                       [0,0,0,1,0,0,1,0,1,0,0,1],[1,0,0,0,1,0,0,1,0,1,0,0],
                       [0,1,0,0,0,1,0,0,1,0,1,0],[0,0,1,0,0,0,1,0,0,1,0,1]]

CHORD_TEMPLATE_MAJOR_means = [np.mean(chord) for chord in CHORD_TEMPLATE_MAJOR]
CHORD_TEMPLATE_MINOR_means = [np.mean(chord) for chord in CHORD_TEMPLATE_MINOR]
CHORD_TEMPLATE_DOM7_means = [np.mean(chord) for chord in CHORD_TEMPLATE_DOM7]
CHORD_TEMPLATE_MIN7_means = [np.mean(chord) for chord in CHORD_TEMPLATE_MIN7]

CHORD_TEMPLATE_MAJOR_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_MAJOR]
CHORD_TEMPLATE_MINOR_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_MINOR]
CHORD_TEMPLATE_DOM7_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_DOM7]
CHORD_TEMPLATE_MIN7_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_MIN7]

# the 46 timbre cluster centres (decimal points restored from the extraction)
TIMBRE_CLUSTERS = [
 [1.38679881e-01, 3.95702571e-02, 2.65410235e-02, 7.38301998e-03, -1.75014636e-02, -5.51147732e-02,
  8.71851698e-03, -1.17595855e-02, 1.07227900e-02, 8.75951680e-03, 5.40391877e-03, 6.17638908e-03],
 [3.14344510e+00, 1.17405599e-01, 4.08053561e+00, -1.77934450e+00, 2.93367968e+00, -1.35597928e+00,
  -1.55129489e+00, 7.75743158e-01, 6.42796685e-01, 1.40794256e-01, 3.37716831e-01, -3.27103815e-01],
 [3.56548165e-01, 2.73288705e+00, 1.94355982e+00, 1.06892477e+00, 9.89739475e-01, -8.97330631e-02,
  8.73234495e-01, -2.00747009e-03, 3.44488367e-01, 9.93117800e-02, -2.43471766e-01, -1.90521726e-01],
 [4.22442037e-01, 4.14115783e-01, 1.43926557e-01, -1.16143322e-01, -5.95186216e-02, -2.36927188e-01,
  -6.83151409e-02, 9.86816882e-02, 2.43219098e-02, 6.93558977e-02, 6.80121418e-03, 3.97485360e-02],
 [1.94727799e-01, -1.39027782e+00, -2.39875671e-01, -2.84583677e-01, 1.92334219e-01, -2.83421048e-01,
  2.15787541e-01, 1.14840341e-01, -2.15631833e-01, -4.09496877e-02, -6.90838017e-03, -7.24394810e-03],
 [1.96565167e-01, 4.98702717e-02, -3.43697282e-01, 2.54170701e-01, 1.12441266e-02, 1.54740401e-01,
  -4.70447408e-02, 8.10868802e-02, 3.03736697e-03, 1.43974944e-03, -2.75044913e-02, 1.48634678e-02],
 [2.21364497e-01, -2.96205105e-01, 1.57754028e-01, -5.57641279e-02, -9.25625566e-02, -6.15316168e-02,
  -1.38139882e-01, -5.54936599e-02, 1.66886836e-01, 6.46238260e-02, 1.24093863e-02, -2.09274345e-02],
 [2.12823455e-01, -9.32652720e-02, -4.39611467e-01, -2.02814479e-01, 4.98638770e-02, -1.26572488e-01,
  -1.11181799e-01, 3.25075635e-02, 2.01416694e-02, -5.69216463e-02, 2.61922912e-02, 8.30817468e-02],
 [1.62304042e-01, -7.34813956e-03, -2.02552550e-01, 1.80106705e-01, -5.72110826e-02, -9.17148244e-02,
  -6.20429191e-03, -6.08892354e-02, 1.02883628e-02, 3.84878478e-02, -8.72920419e-03, 2.37291230e-02],
 [1.69023095e-01, 6.81311168e-02, -3.71039856e-02, -2.13139780e-02, -4.18752028e-03, 1.36407740e-01,
  2.58515825e-02, -4.10328777e-04, 2.93149920e-02, -1.97874734e-02, 2.01177066e-02, 4.29260690e-03],
 [4.16829358e-01, -1.28384095e+00, 8.86081556e-01, 9.13717416e-02, -3.19420208e-01, -1.82003637e-01,
  -3.19865507e-02, -1.71517045e-02, 3.47472066e-02, -3.53047665e-02, 5.58354602e-02, -5.06222122e-02],
 [3.83948137e-01, 1.06020034e-01, 4.01191058e-01, 1.49470482e-01, -9.58422411e-02, -4.94473336e-02,
  2.27589858e-02, -5.67352733e-02, 3.84666644e-02, -2.15828055e-02, -1.67817151e-02, 1.15426241e-01],
 [9.07946444e-01, 3.26120397e+00, 2.98472002e+00, -1.42615404e-01, 1.29886103e+00, -4.53380431e-01,
  1.54008478e-01, -3.55297093e-02, -2.95809181e-01, 1.57037690e-01, -7.29692046e-02, 1.15180285e-01],
 [1.60870896e+00, -2.32038235e+00, -7.96211044e-01, 1.55058968e+00, -2.19377663e+00, 5.01030526e-01,
  -1.71767279e+00, -1.36642470e+00, -2.42837527e-01, -4.14275615e-01, -7.33148530e-01, -4.56676578e-01],
 [6.42870687e-01, 1.34486839e+00, 2.16026845e-01, -2.13180345e-01, 3.10866747e-01, -3.97754955e-01,
  -3.54439151e-01, -5.95938041e-04, 4.95054274e-03, 4.67013422e-02, -1.80823854e-02, 1.25808320e-01],
 [1.16780496e+00, 2.28141229e+00, -3.29418720e+00, -1.54239912e+00, 2.12372153e-01, 2.51116768e+00,
  1.84273560e+00, -4.06183916e-01, 1.19175125e+00, -9.24407446e-01, 6.85444429e-01, -6.38729005e-01],
 [2.39097414e-01, -1.13382447e-02, 3.06327342e-01, 4.68182987e-03, -1.03107607e-01, -3.17661969e-02,
  3.46533705e-02, 1.46440386e-02, 6.88291154e-02, 1.72580481e-02, -6.23970238e-03, -6.52822380e-03],
 [1.74850329e-01, -1.86077411e-01, 2.69285838e-01, 5.22452803e-02, -3.71708289e-02, -6.42874319e-02,
  -5.01920042e-03, -1.14565540e-02, -2.61300268e-03, -6.94872458e-03, 1.20157063e-02, 2.01341977e-02],
 [1.93220674e-01, 1.62738332e-01, 1.72794061e-02, 7.89933755e-02, 1.58494767e-01, 9.04541006e-04,
  -3.33177052e-02, -1.42411500e-01, -1.90471155e-02, -2.41622739e-02, -2.57382438e-02, 2.84895062e-02],
 [3.31179197e+00, -1.56765268e-01, 4.42446188e+00, 2.05496297e+00, 5.07031622e+00, -3.52663849e-02,
  -5.68337901e+00, -1.17825301e+00, 5.41756637e-01, -3.15541339e-02, -1.58404846e+00, 7.37887234e-01],
 [2.36033237e-01, -5.01380019e-01, -7.01568834e-02, -2.14474169e-01, 5.58739133e-01, -3.45340886e-01,
  2.36469930e-01, -2.51770230e-02, -4.41670143e-01, -1.73364633e-01, 9.92353986e-03, 1.01775476e-01],
 [3.13672832e+00, 1.55128891e+00, 4.60139512e+00, 9.82477544e-01, -3.87108002e-01, -1.34239667e+00,
  -3.00065797e+00, -4.41556909e-01, -7.77546208e-01, -6.59017029e-01, -1.42596356e-01, -9.78935498e-01],
 [8.50714148e-01, 2.28658856e-01, -3.65260753e+00, 2.70626948e+00, -1.90441544e-01, 5.66625676e+00,
  1.77531510e+00, 2.39978921e+00, 1.10965660e+00, 1.58484130e+00, -1.51579214e-02, 8.64324026e-01],
 [1.14302559e+00, 1.18602811e+00, -3.88130412e+00, 8.69833825e-01, -8.23003310e-01, -4.23867795e-01,
  8.56022598e-01, -1.08015106e+00, 1.74840192e-01, -1.35493558e-02, -1.17012561e+00, 1.68572940e-01],
 [3.54117814e+00, 6.12714769e-01, 7.67585243e+00, 2.50391333e+00, 1.81374399e+00, -1.46363231e+00,
  -1.74027236e+00, -5.72924078e-01, -1.20787368e+00, -4.13954661e-01, -4.62561948e-01, 6.78297871e-01],
 [8.31843044e-01, 4.41635485e-01, 7.00724425e-02, -4.72159900e-02, 3.08326493e-01, -4.47009822e-01,
  3.27806057e-01, 6.52370380e-01, 3.28490360e-01, 1.28628172e-01, -7.78065861e-02, 6.91343399e-02],
 [4.90082031e-01, -9.53180204e-01, 1.76970476e-01, 1.57256960e-01, -5.26196238e-02, -3.19264458e-01,
  3.91808304e-01, 2.19368239e-01, -2.06483291e-01, -6.25044005e-02, -1.05547224e-01, 3.18934196e-01],
 [1.49899454e+00, -4.30708817e-01, 2.43770498e+00, 7.03149621e-01, -2.28827845e+00, 2.70195855e+00,
  -4.71484280e+00, -1.18700075e+00, -1.77431396e+00, -2.23190236e+00, 8.20855264e-01, -2.35859902e-01],
 [1.20322544e-01, -3.66300816e-01, -1.25699953e-01, -1.21914056e-01, 6.93277338e-02, -1.31034684e-01,
  -1.54955924e-03, 2.48094288e-02, -3.09576314e-02, -1.66369415e-03, 1.48904987e-04, -1.42151992e-02],
 [6.52394765e-01, -6.81024464e-01, 6.36868117e-01, 3.04950208e-01, 2.62178992e-01, -3.20457080e-01,
  -1.98576098e-01, -3.02173163e-01, 2.04399765e-01, 4.44513847e-02, -9.50111498e-03, -1.14198739e-02],
 [2.06762180e-01, -2.08101829e-01, 2.61977630e-01, -1.71672300e-01, 5.61794250e-02, 2.13660185e-01,
  3.90259585e-02, 4.78176392e-02, 1.72812607e-02, 3.44052067e-02, 6.26899067e-03, 2.48544728e-02],
 [7.39717363e-01, 4.37786285e+00, 2.54995502e+00, 1.13151212e+00, -3.58509503e-01, 2.20806129e-01,
  -2.20500355e-01, -7.22409824e-02, -2.70534083e-01, 1.07942098e-03, 2.70174668e-01, 1.87279353e-01],
 [1.25593809e+00, 6.71054880e-02, 8.70352571e-01, -4.32607959e+00, 2.30652217e+00, 5.47476105e+00,
  -6.11052479e-01, 1.07955720e+00, -2.16225471e+00, -7.95770149e-01, -7.31804973e-01, 9.68935954e-01],
 [1.17233757e-01, -1.23897829e-01, -4.88625265e-01, 1.42036530e-01, -7.23286756e-03, -6.99808763e-02,
  -1.17525019e-02, 5.70221674e-02, -7.67796123e-03, 4.17505873e-02, -2.33375716e-02, 1.94121001e-02],
 [1.67511025e+00, -2.75436700e+00, 1.45345593e+00, 1.32408871e+00, -1.66172505e+00, 1.00560074e+00,
  -8.82308160e-01, -5.95708043e-01, -7.27283590e-01, -1.03975499e+00, -1.86653334e-02, 1.39449745e+00],
 [3.20587677e+00, -2.84451104e+00, 8.54849957e+00, -4.44001235e-01, 1.04202144e+00, 7.35333682e-01,
  -2.48763292e+00, 7.38931361e-01, -1.74185596e+00, -1.07581842e+00, 2.05759299e-01, -8.20483513e-01],
 [3.31279737e+00, -5.08655734e-01, 6.61530870e+00, 1.16518280e+00, 4.74499155e+00, -2.31536191e+00,
  -1.34016130e+00, -7.15381712e-01, 2.78890594e+00, 2.04189275e+00, -3.80003033e-01, 1.16034914e+00],
 [1.79522019e+00, -8.13534697e-02, 4.37167420e-01, 2.26517020e+00, 8.85377295e-01, 1.07481514e+00,
  -7.25322296e-01, -2.19309506e+00, -7.59468916e-01, -1.37191387e+00, 2.60097913e-01, 9.34596450e-01],
 [3.50400906e-01, 8.17891485e-01, -8.63487084e-01, -7.31760701e-01, 9.70320805e-02, -3.60023996e-01,
  -2.91753495e-01, -8.03073817e-02, 6.65930095e-02, 1.60093340e-01, -1.29158086e-01, -5.18806100e-02],
 [2.25922929e-01, 2.78461593e-01, 5.39661393e-02, -2.37662670e-02, -2.70343295e-02, -1.23485570e-01,
  2.31027499e-03, 5.87465112e-05, 1.86127188e-02, 2.83074747e-02, -1.87198676e-04, 1.24761782e-02],
 [4.53615634e-01, 3.18976020e+00, -8.35029351e-01, 7.84124578e+00, -4.43906795e-01, -1.78945492e+00,
  -1.14521031e+00, 1.00044304e+00, -4.04084981e-01, -4.86030348e-01, 1.05412721e-01, 5.63666445e-02],
 [3.93714086e-01, -3.07226477e-01, -4.87366619e-01, -4.57481697e-01, -2.91133171e-04, -2.39881719e-01,
  -2.15591352e-01, -1.21332941e-01, 1.42245002e-01, 5.02984582e-02, -8.05878851e-03, 1.95534173e-01],
 [1.86913010e-01, -1.61000977e-01, 5.95612425e-01, 1.87804293e-01, 2.22064227e-01, -1.09008289e-01,
  7.83845058e-02, 5.15228647e-02, -8.18113578e-02, -2.37860551e-02, 3.41013800e-03, 3.64680417e-02],
 [3.32919314e+00, -2.14341251e+00, 7.20913997e+00, 1.76143734e+00, 1.64091808e+00, -2.66887649e+00,
  -9.26748006e-01, -2.78599285e-01, -7.39434005e-01, -3.87363085e-01, 8.00557250e-01, 1.15628886e+00],
 [4.76496444e-01, -1.19334793e-01, 3.09037235e-01, -3.45545294e-01, 1.30114716e-01, 5.06895559e-01,
  2.12176840e-01, -4.14296750e-03, 4.52439064e-02, -1.62163990e-02, 6.93683152e-03, -5.77607592e-03],
 [3.00019324e-01, 5.43432074e-02, -7.72732930e-01, 1.47263806e+00, -2.79012581e-02, -2.47864869e-01,
  -2.10011388e-01, 2.78202425e-01, 6.16957205e-02, -1.66924986e-01, -1.80102286e-01, -3.78872162e-03]]

TIMBRE_MEANS = [np.mean(t) for t in TIMBRE_CLUSTERS]
TIMBRE_STDEVS = [np.std(t) for t in TIMBRE_CLUSTERS]

'''helper methods to process raw msd data'''

def normalize_pitches(h5):
    key = int(hdf5_getters.get_key(h5))
    segments_pitches = hdf5_getters.get_segments_pitches(h5)
    segments_pitches_new = [transpose_by_key(pitch_seg, key) for pitch_seg in segments_pitches]
    return segments_pitches_new

def transpose_by_key(pitch_seg, key):
    pitch_seg_new = []
    for i in range(0, 12):
        idx = (i + key) % 12
        pitch_seg_new.append(pitch_seg[idx])
    return pitch_seg_new

''' given a time segment with distributions of the 12 pitches, find the most
likely chord played '''
def find_most_likely_chord(pitch_vector):
    rho_max = 0.0
    # index each chord as (type, root); types are 1 = major, 2 = minor,
    # 3 = dominant 7th major, 4 = dominant 7th minor
    most_likely_chord = (1, 1)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_MAJOR, CHORD_TEMPLATE_MAJOR_means, CHORD_TEMPLATE_MAJOR_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean) * (pitch_vector[i] - np.mean(pitch_vector)) / ((stdev + 0.01) * (np.std(pitch_vector) + 0.01))
        if (abs(rho) > abs(rho_max)):
            rho_max = rho
            most_likely_chord = (1, idx)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_MINOR, CHORD_TEMPLATE_MINOR_means, CHORD_TEMPLATE_MINOR_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean) * (pitch_vector[i] - np.mean(pitch_vector)) / ((stdev + 0.01) * (np.std(pitch_vector) + 0.01))
        if (abs(rho) > abs(rho_max)):
            rho_max = rho
            most_likely_chord = (2, idx)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_DOM7, CHORD_TEMPLATE_DOM7_means, CHORD_TEMPLATE_DOM7_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean) * (pitch_vector[i] - np.mean(pitch_vector)) / ((stdev + 0.01) * (np.std(pitch_vector) + 0.01))
        if (abs(rho) > abs(rho_max)):
            rho_max = rho
            most_likely_chord = (3, idx)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_MIN7, CHORD_TEMPLATE_MIN7_means, CHORD_TEMPLATE_MIN7_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean) * (pitch_vector[i] - np.mean(pitch_vector)) / ((stdev + 0.01) * (np.std(pitch_vector) + 0.01))
        if (abs(rho) > abs(rho_max)):
            rho_max = rho
            most_likely_chord = (4, idx)
    return most_likely_chord

def find_most_likely_timbre_category(timbre_vector):
    most_likely_timbre_cat = 0
    rho_max = 0.0
    for idx, (seg, mean, stdev) in enumerate(zip(TIMBRE_CLUSTERS, TIMBRE_MEANS, TIMBRE_STDEVS)):
        rho = 0.0
        for i in range(0, 12):
            # correlation-style score of the frame against this cluster centre
            rho += (seg[i] - mean) * (timbre_vector[i] - np.mean(seg)) / ((stdev + 0.01) * (np.std(timbre_vector) + 0.01))
        if (abs(rho) > abs(rho_max)):
            rho_max = rho
            most_likely_timbre_cat = idx
    return most_likely_timbre_cat


Bibliography

[1] Marck Bailey. Mark J. Butler publishes scholarly work on dance music. http://www.music.northwestern.edu/about/news/2012/mark-j-butler-publishes-scholarly-work-on-dance-music.html, mar. 2012.

[2] Kenneth Taylor. Ishkur's guide to EDM. http://techno.org/electronic-music-guide.

[3] Deal further strengthens Spotify's music discovery expertise. http://the.echonest.com/pressreleases/spotify-acquires-echo-nest, mar. 2014.

[4] Josh Constine. Inside the Spotify - Echo Nest skunkworks. http://techcrunch.com/2014/10/19/the-sonic-mad-scientists, oct. 2014.

[5] The future of music genres is here. http://blog.echonest.com/post/73516217273/the-future-of-music-genres-is-here, jan. 2014.

[6] About the Music Genome Project. http://www.pandora.com/about/mgp.

[7] Joan Serrà, Álvaro Corral, Marián Boguñá, Martín Haro, and Josep Ll. Arcos. Measuring the evolution of contemporary western popular music. Sci. Rep., 2, jul. 2012.

[8] Matthias Mauch, Robert M. MacCallum, Mark Levy, and Armand M. Leroi. The evolution of popular music: USA 1960–2010. Royal Society Open Science, 2(5), 2015.

[9] Thierry Bertin-Mahieux, Daniel P.W. Ellis, Brian Whitman, and Paul Lamere. The Million Song Dataset. In Proceedings of the 12th International Conference on Music Information Retrieval (ISMIR 2011), 2011.

[10] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay. Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12:2825–2830, 2011.

[11] Lars Buitinck, Gilles Louppe, Mathieu Blondel, Fabian Pedregosa, Andreas Mueller, Olivier Grisel, Vlad Niculae, Peter Prettenhofer, Alexandre Gramfort, Jaques Grobler, Robert Layton, Jake VanderPlas, Arnaud Joly, Brian Holt, and Gaël Varoquaux. API design for machine learning software: experiences from the scikit-learn project. In ECML PKDD Workshop: Languages for Data Mining and Machine Learning, pages 108–122, 2013.

[12] Edwin Chen. Infinite mixture models with nonparametric Bayes and the Dirichlet Process. http://blog.echen.me/2012/03/20/infinite-mixture-models-with-nonparametric-bayes-and-the-dirichlet-process, mar. 2012.

[13] Graham Massey. Roland TR-808: The drum machine that changed music. http://www.bbc.com/news/entertainment-arts-26682781, mar. 2014.

[14] Jean-Michel Jarre and Agencja Artystyczna MTJ. Jean Michel Jarre. Disques Dreyfus, 1999.

[15] François Pachet, Jean-Julien Aucouturier, and Mark Sandler. The way it sounds: Timbre models for analysis and retrieval of music signals. IEEE Transactions on Multimedia, 7(6):1028–35, dec. 2005.


(The Dirichlet Process does not number the clusters exactly sequentially, so cluster numbers 5, 7, and 10 are skipped.)

Figure 3.1: Song year distributions for α = 0.05

For each value of α, I also calculated the average frequency of each chord change category and timbre category for each cluster and plotted the results. The green lines correspond to timbre and the blue lines to pitch.


Figure 3.2: Timbre and pitch distributions for α = 0.05

A table of each cluster formed, the number of songs in that cluster, and descriptions of the pitch, timbre, and rhythmic qualities characteristic of songs in that cluster is shown below.

Cluster   Song Count   Characteristic Sounds
0         6481         Minimalist, industrial, space sounds, dissonant chords
1         5482         Soft, New Age, ethereal
2         2405         Defined sounds; electronic and non-electronic instruments played in standard rock rhythms
3         360          Very dense and complex synths, slightly darker tone
4         4550         Heavily distorted rock and synthesizer
6         2854         Faster-paced 80s synth rock, acid house
8         798          Aggressive beats, dense house music
9         1464         Ambient house, trancelike, strong beats, mysterious tone
11        1597         Melancholy tones; New Wave rock in the 80s, then starting in the 90s downtempo, trip-hop, nu-metal

Table 3.1: Song cluster descriptions for α = 0.05

3.2.2 α = 0.1

A total of 14 clusters were formed. (16 were formed, but 2 clusters contained only one song each; I listened to both of these songs, and they did not sound unique, so I discarded them from the clusters.) Again, the song distributions, timbre and pitch distributions, and cluster descriptions are shown below.


Figure 3.3: Song year distributions for α = 0.1


Figure 3.4: Timbre and pitch distributions for α = 0.1


Cluster   Song Count   Characteristic Sounds
0         1339         Instrumental and disco with 80s synth
1         2109         Simultaneous quarter-note and sixteenth-note rhythms
2         4048         Upbeat, chill; simultaneous quarter-note and eighth-note rhythms
3         1353         Strong repetitive beats, ambient
4         2446         Strong simultaneous beat and synths; synths defined but echo
5         2672         Calm, New Age
6         542          Hi-hat cymbals, dissonant chord progressions
7         2725         Aggressive punk and alternative rock
9         1647         Latin, rhythmic emphasis on first and third beats
11        835          Standard medium-fast rock instruments/chords
16        1152         Orchestral, especially violins
18        40           "Martian alien" sounds, no vocals
20        1590         Alternating strong kick and strong high-pitched clap
28        528          Roland TR-like beats; kick and clap stand out but fuzzy

Table 3.2: Song cluster descriptions for α = 0.1

3.2.3 α = 0.2

With α set to 0.2, there were a total of 22 clusters formed. 3 of the clusters consisted of 1 song each, none of which were particularly unique-sounding, so I discarded them, for a total of 19 significant clusters. Again, the song distributions, timbre and pitch distributions, and cluster descriptions are shown below.


Figure 3.5: Song year distributions for α = 0.2


Figure 3.6: Timbre and pitch distributions for α = 0.2


Cluster   Song Count   Characteristic Sounds
0         4075         Nostalgic and sad-sounding synths and string instruments
1         2068         Intense, sad, cavernous (mix of industrial metal and ambient)
2         1546         Jazz/funk tones
3         1691         Orchestral with heavy 80s synths, atmospheric
4         343          Arpeggios
5         304          Electro, ambient
6         2405         Alien synths, eerie
7         1264         Punchy kicks and claps, 80s/90s tilt
8         1561         Medium tempo, 4/4 time signature, synths with intense guitar
9         1796         Disco rhythms and instruments
10        2158         Standard rock with few (if any) synths added on
12        791          Cavernous, minimalist ambient (non-electronic instruments)
14        765          Downtempo, classic guitar riffs, fewer synths
16        865          Classic acid house sounds and beats
17        682          Heavy Roland TR sounds
22        14           Fast, ambient, classic orchestral
23        578          Acid house with funk tones
30        31           Very repetitive rhythms, one or two tones
34        88           Very dense sound (strong vocals and synths)

Table 3.3: Song cluster descriptions for α = 0.2


3.3 Analysis

Each of the different values of α revealed different insights about the Dirichlet Process applied to the Million Song Dataset: mainly, which artists and songs were unique, and how traditional groupings of EM genres could be thought of differently. Not surprisingly, the distributions of the years of songs in most of the clusters were skewed to the left, because the distribution of all of the EM songs in the Million Song Dataset is left-skewed (see Figure 2.2). However, some of the distributions vary significantly for individual clusters, and these differences provide important insights into which types of music styles were popular at certain points in time and how unique the earliest artists and songs in the clusters were. For example, for α = 0.1, cluster 28's musical style (with sounds characteristic of the Roland TR-808 and TR-909, two programmable drum machines and synthesizers that became extremely popular in 1980s and 90s dance tracks [13]) coincides with when the instruments were first manufactured in 1980. Not surprisingly, this cluster contained mostly songs from the 80s and 90s and declined slightly in the 2000s. However, there were a few songs in that cluster that came out before 1980. While these songs did not clearly use the Roland TR machines, they may have contained similar sounds that predated the machines and were truly novel.

First looking at α = 0.05, we see that all of the clusters contain a significant number of songs, although clusters 3 and 8 are notably smaller. Cluster 3 contains a heavier left tail, indicating a larger number of songs from the 70s, 80s, and 90s. Inside the cluster, the genres of music varied significantly from a traditional music lens. That is, the cluster contained some songs with nearly all traditional rock instruments, others with purely synths, and others somewhere in between, all of which would normally be classified as different EM genres. However, under the Dirichlet Process these songs were lumped together with the common theme of dense, melodic melodies (as opposed to minimalistic, repetitive, or dissonant sounds). The most prominent artists from the earlier songs are Ashra and Jon Hassell, who composed several melodic songs combining traditional instruments with synthesizers for a modern feel. The other small cluster, number 8, contains a more normal year distribution relative to the entire MSD distribution and also consists of denser beats. Another artist, Cabaret Voltaire, leads this cluster. Cluster 9 sticks out significantly because it contains virtually no songs before 1990 but increases rapidly in popularity. This cluster contains songs with hypnotically repetitive rhythm, strong and ethereal synths, and an equally strong drum-like beat. Given the emergence of trance in the 1990s, and the fact that house music in the 1980s contained more minimalistic synths than house music in the 1990s, this distribution of years makes sense. Looking at the earliest artists in this cluster, one that accurately predates the later music in the cluster is Jean-Michel Jarre, a French composer who pioneered ambient and electronic music [14]. One of his songs, Les Chants Magnétiques IV, contains very sharp and modulated synths along with a repetitive hi-hat cymbal rhythm and more drawn-out, ethereal synths. While the song sounds ambient at its normal speed, playing it at 1.5 times the normal speed resulted in a thumping, fast-paced 16th-note rhythm that, combined with the ethereal synths and their chord progressions, sounded very similar to trance music. In fact, I found that stylistically, trance music was comparable to house and ambient music increased in speed. Trance music was a term not used extensively until the early 1990s, but ambient and house music were already mainstream by the 1980s, so it would make sense that trance evolved in this manner. However, this insight could serve as an argument that trance is not an innovative genre in and of itself, but is rather a clever combination of two older genres.
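This listening experiment is easy to repeat. Below is a minimal sketch, not part of the thesis pipeline, that uses the librosa library (with placeholder file names) to render a track at 1.5 times its normal speed without changing its pitch:

import librosa
import soundfile as sf

y, sr = librosa.load('song.mp3', sr=None)           # placeholder input file
y_fast = librosa.effects.time_stretch(y, rate=1.5)  # 1.5x speed, pitch preserved
sf.write('song_fast.wav', y_fast, sr)               # placeholder output file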

Lastly, we look at the timbre category and chord change distributions for each cluster. In theory, these clusters should have significantly different peaks of chord changes and timbre categories, reflecting different pitch arrangements and instruments in each cluster. The type 0 chord change corresponds to major → major with no note change; type 60, minor → minor with no note change; type 120, dominant 7th major → dominant 7th major with no note change; and type 180, dominant 7th minor → dominant 7th minor with no note change. It makes sense that type 0, 60, 120, and 180 chord changes are frequently observed, because it implies that chords occurring next to each other in the song are remaining in the same key for the majority of the song.
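To make this numbering concrete, the following sketch reproduces the category mapping implemented in Appendix A.2, where chord types are numbered 1 through 4 (major, minor, dominant 7th major, dominant 7th minor) and note_shift is the root movement in semitones:

def chord_shift_category(c1_type, c2_type, note_shift):
    key_shift = 4 * (c1_type - 1) + c2_type   # 1 through 16
    return 12 * (key_shift - 1) + note_shift  # 0 through 191

print(chord_shift_category(1, 1, 0))  # 0   (major -> major, same root)
print(chord_shift_category(2, 2, 0))  # 60  (minor -> minor, same root)
print(chord_shift_category(3, 3, 0))  # 120 (dom. 7th major -> dom. 7th major)
print(chord_shift_category(4, 4, 0))  # 180 (dom. 7th minor -> dom. 7th minor)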

The timbre categories, on the other hand, are more difficult to interpret intuitively. Mauch's study addresses this issue by sampling the songs and sounds that are closest to each timbre category and then playing the sounds and attaching user-based interpretations based on several listeners [8]. While this strategy worked in Mauch's study, given the time and resources at my disposal it was not practical in mine. I ended up comparing my subjective summaries of each cluster against the charts to see whether certain peaks in the timbre categories corresponded to specific tones. Strangely, for α = 0.05 the timbre and chord change data is very similar for each cluster. This problem does not occur when α = 0.1 or 0.2, where the graphs vary significantly and correspond to some of the observed differences in the music. In summary, below are the most influential artists I found in the clusters formed, and the types of music they created that were novel for their time:

• Jean-Michel Jarre: ambient and house music, complicated synthesizer arrangements

• Cabaret Voltaire: orchestral electronic music

• Paul Horn: new age

• Brian Eno: ambient music

• Manuel Göttsching (Ashra): synth-heavy ambient music

• Killing Joke: industrial metal

• John Foxx: minimalist and dark electronic music

• Fad Gadget: house and industrial music

While these conclusions were formed mainly from the data used in the MSD and the resulting clusters, I checked outside sources and biographies of these artists to see whether they were groundbreaking contributors to electronic music. Some research revealed that these artists were indeed groundbreaking for their time, so my findings are consistent with existing literature. The difference, however, between existing accounts and mine is that, from a quantitatively computed perspective, I found some new connections (like observing that one of Jarre's works, when sped up, sounded very similar to trance music).

For larger values of α, it is worth not only looking at interesting phenomena in the clusters formed for that specific value, but also comparing the clusters formed to those for other values of α. Since we are increasing the value of α, more clusters will be formed, and the distinctions between each cluster will be more nuanced.
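For reference, the clustering step itself can be run with scikit-learn [10, 11]. The sketch below is a minimal reconstruction under stated assumptions: it uses BayesianGaussianMixture, the modern replacement for the Dirichlet Process mixture class scikit-learn provided when this thesis was written, and the file name, feature dimensions, and parameter choices shown are placeholders rather than the exact ones used in Appendix A:

import numpy as np
from sklearn.mixture import BayesianGaussianMixture

# one row per song: the 192 chord-change frequencies concatenated with
# the timbre-category frequencies (both normalized by song duration)
X = np.loadtxt('preprocessed_features.txt')  # placeholder file

# a truncated Dirichlet Process mixture: n_components is only an upper
# bound, and the concentration prior (alpha) controls how many clusters
# receive appreciable weight
dpgmm = BayesianGaussianMixture(
    n_components=50,
    weight_concentration_prior_type='dirichlet_process',
    weight_concentration_prior=0.1,  # the alpha compared in this chapter
    covariance_type='diag',
    max_iter=500,
    random_state=0)
labels = dpgmm.fit(X).predict(X)

# clusters the DP effectively switches off end up with near-zero weight
print(sorted(dpgmm.weights_, reverse=True)[:10])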

With α = 0.1, the Dirichlet Process formed 16 clusters. 2 of these clusters consisted of only one song each, and upon listening, neither of these songs sounded particularly unique, so I threw those two clusters out and analyzed the remaining 14. Comparing these clusters to the ones formed with α = 0.05, I found that some of the clusters mapped over nicely, while others were more difficult to interpret. For example, cluster 3_0.1 (cluster 3 when α = 0.1) contained a similar number of songs, and a similar distribution of the years the songs were released, to cluster 9_0.05. Both contain virtually no songs before the 1990s and then steadily rise in popularity through the 2000s. Both clusters also contain similar types of music: house beats and ethereal synths reminiscent of ambient or trance music. However, when I looked at the earliest artists in cluster 3_0.1, they were different from the earliest artists in cluster 9_0.05. One particular artist, Bill Nelson, stood out for having a particularly novel song, "Birds of Tin," for the year it was released (1980). This song features a sharp and twangy synth beat that, when sped up, sounded like minimalist acid house music. While the α = 0.05 group differentiated mostly on general moods and classes of instruments (like rock vs. non-electronic vs. electronic), the α = 0.1 group picked up more nuanced instrumentation and mood differences. For example, cluster 16_0.1 contained songs that featured orchestral string instruments, especially violin. The songs themselves varied significantly according to traditional genres, from Brian Eno arrangements with classical orchestra to a remix of a song by Linkin Park, a nu-metal band, which contained violin interludes. This clustering raises an interesting point: music that sounds very different based on traditional genres could be grouped together on certain instruments or sounds. Another cluster, 28_0.1, features 90s sounds characteristic of the Roland TR-909 drum machine (which would explain why the cluster's songs increase drastically starting in the 1990s and steadily decline through the 2000s). Yet another cluster, 6_0.1, contains a particularly heavy left tail, indicating a style more popular in the 1980s, and its characteristic sound, hi-hat cymbals, is also a specialized instrument. This specialization does not match up particularly strongly with the clusters when α = 0.05. That is, a single cluster with α = 0.05 does not easily map to one or more clusters in the α = 0.1 run, although many of the clusters appear to share characteristics based on the qualitative descriptions in the tables. The timbre/chord change charts for each cluster appeared to at least somewhat corroborate the general characteristics I attached to each cluster. For example, the last timbre category is significantly pronounced for clusters 5 and 18, and especially so for 18. Cluster 18 was vocal-free, ethereal, space-synth sounds, so it would make sense that cluster 5, which was mainly calm New Age, also contained vocal-free, ethereal, space-y sounds. It was also interesting to note that certain clusters, like 28, contained one timbre category that completely dominated all the others. Many songs in this cluster were marked by strong and repetitive beats reminiscent of the Roland TR synth drum machine, which matches the graph. Likewise, clusters 3, 7, 9, and 20, which appear to contain the same peak timbre category, were noted for containing strong and repetitive beats. From this run, I added the following artists and their contributions to the general list of novel artists:

• Bill Nelson: minimalist house music

• Vangelis: orchestral compositions with electronic notes

• Rick Wakeman: rock compositions with spacy-sounding synths

• Kraftwerk: synth-based pop music

Finally, we look at α = 0.2. With this parameter value, the Dirichlet Process resulted in 22 clusters formed. 3 of these clusters contained only one song each, and upon listening to each of these songs, I determined they were not particularly unique and discarded them, for a total of 19 remaining clusters. Unlike the previous two values of α, where the clusters were relatively easy to subjectively differentiate, this one was quite difficult. Slightly more than half of the clusters, 10 out of 19, contained under 1000 songs, and 3 contained under 100. Some of the clusters were easily mapped to clusters in the other two α values, like cluster 17_0.2, which contains Roland TR drum machine sounds and is comparable to cluster 28_0.1. However, many of the other classifications seemed more dubious. Not only did the songs within each cluster often seem to vary significantly, but the differences between many clusters appeared nearly indistinguishable. The chord change and timbre charts also reflect the difficulty of distinguishing different clusters: the y-axis values on all of the charts are quite small, implying that many of the timbre values averaged out because the songs in each cluster were quite different. Essentially, the observations are quite noisy and do not have features that stand out as saliently as those of cluster 28_0.1, for example. The only exceptions to these numbers were clusters 30 and 34, but there were so few songs in each of these clusters that they represent only a small amount of the dataset. Therefore, I concluded that the Dirichlet Process with α = 0.2 did an insufficient job of clustering the songs. Overall, the clusters formed when α = 0.1 were the most meaningful in terms of picking up nuanced moods and instruments without splitting hairs and producing clusters a minimally trained ear could not differentiate. From this analysis, the most appropriate genre classifications of the electronic music from the MSD are the clusters described in the table where α = 0.1, and the most novel artists, along with their contributions, are summarized in the findings where α = 0.05 and α = 0.1.

Chapter 4

Conclusion

In this chapter, I first address weaknesses in my experiment and strategies for resolving those weaknesses; I then offer potential paths for researchers to build upon my experiment, and offer closing words regarding this thesis.

4.1 Design Flaws in Experiment

While I made every effort possible to ensure the integrity of this experiment, there were various limiting factors, some beyond my control and others within my control but unrealistic to address given the time and resources I had. The largest issue was the dataset I was working with. While the MSD contained roughly 23,000 electronic music songs according to my classifications, these songs did not come close to all of the electronic music that was available. From looking through the tracks, I did see many important artists, meaning that there was some credibility to the dataset. However, there were several other artists I was surprised to see missing, and the artists included contained only a limited number of popular songs. Some traditionally defined genres, like dubstep, were missing entirely from the dataset, and the most recent songs came from the year 2010, which meant that the past 5 years of rapid expansion in EM were not accounted for. Building a sufficient corpus of EM data is very difficult, arguably more so than for other genres, because songs may be remixed by multiple artists, further blurring the line between original content and modifications. For this reason, I considered my thesis to be a proof of concept: although the data I used may not be ideal, I was able to show that the Dirichlet Process could be used with some amount of success to cluster songs based on their metadata.

With respect to how I implemented the Dirichlet Process and constructed the features, my methodology could have been more extensive with additional time and resources. Interpreting the sounds in each song and establishing common threads is a difficult task, and unlike Pandora, which used trained music theory experts to analyze each song, I relied on my own ears and anecdotal knowledge of EM. Given the lack of formal literature quantitatively analyzing EM, and the resources I had, this was my best realistic option, but it was also not ideal. The second notable weakness, which was more controllable, was determining what exactly constitutes an EM song. My criteria involved iterating through every song and selecting those whose artist contained a tag that fell inside a list of predetermined EM genres. However, this strategy is not always effective, since some artists produce only a small selection of EM songs and much more music involving rock or other non-EM genres. To prevent these songs from appearing in the dataset, I would need to load another dataset, from a group called Last.fm, which contains user-generated tags at the song level. Another, more addressable weakness in my experiment was graphically analyzing the timbre categories. While the average chord changes were easy to interpret on the graphs for each cluster and had easy semantic interpretations, the timbre categories were never formally defined. That is, while I knew the Bayes Information Criterion was lowest when there were 46 categories, I did not associate each timbre category with a sound. Mauch's study addressed this issue by randomly selecting songs with sounds that fell in each timbre category and asking users to listen to the sounds and classify what they heard. Implementing this system would be an additional way of ensuring that the clusters formed for each song were nontrivial: I could not only eyeball the measurements on each graph for timbre, like I did in this thesis, but also use them to confirm the sounds I observed for each cluster.
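For completeness, the category count itself can be chosen by fitting Gaussian mixtures of increasing size to the sampled timbre frames from Appendix A.3 and comparing their BIC scores. The sketch below is illustrative only: the candidate grid is hypothetical, not the one actually searched:

import ast
import numpy as np
from sklearn.mixture import GaussianMixture

# timbre_frames_all.txt is the str()-dumped list written by the script in A.3
X = np.array(ast.literal_eval(open('timbre_frames_all.txt').read()))

best_k, best_bic = None, np.inf
for k in range(10, 80, 4):  # hypothetical grid of candidate category counts
    gmm = GaussianMixture(n_components=k, covariance_type='full',
                          random_state=0).fit(X)
    bic = gmm.bic(X)  # lower BIC indicates a better size/fit trade-off
    if bic < best_bic:
        best_k, best_bic = k, bic
print(best_k, best_bic)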

Finally, while my feature selection involved careful preprocessing, based on other studies, that normalized measurements between all songs, there are additional ways I could have improved the feature set. For example, one study looks at more advanced ways to isolate specific timbre segments in a song, identify repeating patterns, and compare songs to each other in terms of the similarity of their timbres [15]. More advanced methods like these would allow me to analyze more quantitatively how successful the Dirichlet Process is at clustering songs into distinct categories.

4.2 Future Work

Future work in this area, quantitatively analyzing EM metadata to determine what constitutes different genres and novel artists, would involve tighter definitions and procedures, evaluations of whether clustering was effective, and closer musical scrutiny. All of the weaknesses mentioned in the previous section, barring perhaps the songs available in the Million Song Dataset, can be addressed with extensions and modifications to the code base I created. The greater issue of building an effective corpus of music data for the MSD, and constantly updating it, might be addressed by soliciting such data from an organization like Spotify, but such an endeavor is very ambitious and beyond the scope of any individual or small-group research project without extensive funding and influence. Once these problems are resolved, and the dataset, the songs accessed from it, and the methods for comparing songs to each other are settled, the next steps would be to further analyze the results. How do the most unique artists for their time compare to the most popular artists? Is there considerable overlap? How long does it take for a style to grow in popularity, if it even does? And lastly, how can these findings be used to compose new genres of music and envision who and what will become popular in the future? All of these questions may require supplementary information sources, with respect to the popularity of songs and artists, for example, and many of these additional pieces of information can be found on the website of the MSD.

4.3 Closing Remarks

While this thesis is an ambitious endeavor and can be improved in many respects, it does show that the methods implemented yield nontrivial results and could serve as a foundation for future quantitative analysis of electronic music. As data analytics grows even more, and groups such as Spotify amass greater amounts of information and deeper insights on that information, this relatively new field of study will hopefully grow. EM is a dynamic, energizing, and incredibly expressive type of music, and understanding it from a quantitative perspective pays respect to what has up until now been mostly analyzed from a curious outsider's perspective: qualitatively described, but not examined as thoroughly from a mathematical angle.


62

65 -284583677e-01 192334219e-01 -283421048e-0166 215787541e-01 114840341e-01 -215631833e-0167 -409496877e-02 -690838017e-03 -724394810e-03]68 [ 196565167e-01 498702717e-02 -343697282e-0169 254170701e-01 112441266e-02 154740401e-0170 -470447408e-02 810868802e-02 303736697e-0371 143974944e-03 -275044913e-02 148634678e-02]72 [ 221364497e-01 -296205105e-01 157754028e-0173 -557641279e-02 -925625566e-02 -615316168e-0274 -138139882e-01 -554936599e-02 166886836e-0175 646238260e-02 124093863e-02 -209274345e-02]76 [ 212823455e-01 -932652720e-02 -439611467e-0177 -202814479e-01 498638770e-02 -126572488e-0178 -111181799e-01 325075635e-02 201416694e-0279 -569216463e-02 261922912e-02 830817468e-02]80 [ 162304042e-01 -734813956e-03 -202552550e-0181 180106705e-01 -572110826e-02 -917148244e-0282 -620429191e-03 -608892354e-02 102883628e-0283 384878478e-02 -872920419e-03 237291230e-02]84 [ 169023095e-01 681311168e-02 -371039856e-0285 -213139780e-02 -418752028e-03 136407740e-0186 258515825e-02 -410328777e-04 293149920e-0287 -197874734e-02 201177066e-02 429260690e-03]88 [ 416829358e-01 -128384095e+00 886081556e-0189 913717416e-02 -319420208e-01 -182003637e-0190 -319865507e-02 -171517045e-02 347472066e-0291 -353047665e-02 558354602e-02 -506222122e-02]92 [ 383948137e-01 106020034e-01 401191058e-0193 149470482e-01 -958422411e-02 -494473336e-0294 227589858e-02 -567352733e-02 384666644e-0295 -215828055e-02 -167817151e-02 115426241e-01]96 [ 907946444e-01 326120397e+00 298472002e+0097 -142615404e-01 129886103e+00 -453380431e-0198 154008478e-01 -355297093e-02 -295809181e-0199 157037690e-01 -729692046e-02 115180285e-01]

100 [ 160870896e+00 -232038235e+00 -796211044e-01101 155058968e+00 -219377663e+00 501030526e-01102 -171767279e+00 -136642470e+00 -242837527e-01103 -414275615e-01 -733148530e-01 -456676578e-01]104 [ 642870687e-01 134486839e+00 216026845e-01105 -213180345e-01 310866747e-01 -397754955e-01106 -354439151e-01 -595938041e-04 495054274e-03107 467013422e-02 -180823854e-02 125808320e-01]108 [ 116780496e+00 228141229e+00 -329418720e+00109 -154239912e+00 212372153e-01 251116768e+00110 184273560e+00 -406183916e-01 119175125e+00111 -924407446e-01 685444429e-01 -638729005e-01]112 [ 239097414e-01 -113382447e-02 306327342e-01113 468182987e-03 -103107607e-01 -317661969e-02114 346533705e-02 146440386e-02 688291154e-02115 172580481e-02 -623970238e-03 -652822380e-03]116 [ 174850329e-01 -186077411e-01 269285838e-01117 522452803e-02 -371708289e-02 -642874319e-02118 -501920042e-03 -114565540e-02 -261300268e-03119 -694872458e-03 120157063e-02 201341977e-02]120 [ 193220674e-01 162738332e-01 172794061e-02121 789933755e-02 158494767e-01 904541006e-04122 -333177052e-02 -142411500e-01 -190471155e-02123 -241622739e-02 -257382438e-02 284895062e-02]124 [ 331179197e+00 -156765268e-01 442446188e+00

63

125 205496297e+00 507031622e+00 -352663849e-02126 -568337901e+00 -117825301e+00 541756637e-01127 -315541339e-02 -158404846e+00 737887234e-01]128 [ 236033237e-01 -501380019e-01 -701568834e-02129 -214474169e-01 558739133e-01 -345340886e-01130 236469930e-01 -251770230e-02 -441670143e-01131 -173364633e-01 992353986e-03 101775476e-01]132 [ 313672832e+00 155128891e+00 460139512e+00133 982477544e-01 -387108002e-01 -134239667e+00134 -300065797e+00 -441556909e-01 -777546208e-01135 -659017029e-01 -142596356e-01 -978935498e-01]136 [ 850714148e-01 228658856e-01 -365260753e+00137 270626948e+00 -190441544e-01 566625676e+00138 177531510e+00 239978921e+00 110965660e+00139 158484130e+00 -151579214e-02 864324026e-01]140 [ 114302559e+00 118602811e+00 -388130412e+00141 869833825e-01 -823003310e-01 -423867795e-01142 856022598e-01 -108015106e+00 174840192e-01143 -135493558e-02 -117012561e+00 168572940e-01]144 [ 354117814e+00 612714769e-01 767585243e+00145 250391333e+00 181374399e+00 -146363231e+00146 -174027236e+00 -572924078e-01 -120787368e+00147 -413954661e-01 -462561948e-01 678297871e-01]148 [ 831843044e-01 441635485e-01 700724425e-02149 -472159900e-02 308326493e-01 -447009822e-01150 327806057e-01 652370380e-01 328490360e-01151 128628172e-01 -778065861e-02 691343399e-02]152 [ 490082031e-01 -953180204e-01 176970476e-01153 157256960e-01 -526196238e-02 -319264458e-01154 391808304e-01 219368239e-01 -206483291e-01155 -625044005e-02 -105547224e-01 318934196e-01]156 [ 149899454e+00 -430708817e-01 243770498e+00157 703149621e-01 -228827845e+00 270195855e+00158 -471484280e+00 -118700075e+00 -177431396e+00159 -223190236e+00 820855264e-01 -235859902e-01]160 [ 120322544e-01 -366300816e-01 -125699953e-01161 -121914056e-01 693277338e-02 -131034684e-01162 -154955924e-03 248094288e-02 -309576314e-02163 -166369415e-03 148904987e-04 -142151992e-02]164 [ 652394765e-01 -681024464e-01 636868117e-01165 304950208e-01 262178992e-01 -320457080e-01166 -198576098e-01 -302173163e-01 204399765e-01167 444513847e-02 -950111498e-02 -114198739e-02]168 [ 206762180e-01 -208101829e-01 261977630e-01169 -171672300e-01 561794250e-02 213660185e-01170 390259585e-02 478176392e-02 172812607e-02171 344052067e-02 626899067e-03 248544728e-02]172 [ 739717363e-01 437786285e+00 254995502e+00173 113151212e+00 -358509503e-01 220806129e-01174 -220500355e-01 -722409824e-02 -270534083e-01175 107942098e-03 270174668e-01 187279353e-01]176 [ 125593809e+00 671054880e-02 870352571e-01177 -432607959e+00 230652217e+00 547476105e+00178 -611052479e-01 107955720e+00 -216225471e+00179 -795770149e-01 -731804973e-01 968935954e-01]180 [ 117233757e-01 -123897829e-01 -488625265e-01181 142036530e-01 -723286756e-02 -699808763e-02182 -117525019e-02 570221674e-02 -767796123e-03183 417505873e-02 -233375716e-02 194121001e-02]184 [ 167511025e+00 -275436700e+00 145345593e+00

64

185 132408871e+00 -166172505e+00 100560074e+00186 -882308160e-01 -595708043e-01 -727283590e-01187 -103975499e+00 -186653334e-02 139449745e+00]188 [ 320587677e+00 -284451104e+00 854849957e+00189 -444001235e-01 104202144e+00 735333682e-01190 -248763292e+00 738931361e-01 -174185596e+00191 -107581842e+00 205759299e-01 -820483513e-01]192 [ 331279737e+00 -508655734e-01 661530870e+00193 116518280e+00 474499155e+00 -231536191e+00194 -134016130e+00 -715381712e-01 278890594e+00195 204189275e+00 -380003033e-01 116034914e+00]196 [ 179522019e+00 -813534697e-02 437167420e-01197 226517020e+00 885377295e-01 107481514e+00198 -725322296e-01 -219309506e+00 -759468916e-01199 -137191387e+00 260097913e-01 934596450e-01]200 [ 350400906e-01 817891485e-01 -863487084e-01201 -731760701e-01 970320805e-02 -360023996e-01202 -291753495e-01 -803073817e-02 665930095e-02203 160093340e-01 -129158086e-01 -518806100e-02]204 [ 225922929e-01 278461593e-01 539661393e-02205 -237662670e-02 -270343295e-02 -123485570e-01206 231027499e-03 587465112e-05 186127188e-02207 283074747e-02 -187198676e-04 124761782e-02]208 [ 453615634e-01 318976020e+00 -835029351e-01209 784124578e+00 -443906795e-01 -178945492e+00210 -114521031e+00 100044304e+00 -404084981e-01211 -486030348e-01 105412721e-01 563666445e-02]212 [ 393714086e-01 -307226477e-01 -487366619e-01213 -457481697e-01 -291133171e-04 -239881719e-01214 -215591352e-01 -121332941e-01 142245002e-01215 502984582e-02 -805878851e-03 195534173e-01]216 [ 186913010e-01 -161000977e-01 595612425e-01217 187804293e-01 222064227e-01 -109008289e-01218 783845058e-02 515228647e-02 -818113578e-02219 -237860551e-02 341013800e-03 364680417e-02]220 [ 332919314e+00 -214341251e+00 720913997e+00221 176143734e+00 164091808e+00 -266887649e+00222 -926748006e-01 -278599285e-01 -739434005e-01223 -387363085e-01 800557250e-01 115628886e+00]224 [ 476496444e-01 -119334793e-01 309037235e-01225 -345545294e-01 130114716e-01 506895559e-01226 212176840e-01 -414296750e-03 452439064e-02227 -162163990e-02 693683152e-02 -577607592e-03]228 [ 300019324e-01 543432074e-02 -772732930e-01229 147263806e+00 -279012581e-02 -247864869e-01230 -210011388e-01 278202425e-01 616957205e-02231 -166924986e-01 -180102286e-01 -378872162e-03]]232

233 TIMBRE_MEANS = [npmean(t) for t in TIMBRE_CLUSTERS]234 TIMBRE_STDEVS = [npstd(t) for t in TIMBRE_CLUSTERS]235

236 rsquorsquorsquohelper methods to process raw msd datarsquorsquorsquo237

238 def normalize_pitches(h5)239 key = int(hdf5_gettersget_key(h5))240 segments_pitches = hdf5_gettersget_segments_pitches(h5)241 segments_pitches_new = [transpose_by_key(pitch_segkey) for pitch_seg in

segments_pitches]242 return segments_pitches_new243

65

244 def transpose_by_key(pitch_segkey)245 pitch_seg_new = []246 for i in range(012)247 idx = (i + key) 12248 pitch_seg_newappend(pitch_seg[idx])249 return pitch_seg_new250

251 rsquorsquorsquo given a time segment with distributions of the 12 pitches find the mostlikely chord playedrsquorsquorsquo

252 def find_most_likely_chord(pitch_vector)253 rho_max = 00254 index each chord255 most_likely_chord = (11)256 for idx (chordmeanstdev) in enumerate(zip(CHORD_TEMPLATE_MAJOR

CHORD_TEMPLATE_MAJOR_meansCHORD_TEMPLATE_MAJOR_stdevs))257 rho = 00258 for i in range(012)259 rho += (chord[i] - mean)(pitch_vector[i] - npmean(pitch_vector))((

stdev+001)(npstd(pitch_vector)+001))260 if (abs(rho) gt abs(rho_max))261 rho_max = rho262 most_likely_chord = (1idx)263 for idx (chordmeanstdev) in enumerate(zip(CHORD_TEMPLATE_MINOR

CHORD_TEMPLATE_MINOR_meansCHORD_TEMPLATE_MINOR_stdevs))264 rho = 00265 for i in range(012)266 rho += (chord[i] - mean)(pitch_vector[i] - npmean(pitch_vector))((

stdev+001)(npstd(pitch_vector)+001))267 if (abs(rho) gt abs(rho_max))268 rho_max = rho269 most_likely_chord = (2idx)270 for idx (chordmeanstdev) in enumerate(zip(CHORD_TEMPLATE_DOM7

CHORD_TEMPLATE_DOM7_meansCHORD_TEMPLATE_DOM7_stdevs))271 rho = 00272 for i in range(012)273 rho += (chord[i] - mean)(pitch_vector[i] - npmean(pitch_vector))((

stdev+001)(npstd(pitch_vector)+001))274 if (abs(rho) gt abs(rho_max))275 rho_max = rho276 most_likely_chord = (3idx)277 for idx (chordmeanstdev) in enumerate(zip(CHORD_TEMPLATE_MIN7

CHORD_TEMPLATE_MIN7_meansCHORD_TEMPLATE_MIN7_stdevs))278 rho = 00279 for i in range(012)280 rho += (chord[i] - mean)(pitch_vector[i] - npmean(pitch_vector))((

stdev+001)(npstd(pitch_vector)+001))281 if (abs(rho) gt abs(rho_max))282 rho_max = rho283 most_likely_chord = (4idx)284 return most_likely_chord285

286 def find_most_likely_timbre_category(timbre_vector)287 most_likely_timbre_cat = 0288 rho_max = 00289 for idx (segmeanstdev) in enumerate(zip(TIMBRE_CLUSTERSTIMBRE_MEANS

TIMBRE_STDEVS))290 rho = 00291 for i in range(012)292 rho += (seg[i] - mean)(timbre_vector[i] - npmean(seg))((stdev+001)

(npstd(timbre_vector)+001))

66

293 if (abs(rho) gt abs(rho_max))294 rho_max = rho295 most_likely_timbre_cat = idx296 return most_likely_timbre_cat

67



Figure 3.1: Song year distributions for α = 0.05

For each value of α, I also calculated the average frequency of each chord change category and timbre category for each cluster and plotted the results. The green lines correspond to timbre and the blue lines to pitch.
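These profiles were built from the duration-normalized counts computed in Appendix A.2. Below is a minimal sketch of how such a per-cluster profile can be plotted; cluster_songs is assumed to be a list of the preprocessed song dictionaries belonging to one cluster, and the function name is illustrative rather than taken from the original scripts.

import numpy as np
import matplotlib.pyplot as plt

def plot_cluster_profile(cluster_songs, cluster_id):
    # column-wise averages of the per-song, duration-normalized counts
    # produced by the preprocessing step in Appendix A.2
    chord_avg = np.mean([s['chord_changes'] for s in cluster_songs], axis=0)
    timbre_avg = np.mean([s['timbre_cat_counts'] for s in cluster_songs], axis=0)
    fig, ax = plt.subplots()
    ax.plot(range(len(timbre_avg)), timbre_avg, color='green', label='timbre categories')
    ax.plot(range(len(chord_avg)), chord_avg, color='blue', label='chord changes')
    ax.set_xlabel('category index')
    ax.set_ylabel('average frequency (per second of song)')
    ax.set_title('Cluster {0} profile'.format(cluster_id))
    ax.legend()
    plt.show()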


Figure 3.2: Timbre and pitch distributions for α = 0.05

A table listing each cluster formed, the number of songs in that cluster, and descriptions of the pitch, timbre, and rhythmic qualities characteristic of songs in that cluster is shown below.


Cluster   Song Count   Characteristic Sounds
0         6481         Minimalist, industrial, space sounds, dissonant chords
1         5482         Soft, New Age, ethereal
2         2405         Defined sounds; electronic and non-electronic instruments played in standard rock rhythms
3         360          Very dense and complex synths, slightly darker tone
4         4550         Heavily distorted rock and synthesizer
6         2854         Faster-paced 80s synth rock, acid house
8         798          Aggressive beats, dense house music
9         1464         Ambient house, trancelike, strong beats, mysterious tone
11        1597         Melancholy tones; New Wave rock in the 80s, then starting in the 90s downtempo, trip-hop, nu-metal

Table 3.1: Song cluster descriptions for α = 0.05

3.2.2 α = 0.1

A total of 14 clusters were formed (16 were formed, but 2 clusters contained only one song each; I listened to both of these songs, and since they did not sound unique, I discarded them from the clusters). Again, the song distributions, timbre and pitch distributions, and cluster descriptions are shown below.


Figure 3.3: Song year distributions for α = 0.1


Figure 3.4: Timbre and pitch distributions for α = 0.1


Cluster   Song Count   Characteristic Sounds
0         1339         Instrumental and disco with 80s synth
1         2109         Simultaneous quarter-note and sixteenth-note rhythms
2         4048         Upbeat, chill; simultaneous quarter-note and eighth-note rhythms
3         1353         Strong repetitive beats, ambient
4         2446         Strong simultaneous beat and synths; synths defined but with echo
5         2672         Calm, New Age
6         542          Hi-hat cymbals, dissonant chord progressions
7         2725         Aggressive punk and alternative rock
9         1647         Latin; rhythmic emphasis on first and third beats
11        835          Standard medium-fast rock instruments/chords
16        1152         Orchestral, especially violins
18        40           "Martian, alien" sounds, no vocals
20        1590         Alternating strong kick and strong high-pitched clap
28        528          Roland TR-like beats; kick and clap stand out but fuzzy

Table 3.2: Song cluster descriptions for α = 0.1

3.2.3 α = 0.2

With α set to 0.2, there were a total of 22 clusters formed; 3 of the clusters consisted of 1 song each, none of which were particularly unique-sounding, so I discarded them, for a total of 19 significant clusters. Again, the song distributions, timbre and pitch distributions, and cluster descriptions are shown below.


Figure 3.5: Song year distributions for α = 0.2


Figure 3.6: Timbre and pitch distributions for α = 0.2


Cluster   Song Count   Characteristic Sounds
0         4075         Nostalgic and sad-sounding synths and string instruments
1         2068         Intense, sad, cavernous (mix of industrial metal and ambient)
2         1546         Jazz/funk tones
3         1691         Orchestral with heavy 80s synths, atmospheric
4         343          Arpeggios
5         304          Electro, ambient
6         2405         Alien synths, eerie
7         1264         Punchy kicks and claps, 80s/90s tilt
8         1561         Medium tempo, 4/4 time signature, synths with intense guitar
9         1796         Disco rhythms and instruments
10        2158         Standard rock with few (if any) synths added on
12        791          Cavernous, minimalist, ambient (non-electronic instruments)
14        765          Downtempo, classic guitar riffs, fewer synths
16        865          Classic acid house sounds and beats
17        682          Heavy Roland TR sounds
22        14           Fast, ambient, classic orchestral
23        578          Acid house with funk tones
30        31           Very repetitive rhythms, one or two tones
34        88           Very dense sound (strong vocals and synths)

Table 3.3: Song cluster descriptions for α = 0.2


3.3 Analysis

Each of the different values of α revealed different insights about the Dirichlet Process applied to the Million Song Dataset, mainly which artists and songs were unique and how traditional groupings of EM genres could be thought of differently. Not surprisingly, the distributions of the years of songs in most of the clusters were skewed to the left, because the distribution of all of the EM songs in the Million Song Dataset is left-skewed (see Figure 2.2). However, some of the distributions vary significantly for individual clusters, and these differences provide important insights into which types of music styles were popular at certain points in time and how unique the earliest artists and songs in the clusters were. For example, for α = 0.1, Cluster 28's musical style (with sounds characteristic of the Roland TR-808 and TR-909, two programmable drum machines/synthesizers that became extremely popular in 1980s and 90s dance tracks [13]) coincides with when the instruments were first manufactured in 1980. Not surprisingly, this cluster contained mostly songs from the 80s and 90s and declined slightly in the 2000s. However, there were a few songs in that cluster that came out before 1980. While these songs did not clearly use the Roland TR machines, they may have contained similar sounds that predated the machines and were truly novel.
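For context, the concentration parameter α is exposed directly in scikit-learn's truncated Dirichlet Process mixture. Below is a minimal sketch of how runs like these can be set up; BayesianGaussianMixture is the current scikit-learn class for this model, assumed here as a stand-in for the exact interface used at the time, and X is a placeholder feature matrix with one row per song.

import numpy as np
from sklearn.mixture import BayesianGaussianMixture

# placeholder feature matrix: one row per song (e.g., 192 chord-change
# frequencies plus 46 timbre-category frequencies per song)
X = np.random.RandomState(0).rand(500, 238)

def cluster_songs(X, alpha):
    # truncated Dirichlet Process Gaussian mixture; a larger alpha lets the
    # model spread weight over more of the available components
    dpgmm = BayesianGaussianMixture(
        n_components=40,  # truncation level: upper bound on clusters used
        weight_concentration_prior_type='dirichlet_process',
        weight_concentration_prior=alpha,
        covariance_type='full',
        max_iter=500,
        random_state=0)
    dpgmm.fit(X)
    return dpgmm.predict(X)

labels = {a: cluster_songs(X, a) for a in (0.05, 0.1, 0.2)}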

First, looking at α = 0.05, we see that all of the clusters contain a significant number of songs, although clusters 3 and 8 are notably smaller. Cluster 3 contains a heavier left tail, indicating a larger number of songs from the 70s, 80s, and 90s. Inside the cluster, the genres of music varied significantly from a traditional music lens. That is, the cluster contained some songs with nearly all traditional rock instruments, others with purely synths, and others somewhere in between, all of which would normally be classified as different EM genres. However, under the Dirichlet Process, these songs were lumped together by the common thread of dense, melodic material (as opposed to minimalistic, repetitive, or dissonant sounds). The most prominent artists from the earlier songs are Ashra and John Hassell, who composed several melodic songs combining traditional instruments with synthesizers for a modern feel. The other small cluster, number 8, contains a more normal year distribution relative to the entire MSD distribution and also consists of denser beats. Another artist, Cabaret Voltaire, leads this cluster. Cluster 9 sticks out significantly because it contains virtually no songs before 1990 but increases rapidly in popularity. This cluster contains songs with hypnotically repetitive rhythm, strong and ethereal synths, and an equally strong drum-like beat. Given the emergence of trance in the 1990s, and the fact that house music in the 1980s contained more minimalistic synths than house music in the 1990s, this distribution of years makes sense. Looking at the earliest artists in this cluster, one that accurately predates the later music in the cluster is Jean-Michel Jarre. A French composer pioneering in ambient and electronic music [14], one of his songs, Les Chants Magnétiques IV, contains very sharp and modulated synths along with a repetitive hi-hat cymbal rhythm and more drawn-out and ethereal synths. While the song sounds ambient at its normal speed, playing it at 1.5 times the normal speed resulted in a thumping, fast-paced 16th-note rhythm that, combined with the ethereal synths and their chord progressions, sounded very similar to trance music. In fact, I found that, stylistically, trance music was comparable to house and ambient music increased in speed. Trance music was a term not used extensively until the early 1990s, but ambient and house music were already mainstream by the 1980s, so it would make sense that trance evolved in this manner. However, this insight could serve as an argument that trance is not an innovative genre in and of itself but is rather a clever combination of two older genres.

Lastly, we look at the timbre category and chord change distributions for each cluster. In theory, these clusters should have significantly different peaks of chord changes and timbre categories, reflecting different pitch arrangements and instruments in each cluster. The type 0 chord change corresponds to major → major with no note change, type 60 to minor → minor with no note change, type 120 to dominant 7th major → dominant 7th major with no note change, and type 180 to dominant 7th minor → dominant 7th minor with no note change. It makes sense that type 0, 60, 120, and 180 chord changes are frequently observed, because it implies that adjacent chords in the song are remaining in the same key for the majority of the song (a short worked example of this numbering appears after the artist list below). The timbre categories, on the other hand, are more difficult to intuitively interpret. Mauch's study addresses this issue by sampling songs and sounds that are closest to each timbre category, playing the sounds, and attaching user-based interpretations based on several listeners [8]. While this strategy worked in Mauch's study, given the time and resources at my disposal, it was not practical in mine. I ended up comparing my subjective summaries of each cluster against the charts to see whether certain peaks in the timbre categories corresponded to specific tones. Strangely, for α = 0.05, the timbre and chord change data is very similar for each cluster. This problem does not occur when α = 0.1 or 0.2, where the graphs vary significantly and correspond to some of the observed differences in the music. In summary, below are the most influential artists I found in the clusters formed and the types of music they created that were novel for their time:

that were novel for their time

• Jean-Michel Jarre: ambient and house music, complicated synthesizer arrangements
• Cabaret Voltaire: orchestral electronic music
• Paul Horn: new age
• Brian Eno: ambient music
• Manuel Göttsching (Ashra): synth-heavy ambient music
• Killing Joke: industrial metal
• John Foxx: minimalist and dark electronic music
• Fad Gadget: house and industrial music
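To make the chord-change numbering above concrete, here is a small worked example that mirrors the category arithmetic in Appendix A.2 (chords are written as (quality, root) pairs, with qualities 1–4 = major, minor, dominant 7th, minor 7th and roots 0–11 starting at C):

def chord_change_category(c1, c2):
    # root movement in semitones, wrapped to 0..11
    note_shift = (c2[1] - c1[1]) % 12
    # ordered pair of chord qualities, numbered 1..16
    key_shift = 4 * (c1[0] - 1) + c2[0]
    return 12 * (key_shift - 1) + note_shift  # one of 192 categories

print(chord_change_category((1, 0), (1, 0)))  # major -> major, same root: 0
print(chord_change_category((2, 5), (2, 5)))  # minor -> minor, same root: 60
print(chord_change_category((3, 7), (3, 7)))  # dom7 -> dom7, same root: 120
print(chord_change_category((4, 2), (4, 2)))  # min7 -> min7, same root: 180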

While these conclusions were formed mainly from the data used in the MSD and the resulting clusters, I checked outside sources and biographies of these artists to see whether they were groundbreaking contributors to electronic music. Some research revealed that these artists were indeed groundbreaking for their time, so my findings are consistent with existing literature. The difference, however, between existing accounts and mine is that, from a quantitatively computed perspective, I found some new connections (like observing that one of Jarre's works, when sped up, sounded very similar to trance music).

For larger values of α, it is worth not only looking at interesting phenomena in the clusters formed for that specific value but also comparing the clusters formed to those for other values of α. Since we are increasing the value of α, more clusters will be formed, and the distinctions between each cluster will be more nuanced. With α = 0.1, the Dirichlet Process formed 16 clusters; 2 of these clusters consisted of only one song each, and upon listening, neither of these songs sounded particularly unique, so I threw those two clusters out and analyzed the remaining 14. Comparing these clusters to the ones formed with α = 0.05, I found that some of the clusters mapped over nicely, while others were more difficult to interpret. For example, cluster 3 from the α = 0.1 run contained a similar number of songs, and a similar distribution of release years, to cluster 9 from the α = 0.05 run. Both contain virtually no songs before the 1990s and then steadily rise in popularity through the 2000s. Both clusters also contain similar types of music: house beats and ethereal synths reminiscent of ambient or trance music. However, when I looked at the earliest artists in cluster 3 (α = 0.1), they were different from the earliest artists in cluster 9 (α = 0.05). One particular artist, Bill Nelson, stood out for having a particularly novel song, "Birds of Tin," for the year it was released (1980). This song features a sharp and twangy synth beat that, when sped up, sounded like minimalist acid house music.

While the α = 0.05 run differentiated mostly on general moods and classes of instruments (like rock vs. non-electronic vs. electronic), the α = 0.1 run picked up more nuanced instrumentation and mood differences. For example, cluster 16 (α = 0.1) contained songs that featured orchestral string instruments, especially violin. The songs themselves varied significantly according to traditional genres, from Brian Eno arrangements with classical orchestra to a remix of a song by Linkin Park, a nu-metal band, which contained violin interludes. This clustering raises an interesting point: music that sounds very different based on traditional genres could be grouped together on certain instruments or sounds. Another cluster, 28 (α = 0.1), features 90s sounds characteristic of the Roland TR-909 drum machine (which would explain why the cluster's songs increase drastically starting in the 1990s and steadily decline through the 2000s). Yet another cluster, 6 (α = 0.1), contains a particularly heavy left tail, indicating a style more popular in the 1980s, and its characteristic sound, hi-hat cymbals, is also a specialized instrument. This specialization does not match up particularly strongly with the clusters when α = 0.05. That is, a single cluster with α = 0.05 does not easily map to one or more clusters in the α = 0.1 run, although many of the clusters appear to share characteristics based on the qualitative descriptions in the tables. The timbre/chord change charts for each cluster appeared to at least somewhat corroborate the general characteristics I attached to each cluster. For example, the last timbre category is significantly pronounced for clusters 5 and 18, and especially so for 18. Cluster 18 was vocal-free, ethereal, space-synth sounds, so it would make sense that cluster 5, which was mainly calm New Age, also contained vocal-free, ethereal, space-y sounds. It was also interesting to note that certain clusters, like 28, contained one timbre category that completely dominated all the others. Many songs in this cluster were marked by strong and repetitive beats reminiscent of the Roland TR drum machines, which matches the graph. Likewise, clusters 3, 7, 9, and 20, which appear to contain the same peak timbre category, were noted for containing strong and repetitive beats. From this run, I added the following artists and their contributions to the general list of novel artists:

• Bill Nelson: minimalist house music
• Vangelis: orchestral compositions with electronic notes
• Rick Wakeman: rock compositions with spacy-sounding synths
• Kraftwerk: synth-based pop music
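Mapping clusters between two α runs was done by ear in this analysis. A less subjective cross-check, sketched below under the assumption that both runs assign a label to the same ordered list of songs, is to tabulate how the two labelings overlap:

import numpy as np

def overlap_matrix(labels_a, labels_b):
    # entry (i, j) counts songs placed in cluster a_vals[i] by run A
    # and in cluster b_vals[j] by run B
    labels_a, labels_b = np.asarray(labels_a), np.asarray(labels_b)
    a_vals, b_vals = np.unique(labels_a), np.unique(labels_b)
    m = np.zeros((len(a_vals), len(b_vals)), dtype=int)
    for i, a in enumerate(a_vals):
        for j, b in enumerate(b_vals):
            m[i, j] = np.sum((labels_a == a) & (labels_b == b))
    return a_vals, b_vals, m

# toy usage: run A has two clusters; run B splits one of them in two
a = [0, 0, 0, 1, 1, 1]
b = [0, 0, 2, 1, 1, 1]
print(overlap_matrix(a, b)[2])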

Finally, we look at α = 0.2. With this parameter value, the Dirichlet Process resulted in 22 clusters formed; 3 of these clusters contained only one song each, and upon listening to each of these songs, I determined they were not particularly unique and discarded them, for a total of 19 remaining clusters. Unlike the previous two values of α, where the clusters were relatively easy to subjectively differentiate, this one was quite difficult. Slightly more than half of the clusters, 10 out of 19, contained under 1000 songs, and 3 contained under 100. Some of the clusters were easily mapped to clusters in the other two α values, like cluster 17 (α = 0.2), which contains Roland TR drum machine sounds and is comparable to cluster 28 (α = 0.1). However, many of the other classifications seemed more dubious. Not only did the songs within each cluster seem to vary significantly, but the differences between many clusters appeared nearly indistinguishable. The chord change and timbre charts also support the difficulty of distinguishing different clusters. The y-axes for all of the charts are quite small, implying that many of the timbre values averaged out because the songs were quite different in each cluster. Essentially, the observations are quite noisy and do not have features that stand out as saliently as those of cluster 28 (α = 0.1), for example. The only exceptions to these numbers were clusters 30 and 34, but there were so few songs in each of these clusters that they represent only a small amount of the dataset. Therefore, I concluded that the Dirichlet Process with α = 0.2 did an insufficient job of adequately clustering the songs. Overall, the clusters formed when α = 0.1 were the most meaningful in terms of picking up nuanced moods and instruments without splitting hairs and producing clusters a minimally trained ear could not differentiate. From this analysis, the most appropriate genre classifications of the electronic music from the MSD are the clusters described in the table for α = 0.1, and the most novel artists, along with their contributions, are summarized in the findings for α = 0.05 and α = 0.1.


Chapter 4

Conclusion

In this chapter, I first address weaknesses in my experiment and strategies for addressing those weaknesses; I then offer potential paths for researchers to build upon my experiment and give closing words regarding this thesis.

4.1 Design Flaws in Experiment

While I made every effort possible to ensure the integrity of this experiment, there were various limiting factors, some beyond my control and others within my control but unrealistic to address given the time and resources I had. The largest issue was the dataset I was working with. While the MSD contained roughly 23,000 electronic music songs according to my classifications, these songs did not come close to all of the electronic music that was available. From looking through the tracks, I did see many important artists, meaning that there was some credibility to the dataset. However, there were several other artists I was surprised to see missing, and the artists included contained only a limited number of popular songs. Some traditionally defined genres, like dubstep, were missing entirely from the dataset, and the most recent songs came from the year 2010, which meant that the past 5 years of rapid expansion in EM were not accounted for. Building a sufficient corpus of EM data is very difficult, arguably more so than for other genres, because songs may be remixed by multiple artists, further blurring the line between original content and modifications. For this reason, I considered my thesis to be a proof of concept. Although the data I used may not be ideal, I was able to show that the Dirichlet Process could be used with some amount of success to cluster songs based on their metadata.

With respect to how I implemented the Dirichlet Process and constructed the features, my methodology could have been more extensive with additional time and resources. Interpreting the sounds in each song and establishing common threads is a difficult task, and unlike Pandora, which used trained music theory experts to analyze each song, I relied on my own ears and anecdotal knowledge of EM. Given the lack of formal literature quantitatively analyzing EM, and the resources I had, this was my best realistic option, but it was also not ideal. The second notable weakness, which was more controllable, was determining what exactly constitutes an EM song. My criteria involved iterating through every song and selecting those whose artist contained a tag that fell inside a list of predetermined EM genres. However, this strategy is not always effective, since some artists contain only a small selection of EM songs and have produced much more music involving rock or other non-EM genres. To prevent these songs from appearing in the dataset, I would need to load another dataset, from a group called Last.fm, which contains user-generated tags at the song level. Another, more addressable weakness in my experiment was graphically analyzing the timbre categories. While the average chord changes were easy to interpret on the graphs for each cluster and had easy semantic interpretations, the timbre categories were never formally defined. That is, while I knew the Bayes Information Criterion was lowest when there were 46 categories, I did not associate each timbre category with a sound. Mauch's study addressed this issue by randomly selecting songs with sounds that fell in each timbre category and asking users to listen to the sounds and classify what they heard. Implementing this system would be an additional way of ensuring that the clusters formed for each song were nontrivial: I could not only eyeball the measurements on each graph for timbre, like I did in this thesis, but also use them to confirm the sounds I observed for each cluster. Finally, while my feature selection involved careful preprocessing, based on other studies, that normalized measurements between all songs, there are additional ways I could have improved the feature set. For example, one study looks at more advanced ways to isolate specific timbre segments in a song, identify repeating patterns, and compare songs to each other in terms of the similarity of their timbres [15]. More advanced methods like these would allow me to more quantitatively analyze how successful the Dirichlet Process is at effectively clustering songs into distinct categories.
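For reference, the BIC-based choice of the number of timbre categories can be reproduced along these lines. This is a minimal sketch using scikit-learn's GaussianMixture, with frames standing in for the duration-normalized timbre frames sampled in Appendix A.3:

import numpy as np
from sklearn.mixture import GaussianMixture

# placeholder for the real sampled timbre frames (12 coefficients each)
frames = np.random.RandomState(0).rand(2000, 12)

best_k, best_bic = None, np.inf
for k in range(10, 61, 2):
    gm = GaussianMixture(n_components=k, covariance_type='full',
                         random_state=0).fit(frames)
    bic = gm.bic(frames)  # lower BIC indicates a better fit/complexity trade-off
    if bic < best_bic:
        best_k, best_bic = k, bic
print('lowest BIC at k = {0}'.format(best_k))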

4.2 Future Work

Future work in this area, quantitatively analyzing EM metadata to determine what constitutes different genres and novel artists, would involve tighter definitions and procedures, evaluations of whether clustering was effective, and closer musical scrutiny. All of the weaknesses mentioned in the previous section, barring perhaps the songs available in the Million Song Dataset, can be addressed with extensions and modifications to the code base I created. Addressing the greater issue of building an effective corpus of music data for the MSD, and constantly updating it, might be accomplished by soliciting such data from an organization like Spotify, but such an endeavor is very ambitious and beyond the scope of any individual or small-group research project without extensive funding and influence. Once these problems are resolved, and the dataset, the songs accessed from it, and the methods for comparing songs to each other are settled, the next steps would be to further analyze the results. How do the most unique artists for their time compare to the most popular artists? Is there considerable overlap? How long does it take for a style to grow in popularity, if it even does? And lastly, how can these findings be used to compose new genres of music and envision who and what will become popular in the future? All of these questions may require supplementary information sources, with respect to the popularity of songs and artists for example, and many of these additional pieces of information can be found on the website of the MSD.

4.3 Closing Remarks

While this thesis is an ambitious endeavor and can be improved in many respects, it does show that the methods implemented yield nontrivial results and could serve as a foundation for future quantitative analysis of electronic music. As data analytics grows even more, and groups such as Spotify amass greater amounts of information and deeper insights from that information, this relatively new field of study will hopefully grow. EM is a dynamic, energizing, and incredibly expressive type of music, and understanding it from a quantitative perspective pays respect to what has, up until now, been mostly analyzed from a curious outsider's perspective: qualitatively described, but not examined as thoroughly from a mathematical angle.


Appendix A

Code

A.1 Pulling Data from the Million Song Dataset

from __future__ import division
import os
import re
import sys
import time
import glob
import numpy as np
import hdf5_getters  # not on adroit

# prevents output from showing ellipses when printed
np.set_printoptions(threshold=np.nan)

'''This code pulls the relevant metadata for every electronic music song
out of the raw MSD files and writes it, sorted by year, to a text file.'''

basedir = '/scratch/network/mssilver/mssilver/msd_data_full/data/' + str(sys.argv[1])
ext = '.h5'

target_genres = ['house', 'techno', 'drum and bass', 'drum n bass', "drum'n'bass",
                 'drumnbass', "drum 'n' bass", 'jungle', 'breakbeat', 'trance',
                 'dubstep', 'trap', 'downtempo', 'industrial', 'synthpop', 'idm',
                 'idm - intelligent dance music', '8-bit', 'ambient',
                 'dance and electronica', 'electronic']

# relevant metadata for all EM songs found in the MSD
all_song_data = {}
pitch_segs_data = []
count = 0
start_time = time.time()

for root, dirs, files in os.walk(basedir):
    files = glob.glob(os.path.join(root, '*' + ext))
    for f in files:
        h5 = hdf5_getters.open_h5_file_read(f)
        # if year unknown, throw out sample
        if hdf5_getters.get_year(h5) == 0:
            h5.close()
            continue
        if any(tag in str(hdf5_getters.get_artist_mbtags(h5)) for tag in target_genres):
            print 'found electronic music song at {0} seconds'.format(time.time() - start_time)
            count += 1
            print('song count: {0}'.format(count + 1))
            h5_subdict = dict()
            h5_subdict['title'] = hdf5_getters.get_title(h5).item()
            h5_subdict['artist_name'] = hdf5_getters.get_artist_name(h5).item()
            h5_subdict['year'] = hdf5_getters.get_year(h5).item()
            h5_subdict['duration'] = hdf5_getters.get_duration(h5).item()
            h5_subdict['timbre'] = hdf5_getters.get_segments_timbre(h5).tolist()
            h5_subdict['pitches'] = hdf5_getters.get_segments_pitches(h5).tolist()
            track_id = hdf5_getters.get_track_id(h5).item()
            all_song_data[track_id] = h5_subdict
            print('Song {0} finished processing. Total time elapsed: {1} seconds'.format(count, str(time.time() - start_time)))
        h5.close()

all_song_data_sorted = dict(sorted(all_song_data.items(), key=lambda k: k[1]['year']))
sortedpitchdata = '/scratch/network/mssilver/mssilver/msd_data/raw_' + re.sub('/', '', sys.argv[1]) + '.txt'
with open(sortedpitchdata, 'w') as text_file:
    text_file.write(str(all_song_data_sorted))

A.2 Calculating Most Likely Chords and Timbre Categories

from __future__ import division
import os
import sys
import re
import time
import json
import glob
import hdf5_getters  # not on adroit
import sklearn.mixture
import msd_utils  # not on adroit
import math
import numpy as np
import collections
import ast

# prevents output from showing ellipses when printed
np.set_printoptions(threshold=np.nan)

# column-wise mean of a list of lists
def mean(a):
    return sum(a) / len(a)

'''This code computes the frequency of chord changes in each electronic song
and runs the dirichlet process on it.'''

basedir = '/scratch/network/mssilver/mssilver/'
input_file = basedir + 'msd_data/raw_' + str(sys.argv[1]) + '.txt'
output_file = basedir + 'msd_data/preprocessed_' + str(sys.argv[1]) + '.txt'

json_contents = open(input_file, 'r').read()

all_song_data = []
time_start = time.time()
count = 0
# pattern reconstructed: matches one song's metadata dict in the raw text dump
for json_object_str in re.finditer(r"\{'title'.*?\}", json_contents):
    json_object_str = str(json_object_str.group(0))
    json_object = ast.literal_eval(json_object_str)
    json_object_new = {}

    json_object_new['title'] = json_object['title']
    json_object_new['artist_name'] = json_object['artist_name']
    json_object_new['year'] = json_object['year']
    json_object_new['duration'] = json_object['duration']

    segments_pitches_old = json_object['pitches']
    segments_timbre_old = json_object['timbre']
    segments_pitches_old_smoothed = []
    segments_timbre_old_smoothed = []
    chord_changes = [0 for i in range(0, 192)]
    smoothing_factor = 5
    for i in range(0, int(math.floor(len(segments_pitches_old) / smoothing_factor))):
        segments = segments_pitches_old[(smoothing_factor * i):(smoothing_factor * i + smoothing_factor)]
        # calculate mean frequency of each note over a block of 5 time segments
        segments_mean = map(mean, zip(*segments))
        segments_pitches_old_smoothed.append(segments_mean)
    most_likely_chords = [msd_utils.find_most_likely_chord(seg) for seg in segments_pitches_old_smoothed]
    print 'found most likely chords at {0} seconds'.format(time.time() - time_start)
    # calculate chord changes
    for i in range(0, len(most_likely_chords) - 1):
        c1 = most_likely_chords[i]
        c2 = most_likely_chords[i + 1]
        if (c1[1] == c2[1]):
            note_shift = 0
        elif (c1[1] < c2[1]):
            note_shift = c2[1] - c1[1]
        else:
            note_shift = 12 - c1[1] + c2[1]
        key_shift = 4 * (c1[0] - 1) + c2[0]
        # convert note_shift (0 through 11) and key_shift (1 to 16)
        # to one of 192 categories for a chord shift
        chord_shift = 12 * (key_shift - 1) + note_shift
        chord_changes[chord_shift] += 1
    json_object_new['chord_changes'] = [c / json_object['duration'] for c in chord_changes]
    print 'calculated chord changes at {0} seconds'.format(time.time() - time_start)

    for i in range(0, int(math.floor(len(segments_timbre_old) / smoothing_factor))):
        segments = segments_timbre_old[(smoothing_factor * i):(smoothing_factor * i + smoothing_factor)]
        # calculate mean of each timbre coefficient over a block of 5 time segments
        segments_mean = map(mean, zip(*segments))
        segments_timbre_old_smoothed.append(segments_mean)
    print 'found most likely timbre categories at {0} seconds'.format(time.time() - time_start)
    timbre_cats = [msd_utils.find_most_likely_timbre_category(seg) for seg in segments_timbre_old_smoothed]
    timbre_cat_counts = [timbre_cats.count(i) for i in xrange(0, 30)]
    json_object_new['timbre_cat_counts'] = [t / json_object['duration'] for t in timbre_cat_counts]
    all_song_data.append(json_object_new)
    count += 1

print 'preprocessing finished, writing results to file at time {0}'.format(time.time() - time_start)
with open(output_file, 'w') as f:
    f.write(str(all_song_data))

print 'file merging complete at time {0}'.format(time.time() - time_start)

A.3 Code to Compute Timbre Categories

from __future__ import division
import os
import sys
import re
import time
import json
import glob
import hdf5_getters  # not on adroit
import sklearn.mixture
import msd_utils  # not on adroit
import math
import numpy as np
import collections
from string import ascii_uppercase
import ast
import matplotlib.pyplot as plt
import operator
from collections import defaultdict
import random

timbre_all = []
N = 20  # number of samples to get from each year
year_counts = {1956: 2, 1965: 4, 1968: 3, 1969: 5, 1970: 23, 1971: 25,
               1972: 26, 1973: 37, 1974: 35, 1975: 29, 1976: 28, 1977: 64,
               1978: 77, 1979: 111, 1980: 131, 1981: 171, 1982: 199, 1983: 272,
               1984: 190, 1985: 189, 1986: 200, 1987: 224, 1988: 205, 1989: 272,
               1990: 358, 1991: 348, 1992: 538, 1993: 610, 1994: 658, 1995: 764,
               1996: 809, 1997: 930, 1998: 872, 1999: 983, 2000: 1031, 2001: 1230,
               2002: 1323, 2003: 1563, 2004: 1508, 2005: 1995, 2006: 1892,
               2007: 2175, 2008: 1950, 2009: 1782, 2010: 742}

time_start = time.time()
year_count = defaultdict(int)
orig_dir = '/scratch/network/mssilver/mssilver/'
orig_dir = ''
# pattern reconstructed: matches one song's metadata dict in the raw text dump
json_pattern = re.compile(r"\{'title'.*?\}", re.DOTALL)
N = 20  # number of songs to sample from each year
k = 20  # number of frames to select from each song
for l1 in ascii_uppercase:
    for l2 in ascii_uppercase:
        edm_textfile = orig_dir + 'msd_data/raw_' + l1 + l2 + '.txt'
        json_contents = open(edm_textfile, 'r').read()
        for json_object_str in re.findall(json_pattern, json_contents):
            json_object = ast.literal_eval(json_object_str)
            year = int(json_object['year'])
            # keep each song with probability N / (songs in its year), capped at 1
            prob = 1.0 if 1.0 * N / year_counts[year] > 1.0 else 1.0 * N / year_counts[year]
            if random.random() < prob:
                print 'getting timbre frames for song in directory {0}, {1} seconds after start of program'.format(edm_textfile, time.time() - time_start)
                duration = float(json_object['duration'])
                timbre = [[t / duration for t in l] for l in json_object['timbre']]
                try:
                    indices = random.sample(xrange(0, len(timbre)), k)
                except:
                    # song shorter than k frames: keep every frame
                    indices = xrange(0, len(timbre))
                timbre_frames = [timbre[i] for i in indices]
                appended_timbre = [timbre_all.append(l) for l in timbre_frames]
        print 'finished file {0}, {1} seconds after start of program'.format(edm_textfile, time.time() - time_start)

with(open('timbre_frames_all.txt', 'w')) as f:
    f.write(str(timbre_all))

A.4 Helper Methods for Calculations

import os
import re
import json
import glob
import hdf5_getters
import time
import numpy as np

''' some static data used in conjunction with the helper methods '''

# each 12-element vector corresponds to the 12 pitches, starting with C
# natural and going up to B natural

CHORD_TEMPLATE_MAJOR = [[1,0,0,0,1,0,0,1,0,0,0,0], [0,1,0,0,0,1,0,0,1,0,0,0],
                        [0,0,1,0,0,0,1,0,0,1,0,0], [0,0,0,1,0,0,0,1,0,0,1,0],
                        [0,0,0,0,1,0,0,0,1,0,0,1], [1,0,0,0,0,1,0,0,0,1,0,0],
                        [0,1,0,0,0,0,1,0,0,0,1,0], [0,0,1,0,0,0,0,1,0,0,0,1],
                        [1,0,0,1,0,0,0,0,1,0,0,0], [0,1,0,0,1,0,0,0,0,1,0,0],
                        [0,0,1,0,0,1,0,0,0,0,1,0], [0,0,0,1,0,0,1,0,0,0,0,1]]
CHORD_TEMPLATE_MINOR = [[1,0,0,1,0,0,0,1,0,0,0,0], [0,1,0,0,1,0,0,0,1,0,0,0],
                        [0,0,1,0,0,1,0,0,0,1,0,0], [0,0,0,1,0,0,1,0,0,0,1,0],
                        [0,0,0,0,1,0,0,1,0,0,0,1], [1,0,0,0,0,1,0,0,1,0,0,0],
                        [0,1,0,0,0,0,1,0,0,1,0,0], [0,0,1,0,0,0,0,1,0,0,1,0],
                        [0,0,0,1,0,0,0,0,1,0,0,1], [1,0,0,0,1,0,0,0,0,1,0,0],
                        [0,1,0,0,0,1,0,0,0,0,1,0], [0,0,1,0,0,0,1,0,0,0,0,1]]
CHORD_TEMPLATE_DOM7 = [[1,0,0,0,1,0,0,1,0,0,1,0], [0,1,0,0,0,1,0,0,1,0,0,1],
                       [1,0,1,0,0,0,1,0,0,1,0,0], [0,1,0,1,0,0,0,1,0,0,1,0],
                       [0,0,1,0,1,0,0,0,1,0,0,1], [1,0,0,1,0,1,0,0,0,1,0,0],
                       [0,1,0,0,1,0,1,0,0,0,1,0], [0,0,1,0,0,1,0,1,0,0,0,1],
                       [1,0,0,1,0,0,1,0,1,0,0,0], [0,1,0,0,1,0,0,1,0,1,0,0],
                       [0,0,1,0,0,1,0,0,1,0,1,0], [0,0,0,1,0,0,1,0,0,1,0,1]]
CHORD_TEMPLATE_MIN7 = [[1,0,0,1,0,0,0,1,0,0,1,0], [0,1,0,0,1,0,0,0,1,0,0,1],
                       [1,0,1,0,0,1,0,0,0,1,0,0], [0,1,0,1,0,0,1,0,0,0,1,0],
                       [0,0,1,0,1,0,0,1,0,0,0,1], [1,0,0,1,0,1,0,0,1,0,0,0],
                       [0,1,0,0,1,0,1,0,0,1,0,0], [0,0,1,0,0,1,0,1,0,0,1,0],
                       [0,0,0,1,0,0,1,0,1,0,0,1], [1,0,0,0,1,0,0,1,0,1,0,0],
                       [0,1,0,0,0,1,0,0,1,0,1,0], [0,0,1,0,0,0,1,0,0,1,0,1]]

CHORD_TEMPLATE_MAJOR_means = [np.mean(chord) for chord in CHORD_TEMPLATE_MAJOR]
CHORD_TEMPLATE_MINOR_means = [np.mean(chord) for chord in CHORD_TEMPLATE_MINOR]
CHORD_TEMPLATE_DOM7_means = [np.mean(chord) for chord in CHORD_TEMPLATE_DOM7]
CHORD_TEMPLATE_MIN7_means = [np.mean(chord) for chord in CHORD_TEMPLATE_MIN7]

CHORD_TEMPLATE_MAJOR_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_MAJOR]
CHORD_TEMPLATE_MINOR_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_MINOR]
CHORD_TEMPLATE_DOM7_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_DOM7]
CHORD_TEMPLATE_MIN7_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_MIN7]

# the 46 timbre category centroids (one 12-dimensional vector each)
# referenced in Chapter 3
TIMBRE_CLUSTERS = [
    [1.38679881e-01, 3.95702571e-02, 2.65410235e-02, 7.38301998e-03, -1.75014636e-02, -5.51147732e-02,
     8.71851698e-03, -1.17595855e-02, 1.07227900e-02, 8.75951680e-03, 5.40391877e-03, 6.17638908e-03],
    [3.14344510e+00, 1.17405599e-01, 4.08053561e+00, -1.77934450e+00, 2.93367968e+00, -1.35597928e+00,
     -1.55129489e+00, 7.75743158e-01, 6.42796685e-01, 1.40794256e-01, 3.37716831e-01, -3.27103815e-01],
    [3.56548165e-01, 2.73288705e+00, 1.94355982e+00, 1.06892477e+00, 9.89739475e-01, -8.97330631e-02,
     8.73234495e-01, -2.00747009e-03, 3.44488367e-01, 9.93117800e-02, -2.43471766e-01, -1.90521726e-01],
    [4.22442037e-01, 4.14115783e-01, 1.43926557e-01, -1.16143322e-01, -5.95186216e-02, -2.36927188e-01,
     -6.83151409e-02, 9.86816882e-02, 2.43219098e-02, 6.93558977e-02, 6.80121418e-03, 3.97485360e-02],
    [1.94727799e-01, -1.39027782e+00, -2.39875671e-01, -2.84583677e-01, 1.92334219e-01, -2.83421048e-01,
     2.15787541e-01, 1.14840341e-01, -2.15631833e-01, -4.09496877e-02, -6.90838017e-03, -7.24394810e-03],
    [1.96565167e-01, 4.98702717e-02, -3.43697282e-01, 2.54170701e-01, 1.12441266e-02, 1.54740401e-01,
     -4.70447408e-02, 8.10868802e-02, 3.03736697e-03, 1.43974944e-03, -2.75044913e-02, 1.48634678e-02],
    [2.21364497e-01, -2.96205105e-01, 1.57754028e-01, -5.57641279e-02, -9.25625566e-02, -6.15316168e-02,
     -1.38139882e-01, -5.54936599e-02, 1.66886836e-01, 6.46238260e-02, 1.24093863e-02, -2.09274345e-02],
    [2.12823455e-01, -9.32652720e-02, -4.39611467e-01, -2.02814479e-01, 4.98638770e-02, -1.26572488e-01,
     -1.11181799e-01, 3.25075635e-02, 2.01416694e-02, -5.69216463e-02, 2.61922912e-02, 8.30817468e-02],
    [1.62304042e-01, -7.34813956e-03, -2.02552550e-01, 1.80106705e-01, -5.72110826e-02, -9.17148244e-02,
     -6.20429191e-03, -6.08892354e-02, 1.02883628e-02, 3.84878478e-02, -8.72920419e-03, 2.37291230e-02],
    [1.69023095e-01, 6.81311168e-02, -3.71039856e-02, -2.13139780e-02, -4.18752028e-03, 1.36407740e-01,
     2.58515825e-02, -4.10328777e-04, 2.93149920e-02, -1.97874734e-02, 2.01177066e-02, 4.29260690e-03],
    [4.16829358e-01, -1.28384095e+00, 8.86081556e-01, 9.13717416e-02, -3.19420208e-01, -1.82003637e-01,
     -3.19865507e-02, -1.71517045e-02, 3.47472066e-02, -3.53047665e-02, 5.58354602e-02, -5.06222122e-02],
    [3.83948137e-01, 1.06020034e-01, 4.01191058e-01, 1.49470482e-01, -9.58422411e-02, -4.94473336e-02,
     2.27589858e-02, -5.67352733e-02, 3.84666644e-02, -2.15828055e-02, -1.67817151e-02, 1.15426241e-01],
    [9.07946444e-01, 3.26120397e+00, 2.98472002e+00, -1.42615404e-01, 1.29886103e+00, -4.53380431e-01,
     1.54008478e-01, -3.55297093e-02, -2.95809181e-01, 1.57037690e-01, -7.29692046e-02, 1.15180285e-01],
    [1.60870896e+00, -2.32038235e+00, -7.96211044e-01, 1.55058968e+00, -2.19377663e+00, 5.01030526e-01,
     -1.71767279e+00, -1.36642470e+00, -2.42837527e-01, -4.14275615e-01, -7.33148530e-01, -4.56676578e-01],
    [6.42870687e-01, 1.34486839e+00, 2.16026845e-01, -2.13180345e-01, 3.10866747e-01, -3.97754955e-01,
     -3.54439151e-01, -5.95938041e-04, 4.95054274e-03, 4.67013422e-02, -1.80823854e-02, 1.25808320e-01],
    [1.16780496e+00, 2.28141229e+00, -3.29418720e+00, -1.54239912e+00, 2.12372153e-01, 2.51116768e+00,
     1.84273560e+00, -4.06183916e-01, 1.19175125e+00, -9.24407446e-01, 6.85444429e-01, -6.38729005e-01],
    [2.39097414e-01, -1.13382447e-02, 3.06327342e-01, 4.68182987e-03, -1.03107607e-01, -3.17661969e-02,
     3.46533705e-02, 1.46440386e-02, 6.88291154e-02, 1.72580481e-02, -6.23970238e-03, -6.52822380e-03],
    [1.74850329e-01, -1.86077411e-01, 2.69285838e-01, 5.22452803e-02, -3.71708289e-02, -6.42874319e-02,
     -5.01920042e-03, -1.14565540e-02, -2.61300268e-03, -6.94872458e-03, 1.20157063e-02, 2.01341977e-02],
    [1.93220674e-01, 1.62738332e-01, 1.72794061e-02, 7.89933755e-02, 1.58494767e-01, 9.04541006e-04,
     -3.33177052e-02, -1.42411500e-01, -1.90471155e-02, -2.41622739e-02, -2.57382438e-02, 2.84895062e-02],
    [3.31179197e+00, -1.56765268e-01, 4.42446188e+00, 2.05496297e+00, 5.07031622e+00, -3.52663849e-02,
     -5.68337901e+00, -1.17825301e+00, 5.41756637e-01, -3.15541339e-02, -1.58404846e+00, 7.37887234e-01],
    [2.36033237e-01, -5.01380019e-01, -7.01568834e-02, -2.14474169e-01, 5.58739133e-01, -3.45340886e-01,
     2.36469930e-01, -2.51770230e-02, -4.41670143e-01, -1.73364633e-01, 9.92353986e-03, 1.01775476e-01],
    [3.13672832e+00, 1.55128891e+00, 4.60139512e+00, 9.82477544e-01, -3.87108002e-01, -1.34239667e+00,
     -3.00065797e+00, -4.41556909e-01, -7.77546208e-01, -6.59017029e-01, -1.42596356e-01, -9.78935498e-01],
    [8.50714148e-01, 2.28658856e-01, -3.65260753e+00, 2.70626948e+00, -1.90441544e-01, 5.66625676e+00,
     1.77531510e+00, 2.39978921e+00, 1.10965660e+00, 1.58484130e+00, -1.51579214e-02, 8.64324026e-01],
    [1.14302559e+00, 1.18602811e+00, -3.88130412e+00, 8.69833825e-01, -8.23003310e-01, -4.23867795e-01,
     8.56022598e-01, -1.08015106e+00, 1.74840192e-01, -1.35493558e-02, -1.17012561e+00, 1.68572940e-01],
    [3.54117814e+00, 6.12714769e-01, 7.67585243e+00, 2.50391333e+00, 1.81374399e+00, -1.46363231e+00,
     -1.74027236e+00, -5.72924078e-01, -1.20787368e+00, -4.13954661e-01, -4.62561948e-01, 6.78297871e-01],
    [8.31843044e-01, 4.41635485e-01, 7.00724425e-02, -4.72159900e-02, 3.08326493e-01, -4.47009822e-01,
     3.27806057e-01, 6.52370380e-01, 3.28490360e-01, 1.28628172e-01, -7.78065861e-02, 6.91343399e-02],
    [4.90082031e-01, -9.53180204e-01, 1.76970476e-01, 1.57256960e-01, -5.26196238e-02, -3.19264458e-01,
     3.91808304e-01, 2.19368239e-01, -2.06483291e-01, -6.25044005e-02, -1.05547224e-01, 3.18934196e-01],
    [1.49899454e+00, -4.30708817e-01, 2.43770498e+00, 7.03149621e-01, -2.28827845e+00, 2.70195855e+00,
     -4.71484280e+00, -1.18700075e+00, -1.77431396e+00, -2.23190236e+00, 8.20855264e-01, -2.35859902e-01],
    [1.20322544e-01, -3.66300816e-01, -1.25699953e-01, -1.21914056e-01, 6.93277338e-02, -1.31034684e-01,
     -1.54955924e-03, 2.48094288e-02, -3.09576314e-02, -1.66369415e-03, 1.48904987e-04, -1.42151992e-02],
    [6.52394765e-01, -6.81024464e-01, 6.36868117e-01, 3.04950208e-01, 2.62178992e-01, -3.20457080e-01,
     -1.98576098e-01, -3.02173163e-01, 2.04399765e-01, 4.44513847e-02, -9.50111498e-03, -1.14198739e-02],
    [2.06762180e-01, -2.08101829e-01, 2.61977630e-01, -1.71672300e-01, 5.61794250e-02, 2.13660185e-01,
     3.90259585e-02, 4.78176392e-02, 1.72812607e-02, 3.44052067e-02, 6.26899067e-03, 2.48544728e-02],
    [7.39717363e-01, 4.37786285e+00, 2.54995502e+00, 1.13151212e+00, -3.58509503e-01, 2.20806129e-01,
     -2.20500355e-01, -7.22409824e-02, -2.70534083e-01, 1.07942098e-03, 2.70174668e-01, 1.87279353e-01],
    [1.25593809e+00, 6.71054880e-02, 8.70352571e-01, -4.32607959e+00, 2.30652217e+00, 5.47476105e+00,
     -6.11052479e-01, 1.07955720e+00, -2.16225471e+00, -7.95770149e-01, -7.31804973e-01, 9.68935954e-01],
    [1.17233757e-01, -1.23897829e-01, -4.88625265e-01, 1.42036530e-01, -7.23286756e-03, -6.99808763e-02,
     -1.17525019e-02, 5.70221674e-03, -7.67796123e-03, 4.17505873e-02, -2.33375716e-02, 1.94121001e-02],
    [1.67511025e+00, -2.75436700e+00, 1.45345593e+00, 1.32408871e+00, -1.66172505e+00, 1.00560074e+00,
     -8.82308160e-01, -5.95708043e-01, -7.27283590e-01, -1.03975499e+00, -1.86653334e-02, 1.39449745e+00],
    [3.20587677e+00, -2.84451104e+00, 8.54849957e+00, -4.44001235e-01, 1.04202144e+00, 7.35333682e-01,
     -2.48763292e+00, 7.38931361e-01, -1.74185596e+00, -1.07581842e+00, 2.05759299e-01, -8.20483513e-01],
    [3.31279737e+00, -5.08655734e-01, 6.61530870e+00, 1.16518280e+00, 4.74499155e+00, -2.31536191e+00,
     -1.34016130e+00, -7.15381712e-01, 2.78890594e+00, 2.04189275e+00, -3.80003033e-01, 1.16034914e+00],
    [1.79522019e+00, -8.13534697e-02, 4.37167420e-01, 2.26517020e+00, 8.85377295e-01, 1.07481514e+00,
     -7.25322296e-01, -2.19309506e+00, -7.59468916e-01, -1.37191387e+00, 2.60097913e-01, 9.34596450e-01],
    [3.50400906e-01, 8.17891485e-01, -8.63487084e-01, -7.31760701e-01, 9.70320805e-02, -3.60023996e-01,
     -2.91753495e-01, -8.03073817e-02, 6.65930095e-02, 1.60093340e-01, -1.29158086e-01, -5.18806100e-02],
    [2.25922929e-01, 2.78461593e-01, 5.39661393e-02, -2.37662670e-02, -2.70343295e-02, -1.23485570e-01,
     2.31027499e-03, 5.87465112e-05, 1.86127188e-02, 2.83074747e-02, -1.87198676e-04, 1.24761782e-02],
    [4.53615634e-01, 3.18976020e+00, -8.35029351e-01, 7.84124578e+00, -4.43906795e-01, -1.78945492e+00,
     -1.14521031e+00, 1.00044304e+00, -4.04084981e-01, -4.86030348e-01, 1.05412721e-01, 5.63666445e-02],
    [3.93714086e-01, -3.07226477e-01, -4.87366619e-01, -4.57481697e-01, -2.91133171e-04, -2.39881719e-01,
     -2.15591352e-01, -1.21332941e-01, 1.42245002e-01, 5.02984582e-02, -8.05878851e-03, 1.95534173e-01],
    [1.86913010e-01, -1.61000977e-01, 5.95612425e-01, 1.87804293e-01, 2.22064227e-01, -1.09008289e-01,
     7.83845058e-02, 5.15228647e-02, -8.18113578e-02, -2.37860551e-02, 3.41013800e-03, 3.64680417e-02],
    [3.32919314e+00, -2.14341251e+00, 7.20913997e+00, 1.76143734e+00, 1.64091808e+00, -2.66887649e+00,
     -9.26748006e-01, -2.78599285e-01, -7.39434005e-01, -3.87363085e-01, 8.00557250e-01, 1.15628886e+00],
    [4.76496444e-01, -1.19334793e-01, 3.09037235e-01, -3.45545294e-01, 1.30114716e-01, 5.06895559e-01,
     2.12176840e-01, -4.14296750e-03, 4.52439064e-02, -1.62163990e-02, 6.93683152e-03, -5.77607592e-03],
    [3.00019324e-01, 5.43432074e-02, -7.72732930e-01, 1.47263806e+00, -2.79012581e-02, -2.47864869e-01,
     -2.10011388e-01, 2.78202425e-01, 6.16957205e-02, -1.66924986e-01, -1.80102286e-01, -3.78872162e-03]]

TIMBRE_MEANS = [np.mean(t) for t in TIMBRE_CLUSTERS]
TIMBRE_STDEVS = [np.std(t) for t in TIMBRE_CLUSTERS]

'''helper methods to process raw msd data'''

def normalize_pitches(h5):
    key = int(hdf5_getters.get_key(h5))
    segments_pitches = hdf5_getters.get_segments_pitches(h5)
    segments_pitches_new = [transpose_by_key(pitch_seg, key) for pitch_seg in segments_pitches]
    return segments_pitches_new

def transpose_by_key(pitch_seg, key):
    pitch_seg_new = []
    for i in range(0, 12):
        idx = (i + key) % 12
        pitch_seg_new.append(pitch_seg[idx])
    return pitch_seg_new

''' given a time segment with distributions of the 12 pitches, find the most
likely chord played '''
def find_most_likely_chord(pitch_vector):
    rho_max = 0.0
    # index each chord as a (template family, transposition) pair
    most_likely_chord = (1, 1)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_MAJOR, CHORD_TEMPLATE_MAJOR_means, CHORD_TEMPLATE_MAJOR_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean) * (pitch_vector[i] - np.mean(pitch_vector)) / ((stdev + 0.01) * (np.std(pitch_vector) + 0.01))
        if (abs(rho) > abs(rho_max)):
            rho_max = rho
            most_likely_chord = (1, idx)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_MINOR, CHORD_TEMPLATE_MINOR_means, CHORD_TEMPLATE_MINOR_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean) * (pitch_vector[i] - np.mean(pitch_vector)) / ((stdev + 0.01) * (np.std(pitch_vector) + 0.01))
        if (abs(rho) > abs(rho_max)):
            rho_max = rho
            most_likely_chord = (2, idx)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_DOM7, CHORD_TEMPLATE_DOM7_means, CHORD_TEMPLATE_DOM7_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean) * (pitch_vector[i] - np.mean(pitch_vector)) / ((stdev + 0.01) * (np.std(pitch_vector) + 0.01))
        if (abs(rho) > abs(rho_max)):
            rho_max = rho
            most_likely_chord = (3, idx)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_MIN7, CHORD_TEMPLATE_MIN7_means, CHORD_TEMPLATE_MIN7_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean) * (pitch_vector[i] - np.mean(pitch_vector)) / ((stdev + 0.01) * (np.std(pitch_vector) + 0.01))
        if (abs(rho) > abs(rho_max)):
            rho_max = rho
            most_likely_chord = (4, idx)
    return most_likely_chord

def find_most_likely_timbre_category(timbre_vector):
    most_likely_timbre_cat = 0
    rho_max = 0.0
    for idx, (seg, mean, stdev) in enumerate(zip(TIMBRE_CLUSTERS, TIMBRE_MEANS, TIMBRE_STDEVS)):
        rho = 0.0
        for i in range(0, 12):
            rho += (seg[i] - mean) * (timbre_vector[i] - np.mean(seg)) / ((stdev + 0.01) * (np.std(timbre_vector) + 0.01))
        if (abs(rho) > abs(rho_max)):
            rho_max = rho
            most_likely_timbre_cat = idx
    return most_likely_timbre_cat

67

Bibliography

[1] Marck Bailey Mark j butler publishes scholarly work on dance mu-sic httpwwwmusicnorthwesterneduaboutnews2012mark-j-butler-publishes-scholarly-work-on-dance-musichtml mar 2012

[2] Kenneth Taylor Ishkurrsquos guide to edm httptechnoorgelectronic-music-guide

[3] Deal further strengthens spotifyrsquos music discovery expertise httptheechonestcompressreleasesspotify-acquires-echo-nest mar 2014

[4] Josh Constine Inside the spotify - echo nest skunkworks httptechcrunchcom20141019the-sonic-mad-scientists oct 2014

[5] The future of music genres is here httpblogechonestcompost73516217273the-future-of-music-genres-is-here jan 2014

[6] About the music genome project httpwwwpandoracomaboutmgp

[7] Joan Serragrave Aacutelvaro Corral Mariaacuten Boguntildeaacute Martiacuten Haro and Josep Ll ArcosMeasuring the evolution of contemporary western popular music Sci Rep 2jul 2012

[8] Matthias Mauch Robert M MacCallum Mark Levy and Armand M Leroi Theevolution of popular music Usa 1960ndash2010 Royal Society Open Science 2(5)2015

[9] Thierry Bertin-Mahieux Daniel PW Ellis Brian Whitman and Paul LamereThe million song dataset In Proceedings of the 12th International Conferenceon Music Information Retrieval (ISMIR 2011) 2011

[10] F Pedregosa G Varoquaux A Gramfort V Michel B Thirion O GriselM Blondel P Prettenhofer R Weiss V Dubourg J Vanderplas A PassosD Cournapeau M Brucher M Perrot and E Duchesnay Scikit-learn Machinelearning in Python Journal of Machine Learning Research 122825ndash2830 2011

[11] Lars Buitinck Gilles Louppe Mathieu Blondel Fabian Pedregosa AndreasMueller Olivier Grisel Vlad Niculae Peter Prettenhofer Alexandre GramfortJaques Grobler Robert Layton Jake VanderPlas Arnaud Joly Brian Holt andGaeumll Varoquaux API design for machine learning software experiences from

68

the scikit-learn project In ECML PKDD Workshop Languages for Data Miningand Machine Learning pages 108ndash122 2013

[12] Edwin Chen Infinite mixture models with nonparametric bayesand the dirichlet process httpblogechenme20120320infinite-mixture-models-with-nonparametric-bayes-and-the-dirichlet-processmar 2012

[13] Graham Massey Roland tr-808 The drum machine that changed music httpwwwbbccomnewsentertainment-arts-26682781 mar 2014

[14] Jean-Michel Jarre and Agencja Artystyczna MTJ Jean Michel Jarre DisquesDreyfus 1999

[15] Francois Pachet Jean-Julien Aucouturier and Mark Sandler The way it soundsTimbre models for analysis and retrieval of music signals IEEE TRANSAC-TIONS ON MULTIMEDIA 7(6)1028ndash35 dec 2005

69

  • Abstract
  • Acknowledgements
  • Contents
  • List of Tables
  • List of Figures
  • 1 Introduction
    • 11 Background Information
    • 12 Literature Review
    • 13 The Dataset
      • 2 Mathematical Modeling
        • 21 Determining Novelty of Songs
        • 22 Feature Selection
        • 23 Collecting Data and Preprocessing Selected Features
          • 231 Collecting the Data
          • 232 Pitch Preprocessing
          • 233 Timbre Preprocessing
              • 3 Results
                • 31 Methodology
                • 32 Findings
                  • 321 =005
                  • 322 =01
                  • 323 =02
                    • 33 Analysis
                      • 4 Conclusion
                        • 41 Design Flaws in Experiment
                        • 42 Future Work
                        • 43 Closing Remarks
                          • A Code
                            • A1 Pulling Data from the Million Song Dataset
                            • A2 Calculating Most Likely Chords and Timbre Categories
                            • A3 Code to Compute Timbre Categories
                            • A4 Helper Methods for Calculations
                              • Bibliography
Page 41: Silver,Matthew final thesis

Figure 3.2: Timbre and pitch distributions for α = 0.05

A table of each cluster formed, the number of songs in that cluster, and descriptions of the pitch, timbre, and rhythmic qualities characteristic of songs in that cluster is shown below.

Cluster   Song Count   Characteristic Sounds
0         6481         Minimalist, industrial, space sounds, dissonant chords
1         5482         Soft, New Age, ethereal
2         2405         Defined sounds; electronic and non-electronic instruments
                       played in standard rock rhythms
3         360          Very dense and complex synths, slightly darker tone
4         4550         Heavily distorted rock and synthesizer
6         2854         Faster paced; 80s synth rock, acid house
8         798          Aggressive beats, dense house music
9         1464         Ambient house, trancelike, strong beats, mysterious tone
11        1597         Melancholy tones; new wave rock in the 80s, then, starting
                       in the 90s, downtempo, trip-hop, nu-metal

Table 3.1: Song cluster descriptions for α = 0.05

3.2.2 α = 0.1

A total of 14 clusters were formed (16 were formed, but two clusters contained only one song each; I listened to both of these songs and they did not sound unique, so I discarded them from the clusters). Again, the song distributions, the timbre and pitch distributions, and the cluster descriptions are shown below.

Figure 3.3: Song year distributions for α = 0.1

Figure 3.4: Timbre and pitch distributions for α = 0.1

Cluster   Song Count   Characteristic Sounds
0         1339         Instrumental and disco with 80s synth
1         2109         Simultaneous quarter-note and sixteenth-note rhythms
2         4048         Upbeat, chill; simultaneous quarter-note and eighth-note
                       rhythms
3         1353         Strong repetitive beats, ambient
4         2446         Strong simultaneous beat and synths; synths defined but
                       with echo
5         2672         Calm, New Age
6         542          Hi-hat cymbals, dissonant chord progressions
7         2725         Aggressive punk and alternative rock
9         1647         Latin; rhythmic emphasis on first and third beats
11        835          Standard medium-fast rock instruments/chords
16        1152         Orchestral, especially violins
18        40           "Martian alien" sounds, no vocals
20        1590         Alternating strong kick and strong high-pitched clap
28        528          Roland TR-like beats; kick and clap stand out but fuzzy

Table 3.2: Song cluster descriptions for α = 0.1

3.2.3 α = 0.2

With α set to 0.2, there were a total of 22 clusters formed. Three of the clusters consisted of one song each, none of which were particularly unique-sounding, so I discarded them, for a total of 19 significant clusters. Again, the song distributions, the timbre and pitch distributions, and the cluster descriptions are shown below.

Figure 3.5: Song year distributions for α = 0.2

Figure 3.6: Timbre and pitch distributions for α = 0.2

Cluster   Song Count   Characteristic Sounds
0         4075         Nostalgic and sad-sounding synths and string instruments
1         2068         Intense, sad, cavernous (mix of industrial metal and
                       ambient)
2         1546         Jazz/funk tones
3         1691         Orchestral with heavy 80s synths, atmospheric
4         343          Arpeggios
5         304          Electro, ambient
6         2405         Alien synths, eerie
7         1264         Punchy kicks and claps, 80s/90s tilt
8         1561         Medium tempo, 4/4 time signature, synths with intense
                       guitar
9         1796         Disco rhythms and instruments
10        2158         Standard rock with few (if any) synths added on
12        791          Cavernous, minimalist ambient (non-electronic instruments)
14        765          Downtempo, classic guitar riffs, fewer synths
16        865          Classic acid house sounds and beats
17        682          Heavy Roland TR sounds
22        14           Fast ambient, classic orchestral
23        578          Acid house with funk tones
30        31           Very repetitive rhythms, one or two tones
34        88           Very dense sound (strong vocals and synths)

Table 3.3: Song cluster descriptions for α = 0.2

3.3 Analysis

Each of the different values of α revealed different insights about the Dirichlet Process applied to the Million Song Dataset: mainly, which artists and songs were unique, and how traditional groupings of EM genres could be thought of differently. Not surprisingly, the distributions of the years of songs in most of the clusters were skewed to the left, because the distribution of all of the EM songs in the Million Song Dataset is left-skewed (see Figure 2.2). However, some of the distributions vary significantly for individual clusters, and these differences provide important insights into which types of music styles were popular at certain points in time and how unique the earliest artists and songs in the clusters were. For example, for α = 0.1, cluster 28's musical style (with sounds characteristic of the Roland TR-808 and TR-909, two programmable drum machines and synthesizers that became extremely popular in 1980s and 90s dance tracks [13]) coincides with when the instruments were first manufactured, in 1980. Not surprisingly, this cluster contained mostly songs from the 80s and 90s and declined slightly in the 2000s. However, there were a few songs in that cluster that came out before 1980. While these songs did not clearly use the Roland TR machines, they may have contained similar sounds that predated the machines and were truly novel.

First, looking at α = 0.05, we see that all of the clusters contain a significant number of songs, although clusters 3 and 8 are notably smaller. Cluster 3 contains a heavier left tail, indicating a larger number of songs from the 70s, 80s, and 90s. Inside the cluster, the genres of music varied significantly from a traditional music lens. That is, the cluster contained some songs with nearly all traditional rock instruments, others with purely synths, and others somewhere in between, all of which would normally be classified as different EM genres. However, under the Dirichlet Process these songs were lumped together, with the common theme of dense, melodic sounds (as opposed to minimalistic, repetitive, or dissonant sounds). The most prominent artists from the earlier songs are Ashra and Jon Hassell, who composed several melodic songs combining traditional instruments with synthesizers for a modern feel. The other small cluster, number 8, contains a more normal year distribution relative to the entire MSD distribution, and also consists of denser beats. Another artist, Cabaret Voltaire, leads this cluster.

Cluster 9 sticks out significantly because it contains virtually no songs before 1990, but then increases rapidly in popularity. This cluster contains songs with hypnotically repetitive rhythm, strong and ethereal synths, and an equally strong drum-like beat. Given the emergence of trance in the 1990s, and the fact that house music in the 1980s contained more minimalistic synths than house music in the 1990s, this distribution of years makes sense. Looking at the earliest artists in this cluster, one that accurately predates the later music in the cluster is Jean-Michel Jarre. A French composer pioneering in ambient and electronic music [14], one of his songs, "Les Chants Magnétiques IV," contains very sharp and modulated synths, along with a repetitive hi-hat cymbal rhythm and more drawn-out, ethereal synths. While the song sounds ambient at its normal speed, playing it at 1.5 times its normal speed resulted in a thumping, fast-paced 16th-note rhythm that, combined with the ethereal synths and their chord progressions, sounded very similar to trance music. In fact, I found that, stylistically, trance music was comparable to house and ambient music increased in speed. Trance was a term not used extensively until the early 1990s, but ambient and house music were already mainstream by the 1980s, so it would make sense that trance evolved in this manner. However, this insight could serve as an argument that trance is not an innovative genre in and of itself, but is rather a clever combination of two older genres.

Lastly, we look at the timbre category and chord change distributions for each cluster. In theory, these clusters should have significantly different peaks of chord changes and timbre categories, reflecting different pitch arrangements and instruments in each cluster. The type 0 chord change corresponds to major → major with no note change; type 60, to minor → minor with no note change; type 120, to dominant 7th major → dominant 7th major with no note change; and type 180, to dominant 7th minor → dominant 7th minor with no note change. It makes sense that type 0, 60, 120, and 180 chord changes are frequently observed, because it implies that chords occurring next to each other in a song remain in the same key for the majority of the song; a small check of this indexing against the code in Appendix A.2 is shown below.
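To make the indexing concrete, here is a minimal sketch that mirrors the chord-change arithmetic in Appendix A.2; chords are the (quality, root) pairs returned by find_most_likely_chord, with quality 1 = major, 2 = minor, 3 = dominant 7th, 4 = minor 7th:

# mirrors the chord-change indexing used in Appendix A.2
def chord_change_type(c1, c2):
    # c1, c2 are (quality, root) pairs; quality in 1..4, root in 0..11
    if c1[1] == c2[1]:
        note_shift = 0
    elif c1[1] < c2[1]:
        note_shift = c2[1] - c1[1]
    else:
        note_shift = 12 - c1[1] + c2[1]
    key_shift = 4 * (c1[0] - 1) + c2[0]       # which of the 16 quality pairs
    return 12 * (key_shift - 1) + note_shift  # one of the 192 categories

print chord_change_type((1, 0), (1, 0))  # 0:   major -> major, same root
print chord_change_type((2, 5), (2, 5))  # 60:  minor -> minor, same root
print chord_change_type((3, 2), (3, 2))  # 120: dom7  -> dom7,  same root
print chord_change_type((4, 9), (4, 9))  # 180: min7  -> min7,  same root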

The timbre categories, on the other hand, are more difficult to intuitively interpret. Mauch's study addresses this issue by sampling the songs and sounds that are closest to each timbre category and then playing the sounds and attaching user-based interpretations from several listeners [8]. While this strategy worked in Mauch's study, given the time and resources at my disposal it was not practical in mine. I ended up comparing my subjective summaries of each cluster against the charts to see whether certain peaks in the timbre categories corresponded to specific tones. Strangely, for α = 0.05, the timbre and chord change data is very similar for each cluster. This problem does not occur when α = 0.1 or 0.2, where the graphs vary significantly and correspond to some of the observed differences in the music. In summary, below are the most influential artists I found in the clusters formed, and the types of music they created that were novel for their time:

• Jean-Michel Jarre: ambient and house music, complicated synthesizer arrangements
• Cabaret Voltaire: orchestral electronic music
• Paul Horn: new age
• Brian Eno: ambient music
• Manuel Göttsching (Ashra): synth-heavy ambient music
• Killing Joke: industrial metal
• John Foxx: minimalist and dark electronic music
• Fad Gadget: house and industrial music

While these conclusions were formed mainly from the data used in the MSD and the resulting clusters, I checked outside sources and biographies of these artists to see whether they were groundbreaking contributors to electronic music. Some research revealed that these artists were indeed groundbreaking for their time, so my findings are consistent with existing literature. The difference, however, between existing accounts and mine is that, from a quantitatively computed perspective, I found some new connections (like observing that one of Jarre's works, when sped up, sounded very similar to trance music).

For larger values of α, it is not only worth looking at interesting phenomena in the clusters formed for that specific value, but also comparing them to the clusters formed for other values of α. Since we are increasing the value of α, more clusters will be formed, and the distinctions between each cluster will be more nuanced, as the sketch below illustrates.
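As a rough, self-contained illustration of this effect (not the exact pipeline used in this thesis: the DPGMM class in the scikit-learn release used here has since been replaced by BayesianGaussianMixture, and X below is a random stand-in for the per-song feature vectors):

import numpy as np
from sklearn.mixture import BayesianGaussianMixture

X = np.random.rand(1000, 12)  # stand-in for the per-song features

for alpha in (0.05, 0.1, 0.2):
    dpgmm = BayesianGaussianMixture(
        n_components=40,                   # truncation level, not the final cluster count
        weight_concentration_prior_type='dirichlet_process',
        weight_concentration_prior=alpha,  # the concentration parameter discussed above
        max_iter=500).fit(X)
    n_used = int(np.sum(dpgmm.weights_ > 1e-2))  # components keeping non-negligible mass
    print 'alpha = {0}: about {1} effective clusters'.format(alpha, n_used)

Larger values of α put more prior mass on opening new clusters, so the fitted model tends to spread the data over more components.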

With α = 0.1, the Dirichlet Process formed 16 clusters; 2 of these clusters consisted of only one song each, and upon listening, neither of these songs sounded particularly unique, so I threw those two clusters out and analyzed the remaining 14. Comparing these clusters to the ones formed with α = 0.05, I found that some of the clusters mapped over nicely, while others were more difficult to interpret. For example, cluster 3 of the α = 0.1 run contained a similar number of songs, and a similar distribution of release years, to cluster 9 of the α = 0.05 run. Both contain virtually no songs before the 1990s and then steadily rise in popularity through the 2000s. Both clusters also contain similar types of music: house beats and ethereal synths reminiscent of ambient or trance music. However, when I looked at the earliest artists in cluster 3 of the α = 0.1 run, they were different from the earliest artists in cluster 9 of the α = 0.05 run. One particular artist, Bill Nelson, stood out for having a particularly novel song, "Birds of Tin," for the year it was released (1980). This song features a sharp and twangy synth beat that, when sped up, sounded like minimalist acid house music.

While the α = 0.05 group differentiated mostly on general moods and classes of instruments (like rock vs. non-electronic vs. electronic), the α = 0.1 group picked up more nuanced instrumentation and mood differences. For example, cluster 16 of the α = 0.1 run contained songs that featured orchestral string instruments, especially violin. The songs themselves varied significantly according to traditional genres, from Brian Eno arrangements with classical orchestra to a remix of a song by Linkin Park, a nu-metal band, which contained violin interludes. This clustering raises an interesting point: music that sounds very different based on traditional genres could be grouped together on certain instruments or sounds. Another cluster, 28 of the α = 0.1 run, features 90s sounds characteristic of the Roland TR-909 drum machine (which would explain why the cluster's songs increase drastically starting in the 1990s and steadily decline through the 2000s). Yet another cluster, 6 of the α = 0.1 run, contains a particularly heavy left tail, indicating a style more popular in the 1980s, and the characteristic sound, hi-hat cymbals, is also a specialized instrument. This specialization does not match up particularly strongly with the clusters when α = 0.05. That is, a single cluster with α = 0.05 does not easily map to one or more clusters in the α = 0.1 run, although many of the clusters appear to share characteristics based on the qualitative descriptions in the tables.

The timbre/chord change charts for each cluster appeared to at least somewhat corroborate the general characteristics I attached to each cluster. For example, the last timbre category is significantly pronounced for clusters 5 and 18, and especially so for 18. Cluster 18 was vocal-free, ethereal space-synth sounds, so it would make sense that cluster 5, which was mainly calm New Age, also contained vocal-free, ethereal, space-y sounds. It was also interesting to note that certain clusters, like 28, contained one timbre category that completely dominated all the others. Many songs in this cluster were marked by strong and repetitive beats reminiscent of the Roland TR synth drum machines, which matches the graph. Likewise, clusters 3, 7, 9, and 20, which appear to contain the same peak timbre category, were noted for containing strong and repetitive beats. From this run, I added the following artists and their contributions to the general list of novel artists:

• Bill Nelson: minimalist house music
• Vangelis: orchestral compositions with electronic notes
• Rick Wakeman: rock compositions with spacy-sounding synths
• Kraftwerk: synth-based pop music

Finally, we look at α = 0.2. With this parameter value, the Dirichlet Process resulted in 22 clusters; 3 of these clusters contained only one song each, and upon listening to each of these songs, I determined they were not particularly unique and discarded them, for a total of 19 remaining clusters. Unlike the previous two values of α, where the clusters were relatively easy to subjectively differentiate, this run was quite difficult. Slightly more than half of the clusters, 10 out of 19, contained under 1,000 songs, and 3 contained under 100. Some of the clusters were easily mapped to clusters in the other two α values, like cluster 17 of the α = 0.2 run, which contains Roland TR drum machine sounds and is comparable to cluster 28 of the α = 0.1 run. However, many of the other classifications seemed more dubious. Not only did the songs within each cluster often vary significantly, but the differences between many clusters appeared nearly indistinguishable. The chord change and timbre charts also support the difficulty in distinguishing different clusters: the y-axes for all of the years are quite small, implying that many of the timbre values averaged out because the songs were quite different in each cluster. Essentially, the observations are quite noisy and do not have features that stand out as saliently as those of cluster 28 in the α = 0.1 run, for example. The only exceptions to these numbers were clusters 30 and 34, but there were so few songs in each of these clusters that they represent only a small amount of the dataset. Therefore, I concluded that the Dirichlet Process with α = 0.2 did an insufficient job of adequately clustering the songs.

Overall, the clusters formed when α = 0.1 were the most meaningful in terms of picking up nuanced moods and instruments without splitting hairs and producing clusters a minimally trained ear could not differentiate. From this analysis, the most appropriate genre classifications of the electronic music from the MSD are the clusters described in the table where α = 0.1, and the most novel artists, along with their contributions, are summarized in the findings where α = 0.05 and α = 0.1.

Chapter 4

Conclusion

In this chapter, I first address weaknesses in my experiment and strategies for resolving them; I then offer potential paths for researchers to build upon my experiment, and close with final remarks regarding this thesis.

4.1 Design Flaws in Experiment

While I made every effort possible to ensure the integrity of this experiment, there were various complicating factors, some beyond my control and others within my control but unrealistic to address given the time and resources I had. The largest issue was the dataset I was working with. While the MSD contained roughly 23,000 electronic music songs according to my classifications, these songs did not come close to all of the electronic music that was available. From looking through the tracks, I did see many important artists, meaning that there was some credibility to the dataset. However, there were several other artists I was surprised to see missing, and the artists included contained only a limited number of popular songs. Some traditionally defined genres, like dubstep, were missing entirely from the dataset, and the most recent songs came from the year 2010, which meant that the past 5 years of rapid expansion in EM were not accounted for. Building a sufficient corpus of EM data is very difficult, arguably more so than for other genres, because songs may be remixed by multiple artists, further blurring the line between original content and modifications. For this reason, I considered my thesis to be a proof of concept: although the data I used may not be ideal, I was able to show that the Dirichlet Process could be used with some amount of success to cluster songs based on their metadata.

With respect to how I implemented the Dirichlet Process and constructed the features, my methodology could have been more extensive with additional time and resources. Interpreting the sounds in each song and establishing common threads is a difficult task, and unlike Pandora, which used trained music theory experts to analyze each song, I relied on my own ears and anecdotal knowledge of EM. Given the lack of formal literature quantitatively analyzing EM, and the resources I had, this was my best realistic option, but it was also not ideal. The second notable weakness, which was more controllable, was determining what exactly constitutes an EM song. My criteria involved iterating through every song and selecting those whose artist contained a tag that fell inside a list of predetermined EM genres. However, this strategy is not always effective, since some artists contain only a small selection of EM songs and have produced much more music involving rock or other non-EM genres. To prevent these songs from appearing in the dataset, I would need to load another dataset, from a group called Last.fm, which contains user-generated tags at the song level; such a song-level filter is sketched below.
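This is only a sketch under assumptions: track_tags is a hypothetical dict mapping each MSD track ID to its list of Last.fm tag strings (the loading of that dataset is not shown), while all_song_data and target_genres are the structures from Appendix A.1:

# hypothetical song-level filter using Last.fm tags; track_tags is assumed to
# map track_id -> list of user tag strings for that track (loading not shown)
def is_em_track(track_id, track_tags, target_genres):
    tags = [t.lower() for t in track_tags.get(track_id, [])]
    return any(genre in tag for tag in tags for genre in target_genres)

# keep only songs whose own tags (not just their artist's tags) look like EM
em_song_data = {tid: song for tid, song in all_song_data.items()
                if is_em_track(tid, track_tags, target_genres)}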

Another, more addressable weakness in my experiment was graphically analyzing the timbre categories. While the average chord changes were easy to interpret on the graphs for each cluster and had easy semantic interpretations, the timbre categories were never formally defined. That is, while I knew the Bayes Information Criterion was lowest when there were 46 categories, I did not associate each timbre category with a sound (the BIC-based selection itself is sketched below).
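A sketch of that sweep, under assumptions: modern scikit-learn class names are used rather than the 2016-era API, and timbre_frames stands for the sampled timbre frames built in Appendix A.3:

import numpy as np
from sklearn.mixture import GaussianMixture

def best_k_by_bic(timbre_frames, k_range):
    # fit a Gaussian mixture for each candidate category count; keep the lowest BIC
    bics = [GaussianMixture(n_components=k, covariance_type='full')
            .fit(timbre_frames).bic(timbre_frames) for k in k_range]
    return list(k_range)[int(np.argmin(bics))]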

Mauch's study addressed the interpretation issue by randomly selecting songs with sounds that fell in each timbre category and asking users to listen to the sounds and classify what they heard. Implementing this system would be an additional way of ensuring that the clusters formed for each song were nontrivial: I could not only eyeball the measurements on each graph for timbre, like I did in this thesis, but also use them to confirm the sounds I observed for each cluster. Finally, while my feature selection involved careful preprocessing, based on other studies, that normalized measurements between all songs, there are additional ways I could have improved the feature set. For example, one study looks at more advanced ways to isolate specific timbre segments in a song, identify repeating patterns, and compare songs to each other in terms of the similarity of their timbres [15]. More advanced methods like these would allow me to more quantitatively analyze how successful the Dirichlet Process is at effectively clustering songs into distinct categories.

4.2 Future Work

Future work in this area, quantitatively analyzing EM metadata to determine what constitutes different genres and novel artists, would involve tighter definitions, procedures, evaluations of whether the clustering was effective, and musical scrutiny. All of the weaknesses mentioned in the previous section, barring perhaps the songs available in the Million Song Dataset, can be addressed with extensions and modifications to the code base I created. Addressing the greater issue of building an effective corpus of music data for the MSD, and constantly updating it, might be accomplished by soliciting such data from an organization like Spotify, but such an endeavor is very ambitious and beyond the scope of any individual or small-group research project without extensive funding and influence. Once these problems are resolved, and the dataset, the songs accessed from it, and the methods for comparing songs to each other are in place, the next steps would be to further analyze the results. How do the most unique artists for their time compare to the most popular artists? Is there considerable overlap? How long does it take for a style to grow in popularity, if it even does? And lastly, how can these findings be used to compose new genres of music and envision who and what will become popular in the future? All of these questions may require supplementary information sources, with respect to the popularity of songs and artists, for example, and many of these additional pieces of information can be found on the website of the MSD.

4.3 Closing Remarks

While this thesis is an ambitious endeavor and can be improved in many respects, it does show that the methods implemented yield nontrivial results and could serve as a foundation for future quantitative analysis of electronic music. As data analytics grows even more, and groups such as Spotify amass greater amounts of information and deeper insights on that information, this relatively new field of study will hopefully grow. EM is a dynamic, energizing, and incredibly expressive type of music, and understanding it from a quantitative perspective pays respect to what has, up until now, been mostly analyzed from a curious outsider's perspective: qualitatively described, but not examined as thoroughly from a mathematical angle.

Appendix A

Code

A.1 Pulling Data from the Million Song Dataset

from __future__ import division
import os
import sys
import time
import glob
import re                # needed below for re.sub (import restored)
import numpy as np       # needed below for np.set_printoptions (import restored)
import hdf5_getters      # not on adroit

# prevents output from showing ellipses when printed
np.set_printoptions(threshold=np.nan)

'''This code pulls the metadata of each electronic song out of the MSD and
writes it, sorted by year, to a text file.'''

# NOTE: '/' path separators below were lost in extraction and restored by assumption
basedir = '/scratch/network/mssilver/mssilver/msd_data_full/data/' + str(sys.argv[1])
ext = '.h5'

target_genres = ['house', 'techno', 'drum and bass', 'drum n bass', "drum'n'bass",
                 'drumnbass', "drum 'n' bass", 'jungle', 'breakbeat', 'trance',
                 'dubstep', 'trap', 'downtempo', 'industrial', 'synthpop', 'idm',
                 'idm - intelligent dance music', '8-bit', 'ambient',
                 'dance and electronica', 'electronic']

# relevant metadata for all EM songs found in the MSD
all_song_data = {}
pitch_segs_data = []
count = 0
start_time = time.time()

for root, dirs, files in os.walk(basedir):
    files = glob.glob(os.path.join(root, '*' + ext))
    for f in files:
        h5 = hdf5_getters.open_h5_file_read(f)
        # if year unknown, throw out sample
        if hdf5_getters.get_year(h5) == 0:
            h5.close()
            continue
        if any(tag in str(hdf5_getters.get_artist_mbtags(h5)) for tag in target_genres):
            print 'found electronic music song at {0} seconds'.format(time.time() - start_time)
            count += 1
            print 'song count: {0}'.format(count + 1)
            h5_subdict = dict()
            h5_subdict['title'] = hdf5_getters.get_title(h5).item()
            h5_subdict['artist_name'] = hdf5_getters.get_artist_name(h5).item()
            h5_subdict['year'] = hdf5_getters.get_year(h5).item()
            h5_subdict['duration'] = hdf5_getters.get_duration(h5).item()
            h5_subdict['timbre'] = hdf5_getters.get_segments_timbre(h5).tolist()
            h5_subdict['pitches'] = hdf5_getters.get_segments_pitches(h5).tolist()
            track_id = hdf5_getters.get_track_id(h5).item()
            all_song_data[track_id] = h5_subdict
            print 'Song {0} finished processing. Total time elapsed: {1} seconds'.format(count, str(time.time() - start_time))
        h5.close()

all_song_data_sorted = dict(sorted(all_song_data.items(), key=lambda k: k[1]['year']))
# the re.sub pattern was lost in extraction; assumed to strip '/' from the shard name
sortedpitchdata = '/scratch/network/mssilver/mssilver/msd_data/raw_' + re.sub('/', '', sys.argv[1]) + '.txt'
with open(sortedpitchdata, 'w') as text_file:
    text_file.write(str(all_song_data_sorted))
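Each run of this listing processes one shard of the MSD directory tree, named by sys.argv[1]; the A.3 listing below assumes the resulting files are named msd_data/raw_AA.txt through msd_data/raw_ZZ.txt, one per letter pair.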

A.2 Calculating Most Likely Chords and Timbre Categories

from __future__ import division
import os
import sys
import re
import time
import json
import glob
import math
import ast
import collections
import numpy as np
import sklearn.mixture
import hdf5_getters   # not on adroit
import msd_utils      # not on adroit

# prevents output from showing ellipses when printed
np.set_printoptions(threshold=np.nan)

# column-wise mean of a list of lists
def mean(a):
    return sum(a) / len(a)

'''This code computes the frequency of chord changes in each electronic song
and runs the dirichlet process on it.'''

basedir = '/scratch/network/mssilver/mssilver/'
input_file = basedir + 'msd_data/raw_' + str(sys.argv[1]) + '.txt'
output_file = basedir + 'msd_data/preprocessed_' + str(sys.argv[1]) + '.txt'

json_contents = open(input_file, 'r').read()

all_song_data = []
time_start = time.time()
count = 0
# the regex was partly lost in extraction; assumed to match one song's
# {'title': ...} dictionary at a time
for json_object_str in re.finditer(r"\{'title'.*?\}", json_contents):
    json_object_str = str(json_object_str.group(0))
    json_object = ast.literal_eval(json_object_str)
    json_object_new = {}

    json_object_new['title'] = json_object['title']
    json_object_new['artist_name'] = json_object['artist_name']
    json_object_new['year'] = json_object['year']
    json_object_new['duration'] = json_object['duration']

    segments_pitches_old = json_object['pitches']
    segments_timbre_old = json_object['timbre']
    segments_pitches_old_smoothed = []
    segments_timbre_old_smoothed = []
    chord_changes = [0 for i in range(0, 192)]
    smoothing_factor = 5
    for i in range(0, int(math.floor(len(segments_pitches_old) / smoothing_factor))):
        segments = segments_pitches_old[(smoothing_factor * i):(smoothing_factor * i + smoothing_factor)]
        # calculate mean frequency of each note over a block of 5 time segments
        segments_mean = map(mean, zip(*segments))
        segments_pitches_old_smoothed.append(segments_mean)
    most_likely_chords = [msd_utils.find_most_likely_chord(seg) for seg in segments_pitches_old_smoothed]
    print 'found most likely chords at {0} seconds'.format(time.time() - time_start)
    # calculate chord changes
    for i in range(0, len(most_likely_chords) - 1):
        c1 = most_likely_chords[i]
        c2 = most_likely_chords[i + 1]
        if c1[1] == c2[1]:
            note_shift = 0
        elif c1[1] < c2[1]:
            note_shift = c2[1] - c1[1]
        else:
            note_shift = 12 - c1[1] + c2[1]
        key_shift = 4 * (c1[0] - 1) + c2[0]
        # convert note_shift (0 through 11) and key_shift (1 to 16)
        # to one of 192 categories for a chord shift
        chord_shift = 12 * (key_shift - 1) + note_shift
        chord_changes[chord_shift] += 1
    json_object_new['chord_changes'] = [c / json_object['duration'] for c in chord_changes]
    print 'calculated chord changes at {0} seconds'.format(time.time() - time_start)

    for i in range(0, int(math.floor(len(segments_timbre_old) / smoothing_factor))):
        segments = segments_timbre_old[(smoothing_factor * i):(smoothing_factor * i + smoothing_factor)]
        # calculate mean of each timbre coefficient over a block of 5 time segments
        segments_mean = map(mean, zip(*segments))
        segments_timbre_old_smoothed.append(segments_mean)
    print 'found most likely timbre categories at {0} seconds'.format(time.time() - time_start)
    timbre_cats = [msd_utils.find_most_likely_timbre_category(seg) for seg in segments_timbre_old_smoothed]
    timbre_cat_counts = [timbre_cats.count(i) for i in xrange(0, 30)]
    json_object_new['timbre_cat_counts'] = [t / json_object['duration'] for t in timbre_cat_counts]
    all_song_data.append(json_object_new)
    count += 1

print 'preprocessing finished, writing results to file at time {0}'.format(time.time() - time_start)
with open(output_file, 'w') as f:
    f.write(str(all_song_data))

print 'file merging complete at time {0}'.format(time.time() - time_start)

A.3 Code to Compute Timbre Categories

from __future__ import division
import os
import sys
import re
import time
import json
import glob
import math
import ast
import operator
import random
import collections
from collections import defaultdict
from string import ascii_uppercase
import numpy as np
import matplotlib.pyplot as plt
import sklearn.mixture
import hdf5_getters   # not on adroit
import msd_utils      # not on adroit

timbre_all = []
N = 20  # number of samples to get from each year
# number of EM songs per year in the dataset (punctuation restored by assumption)
year_counts = {1956: 2, 1965: 4, 1968: 3, 1969: 5, 1970: 23, 1971: 25,
               1972: 26, 1973: 37, 1974: 35, 1975: 29, 1976: 28, 1977: 64,
               1978: 77, 1979: 111, 1980: 131, 1981: 171, 1982: 199, 1983: 272,
               1984: 190, 1985: 189, 1986: 200, 1987: 224, 1988: 205, 1989: 272,
               1990: 358, 1991: 348, 1992: 538, 1993: 610, 1994: 658, 1995: 764,
               1996: 809, 1997: 930, 1998: 872, 1999: 983, 2000: 1031, 2001: 1230,
               2002: 1323, 2003: 1563, 2004: 1508, 2005: 1995, 2006: 1892,
               2007: 2175, 2008: 1950, 2009: 1782, 2010: 742}

time_start = time.time()
year_count = defaultdict(int)
orig_dir = '/scratch/network/mssilver/mssilver/'
orig_dir = ''
# regex partly lost in extraction; assumed to match one song dict at a time
json_pattern = re.compile(r"\{'title'.*?\}", re.DOTALL)
N = 20  # number of songs to sample from each year
k = 20  # number of frames to select from each song
for l1 in ascii_uppercase:
    for l2 in ascii_uppercase:
        edm_textfile = orig_dir + 'msd_data/raw_' + l1 + l2 + '.txt'
        json_contents = open(edm_textfile, 'r').read()
        for json_object_str in re.findall(json_pattern, json_contents):
            json_object = ast.literal_eval(json_object_str)
            year = int(json_object['year'])
            # sample each year with probability min(1, N / songs in that year)
            prob = 1.0 if 1.0 * N / year_counts[year] > 1.0 else 1.0 * N / year_counts[year]
            if random.random() < prob:
                print 'getting timbre frames for song in directory {0}, {1} seconds after start of program'.format(edm_textfile, time.time() - time_start)
                duration = float(json_object['duration'])
                timbre = [[t / duration for t in l] for l in json_object['timbre']]
                try:
                    indices = random.sample(xrange(0, len(timbre)), k)
                except:
                    indices = xrange(0, len(timbre))
                timbre_frames = [timbre[i] for i in indices]
                appended_timbre = [timbre_all.append(l) for l in timbre_frames]
        print 'finished file {0}, {1} seconds after start of program'.format(edm_textfile, time.time() - time_start)

with open('timbre_frames_all.txt', 'w') as f:
    f.write(str(timbre_all))

A.4 Helper Methods for Calculations

import os
import re
import json
import glob
import time
import numpy as np
import hdf5_getters

''' some static data used in conjunction with the helper methods '''

# each 12-element vector corresponds to the 12 pitches, starting with C natural
# and going up to B natural (1 marks a chord tone)
CHORD_TEMPLATE_MAJOR = [[1,0,0,0,1,0,0,1,0,0,0,0], [0,1,0,0,0,1,0,0,1,0,0,0],
                        [0,0,1,0,0,0,1,0,0,1,0,0], [0,0,0,1,0,0,0,1,0,0,1,0],
                        [0,0,0,0,1,0,0,0,1,0,0,1], [1,0,0,0,0,1,0,0,0,1,0,0],
                        [0,1,0,0,0,0,1,0,0,0,1,0], [0,0,1,0,0,0,0,1,0,0,0,1],
                        [1,0,0,1,0,0,0,0,1,0,0,0], [0,1,0,0,1,0,0,0,0,1,0,0],
                        [0,0,1,0,0,1,0,0,0,0,1,0], [0,0,0,1,0,0,1,0,0,0,0,1]]
CHORD_TEMPLATE_MINOR = [[1,0,0,1,0,0,0,1,0,0,0,0], [0,1,0,0,1,0,0,0,1,0,0,0],
                        [0,0,1,0,0,1,0,0,0,1,0,0], [0,0,0,1,0,0,1,0,0,0,1,0],
                        [0,0,0,0,1,0,0,1,0,0,0,1], [1,0,0,0,0,1,0,0,1,0,0,0],
                        [0,1,0,0,0,0,1,0,0,1,0,0], [0,0,1,0,0,0,0,1,0,0,1,0],
                        [0,0,0,1,0,0,0,0,1,0,0,1], [1,0,0,0,1,0,0,0,0,1,0,0],
                        [0,1,0,0,0,1,0,0,0,0,1,0], [0,0,1,0,0,0,1,0,0,0,0,1]]
CHORD_TEMPLATE_DOM7 = [[1,0,0,0,1,0,0,1,0,0,1,0], [0,1,0,0,0,1,0,0,1,0,0,1],
                       [1,0,1,0,0,0,1,0,0,1,0,0], [0,1,0,1,0,0,0,1,0,0,1,0],
                       [0,0,1,0,1,0,0,0,1,0,0,1], [1,0,0,1,0,1,0,0,0,1,0,0],
                       [0,1,0,0,1,0,1,0,0,0,1,0], [0,0,1,0,0,1,0,1,0,0,0,1],
                       [1,0,0,1,0,0,1,0,1,0,0,0], [0,1,0,0,1,0,0,1,0,1,0,0],
                       [0,0,1,0,0,1,0,0,1,0,1,0], [0,0,0,1,0,0,1,0,0,1,0,1]]
CHORD_TEMPLATE_MIN7 = [[1,0,0,1,0,0,0,1,0,0,1,0], [0,1,0,0,1,0,0,0,1,0,0,1],
                       [1,0,1,0,0,1,0,0,0,1,0,0], [0,1,0,1,0,0,1,0,0,0,1,0],
                       [0,0,1,0,1,0,0,1,0,0,0,1], [1,0,0,1,0,1,0,0,1,0,0,0],
                       [0,1,0,0,1,0,1,0,0,1,0,0], [0,0,1,0,0,1,0,1,0,0,1,0],
                       [0,0,0,1,0,0,1,0,1,0,0,1], [1,0,0,0,1,0,0,1,0,1,0,0],
                       [0,1,0,0,0,1,0,0,1,0,1,0], [0,0,1,0,0,0,1,0,0,1,0,1]]

CHORD_TEMPLATE_MAJOR_means = [np.mean(chord) for chord in CHORD_TEMPLATE_MAJOR]
CHORD_TEMPLATE_MINOR_means = [np.mean(chord) for chord in CHORD_TEMPLATE_MINOR]
CHORD_TEMPLATE_DOM7_means = [np.mean(chord) for chord in CHORD_TEMPLATE_DOM7]
CHORD_TEMPLATE_MIN7_means = [np.mean(chord) for chord in CHORD_TEMPLATE_MIN7]

CHORD_TEMPLATE_MAJOR_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_MAJOR]
CHORD_TEMPLATE_MINOR_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_MINOR]
CHORD_TEMPLATE_DOM7_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_DOM7]
CHORD_TEMPLATE_MIN7_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_MIN7]

# centers of the 46 timbre clusters (decimal points restored by assumption
# after PDF extraction)
TIMBRE_CLUSTERS = [
 [1.38679881e-01, 3.95702571e-02, 2.65410235e-02, 7.38301998e-03, -1.75014636e-02, -5.51147732e-02,
  8.71851698e-03, -1.17595855e-02, 1.07227900e-02, 8.75951680e-03, 5.40391877e-03, 6.17638908e-03],
 [3.14344510e+00, 1.17405599e-01, 4.08053561e+00, -1.77934450e+00, 2.93367968e+00, -1.35597928e+00,
  -1.55129489e+00, 7.75743158e-01, 6.42796685e-01, 1.40794256e-01, 3.37716831e-01, -3.27103815e-01],
 [3.56548165e-01, 2.73288705e+00, 1.94355982e+00, 1.06892477e+00, 9.89739475e-01, -8.97330631e-02,
  8.73234495e-01, -2.00747009e-03, 3.44488367e-01, 9.93117800e-02, -2.43471766e-01, -1.90521726e-01],
 [4.22442037e-01, 4.14115783e-01, 1.43926557e-01, -1.16143322e-01, -5.95186216e-02, -2.36927188e-01,
  -6.83151409e-02, 9.86816882e-02, 2.43219098e-02, 6.93558977e-02, 6.80121418e-03, 3.97485360e-02],
 [1.94727799e-01, -1.39027782e+00, -2.39875671e-01, -2.84583677e-01, 1.92334219e-01, -2.83421048e-01,
  2.15787541e-01, 1.14840341e-01, -2.15631833e-01, -4.09496877e-02, -6.90838017e-03, -7.24394810e-03],
 [1.96565167e-01, 4.98702717e-02, -3.43697282e-01, 2.54170701e-01, 1.12441266e-02, 1.54740401e-01,
  -4.70447408e-02, 8.10868802e-02, 3.03736697e-03, 1.43974944e-03, -2.75044913e-02, 1.48634678e-02],
 [2.21364497e-01, -2.96205105e-01, 1.57754028e-01, -5.57641279e-02, -9.25625566e-02, -6.15316168e-02,
  -1.38139882e-01, -5.54936599e-02, 1.66886836e-01, 6.46238260e-02, 1.24093863e-02, -2.09274345e-02],
 [2.12823455e-01, -9.32652720e-02, -4.39611467e-01, -2.02814479e-01, 4.98638770e-02, -1.26572488e-01,
  -1.11181799e-01, 3.25075635e-02, 2.01416694e-02, -5.69216463e-02, 2.61922912e-02, 8.30817468e-02],
 [1.62304042e-01, -7.34813956e-03, -2.02552550e-01, 1.80106705e-01, -5.72110826e-02, -9.17148244e-02,
  -6.20429191e-03, -6.08892354e-02, 1.02883628e-02, 3.84878478e-02, -8.72920419e-03, 2.37291230e-02],
 [1.69023095e-01, 6.81311168e-02, -3.71039856e-02, -2.13139780e-02, -4.18752028e-03, 1.36407740e-01,
  2.58515825e-02, -4.10328777e-04, 2.93149920e-02, -1.97874734e-02, 2.01177066e-02, 4.29260690e-03],
 [4.16829358e-01, -1.28384095e+00, 8.86081556e-01, 9.13717416e-02, -3.19420208e-01, -1.82003637e-01,
  -3.19865507e-02, -1.71517045e-02, 3.47472066e-02, -3.53047665e-02, 5.58354602e-02, -5.06222122e-02],
 [3.83948137e-01, 1.06020034e-01, 4.01191058e-01, 1.49470482e-01, -9.58422411e-02, -4.94473336e-02,
  2.27589858e-02, -5.67352733e-02, 3.84666644e-02, -2.15828055e-02, -1.67817151e-02, 1.15426241e-01],
 [9.07946444e-01, 3.26120397e+00, 2.98472002e+00, -1.42615404e-01, 1.29886103e+00, -4.53380431e-01,
  1.54008478e-01, -3.55297093e-02, -2.95809181e-01, 1.57037690e-01, -7.29692046e-02, 1.15180285e-01],
 [1.60870896e+00, -2.32038235e+00, -7.96211044e-01, 1.55058968e+00, -2.19377663e+00, 5.01030526e-01,
  -1.71767279e+00, -1.36642470e+00, -2.42837527e-01, -4.14275615e-01, -7.33148530e-01, -4.56676578e-01],
 [6.42870687e-01, 1.34486839e+00, 2.16026845e-01, -2.13180345e-01, 3.10866747e-01, -3.97754955e-01,
  -3.54439151e-01, -5.95938041e-04, 4.95054274e-03, 4.67013422e-02, -1.80823854e-02, 1.25808320e-01],
 [1.16780496e+00, 2.28141229e+00, -3.29418720e+00, -1.54239912e+00, 2.12372153e-01, 2.51116768e+00,
  1.84273560e+00, -4.06183916e-01, 1.19175125e+00, -9.24407446e-01, 6.85444429e-01, -6.38729005e-01],
 [2.39097414e-01, -1.13382447e-02, 3.06327342e-01, 4.68182987e-03, -1.03107607e-01, -3.17661969e-02,
  3.46533705e-02, 1.46440386e-02, 6.88291154e-02, 1.72580481e-02, -6.23970238e-03, -6.52822380e-03],
 [1.74850329e-01, -1.86077411e-01, 2.69285838e-01, 5.22452803e-02, -3.71708289e-02, -6.42874319e-02,
  -5.01920042e-03, -1.14565540e-02, -2.61300268e-03, -6.94872458e-03, 1.20157063e-02, 2.01341977e-02],
 [1.93220674e-01, 1.62738332e-01, 1.72794061e-02, 7.89933755e-02, 1.58494767e-01, 9.04541006e-04,
  -3.33177052e-02, -1.42411500e-01, -1.90471155e-02, -2.41622739e-02, -2.57382438e-02, 2.84895062e-02],
 [3.31179197e+00, -1.56765268e-01, 4.42446188e+00, 2.05496297e+00, 5.07031622e+00, -3.52663849e-02,
  -5.68337901e+00, -1.17825301e+00, 5.41756637e-01, -3.15541339e-02, -1.58404846e+00, 7.37887234e-01],
 [2.36033237e-01, -5.01380019e-01, -7.01568834e-02, -2.14474169e-01, 5.58739133e-01, -3.45340886e-01,
  2.36469930e-01, -2.51770230e-02, -4.41670143e-01, -1.73364633e-01, 9.92353986e-03, 1.01775476e-01],
 [3.13672832e+00, 1.55128891e+00, 4.60139512e+00, 9.82477544e-01, -3.87108002e-01, -1.34239667e+00,
  -3.00065797e+00, -4.41556909e-01, -7.77546208e-01, -6.59017029e-01, -1.42596356e-01, -9.78935498e-01],
 [8.50714148e-01, 2.28658856e-01, -3.65260753e+00, 2.70626948e+00, -1.90441544e-01, 5.66625676e+00,
  1.77531510e+00, 2.39978921e+00, 1.10965660e+00, 1.58484130e+00, -1.51579214e-02, 8.64324026e-01],
 [1.14302559e+00, 1.18602811e+00, -3.88130412e+00, 8.69833825e-01, -8.23003310e-01, -4.23867795e-01,
  8.56022598e-01, -1.08015106e+00, 1.74840192e-01, -1.35493558e-02, -1.17012561e+00, 1.68572940e-01],
 [3.54117814e+00, 6.12714769e-01, 7.67585243e+00, 2.50391333e+00, 1.81374399e+00, -1.46363231e+00,
  -1.74027236e+00, -5.72924078e-01, -1.20787368e+00, -4.13954661e-01, -4.62561948e-01, 6.78297871e-01],
 [8.31843044e-01, 4.41635485e-01, 7.00724425e-02, -4.72159900e-02, 3.08326493e-01, -4.47009822e-01,
  3.27806057e-01, 6.52370380e-01, 3.28490360e-01, 1.28628172e-01, -7.78065861e-02, 6.91343399e-02],
 [4.90082031e-01, -9.53180204e-01, 1.76970476e-01, 1.57256960e-01, -5.26196238e-02, -3.19264458e-01,
  3.91808304e-01, 2.19368239e-01, -2.06483291e-01, -6.25044005e-02, -1.05547224e-01, 3.18934196e-01],
 [1.49899454e+00, -4.30708817e-01, 2.43770498e+00, 7.03149621e-01, -2.28827845e+00, 2.70195855e+00,
  -4.71484280e+00, -1.18700075e+00, -1.77431396e+00, -2.23190236e+00, 8.20855264e-01, -2.35859902e-01],
 [1.20322544e-01, -3.66300816e-01, -1.25699953e-01, -1.21914056e-01, 6.93277338e-02, -1.31034684e-01,
  -1.54955924e-03, 2.48094288e-02, -3.09576314e-02, -1.66369415e-03, 1.48904987e-04, -1.42151992e-02],
 [6.52394765e-01, -6.81024464e-01, 6.36868117e-01, 3.04950208e-01, 2.62178992e-01, -3.20457080e-01,
  -1.98576098e-01, -3.02173163e-01, 2.04399765e-01, 4.44513847e-02, -9.50111498e-03, -1.14198739e-02],
 [2.06762180e-01, -2.08101829e-01, 2.61977630e-01, -1.71672300e-01, 5.61794250e-02, 2.13660185e-01,
  3.90259585e-02, 4.78176392e-02, 1.72812607e-02, 3.44052067e-02, 6.26899067e-03, 2.48544728e-02],
 [7.39717363e-01, 4.37786285e+00, 2.54995502e+00, 1.13151212e+00, -3.58509503e-01, 2.20806129e-01,
  -2.20500355e-01, -7.22409824e-02, -2.70534083e-01, 1.07942098e-03, 2.70174668e-01, 1.87279353e-01],
 [1.25593809e+00, 6.71054880e-02, 8.70352571e-01, -4.32607959e+00, 2.30652217e+00, 5.47476105e+00,
  -6.11052479e-01, 1.07955720e+00, -2.16225471e+00, -7.95770149e-01, -7.31804973e-01, 9.68935954e-01],
 [1.17233757e-01, -1.23897829e-01, -4.88625265e-01, 1.42036530e-01, -7.23286756e-02, -6.99808763e-02,
  -1.17525019e-02, 5.70221674e-02, -7.67796123e-03, 4.17505873e-02, -2.33375716e-02, 1.94121001e-02],
 [1.67511025e+00, -2.75436700e+00, 1.45345593e+00, 1.32408871e+00, -1.66172505e+00, 1.00560074e+00,
  -8.82308160e-01, -5.95708043e-01, -7.27283590e-01, -1.03975499e+00, -1.86653334e-02, 1.39449745e+00],
 [3.20587677e+00, -2.84451104e+00, 8.54849957e+00, -4.44001235e-01, 1.04202144e+00, 7.35333682e-01,
  -2.48763292e+00, 7.38931361e-01, -1.74185596e+00, -1.07581842e+00, 2.05759299e-01, -8.20483513e-01],
 [3.31279737e+00, -5.08655734e-01, 6.61530870e+00, 1.16518280e+00, 4.74499155e+00, -2.31536191e+00,
  -1.34016130e+00, -7.15381712e-01, 2.78890594e+00, 2.04189275e+00, -3.80003033e-01, 1.16034914e+00],
 [1.79522019e+00, -8.13534697e-02, 4.37167420e-01, 2.26517020e+00, 8.85377295e-01, 1.07481514e+00,
  -7.25322296e-01, -2.19309506e+00, -7.59468916e-01, -1.37191387e+00, 2.60097913e-01, 9.34596450e-01],
 [3.50400906e-01, 8.17891485e-01, -8.63487084e-01, -7.31760701e-01, 9.70320805e-03, -3.60023996e-01,
  -2.91753495e-01, -8.03073817e-03, 6.65930095e-03, 1.60093340e-01, -1.29158086e-01, -5.18806100e-03],
 [2.25922929e-01, 2.78461593e-01, 5.39661393e-03, -2.37662670e-02, -2.70343295e-02, -1.23485570e-01,
  2.31027499e-03, 5.87465112e-05, 1.86127188e-02, 2.83074747e-02, -1.87198676e-04, 1.24761782e-02],
 [4.53615634e-01, 3.18976020e+00, -8.35029351e-01, 7.84124578e+00, -4.43906795e-01, -1.78945492e+00,
  -1.14521031e+00, 1.00044304e+00, -4.04084981e-01, -4.86030348e-01, 1.05412721e-01, 5.63666445e-02],
 [3.93714086e-01, -3.07226477e-01, -4.87366619e-01, -4.57481697e-01, -2.91133171e-04, -2.39881719e-01,
  -2.15591352e-01, -1.21332941e-01, 1.42245002e-01, 5.02984582e-02, -8.05878851e-03, 1.95534173e-01],
 [1.86913010e-01, -1.61000977e-01, 5.95612425e-01, 1.87804293e-01, 2.22064227e-01, -1.09008289e-01,
  7.83845058e-02, 5.15228647e-02, -8.18113578e-02, -2.37860551e-02, 3.41013800e-03, 3.64680417e-02],
 [3.32919314e+00, -2.14341251e+00, 7.20913997e+00, 1.76143734e+00, 1.64091808e+00, -2.66887649e+00,
  -9.26748006e-01, -2.78599285e-01, -7.39434005e-01, -3.87363085e-01, 8.00557250e-01, 1.15628886e+00],
 [4.76496444e-01, -1.19334793e-01, 3.09037235e-01, -3.45545294e-01, 1.30114716e-01, 5.06895559e-01,
  2.12176840e-01, -4.14296750e-03, 4.52439064e-02, -1.62163990e-02, 6.93683152e-03, -5.77607592e-03],
 [3.00019324e-01, 5.43432074e-02, -7.72732930e-01, 1.47263806e+00, -2.79012581e-02, -2.47864869e-01,
  -2.10011388e-01, 2.78202425e-01, 6.16957205e-02, -1.66924986e-01, -1.80102286e-01, -3.78872162e-03]]

TIMBRE_MEANS = [np.mean(t) for t in TIMBRE_CLUSTERS]
TIMBRE_STDEVS = [np.std(t) for t in TIMBRE_CLUSTERS]

'''helper methods to process raw msd data'''

def normalize_pitches(h5):
    key = int(hdf5_getters.get_key(h5))
    segments_pitches = hdf5_getters.get_segments_pitches(h5)
    segments_pitches_new = [transpose_by_key(pitch_seg, key) for pitch_seg in segments_pitches]
    return segments_pitches_new

def transpose_by_key(pitch_seg, key):
    # rotate the 12-element pitch vector so every song is expressed relative to its key
    pitch_seg_new = []
    for i in range(0, 12):
        idx = (i + key) % 12
        pitch_seg_new.append(pitch_seg[idx])
    return pitch_seg_new

''' given a time segment with distributions of the 12 pitches, find the most
likely chord played '''
def find_most_likely_chord(pitch_vector):
    rho_max = 0.0
    # index each chord as a (quality, root) pair: quality 1 = major, 2 = minor,
    # 3 = dominant 7th, 4 = minor 7th; root 0 = C, ..., 11 = B
    most_likely_chord = (1, 1)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_MAJOR,
            CHORD_TEMPLATE_MAJOR_means, CHORD_TEMPLATE_MAJOR_stdevs)):
        # Pearson-style correlation between the template and the observed pitches
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean) * (pitch_vector[i] - np.mean(pitch_vector)) / \
                   ((stdev + 0.01) * (np.std(pitch_vector) + 0.01))
        if abs(rho) > abs(rho_max):
            rho_max = rho
            most_likely_chord = (1, idx)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_MINOR,
            CHORD_TEMPLATE_MINOR_means, CHORD_TEMPLATE_MINOR_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean) * (pitch_vector[i] - np.mean(pitch_vector)) / \
                   ((stdev + 0.01) * (np.std(pitch_vector) + 0.01))
        if abs(rho) > abs(rho_max):
            rho_max = rho
            most_likely_chord = (2, idx)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_DOM7,
            CHORD_TEMPLATE_DOM7_means, CHORD_TEMPLATE_DOM7_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean) * (pitch_vector[i] - np.mean(pitch_vector)) / \
                   ((stdev + 0.01) * (np.std(pitch_vector) + 0.01))
        if abs(rho) > abs(rho_max):
            rho_max = rho
            most_likely_chord = (3, idx)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_MIN7,
            CHORD_TEMPLATE_MIN7_means, CHORD_TEMPLATE_MIN7_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean) * (pitch_vector[i] - np.mean(pitch_vector)) / \
                   ((stdev + 0.01) * (np.std(pitch_vector) + 0.01))
        if abs(rho) > abs(rho_max):
            rho_max = rho
            most_likely_chord = (4, idx)
    return most_likely_chord

def find_most_likely_timbre_category(timbre_vector):
    most_likely_timbre_cat = 0
    rho_max = 0.0
    for idx, (seg, mean, stdev) in enumerate(zip(TIMBRE_CLUSTERS, TIMBRE_MEANS,
            TIMBRE_STDEVS)):
        rho = 0.0
        for i in range(0, 12):
            # note: the original centers on np.mean(seg) here, unlike the chord
            # matcher above, which centers on the observed vector
            rho += (seg[i] - mean) * (timbre_vector[i] - np.mean(seg)) / \
                   ((stdev + 0.01) * (np.std(timbre_vector) + 0.01))
        if abs(rho) > abs(rho_max):
            rho_max = rho
            most_likely_timbre_cat = idx
    return most_likely_timbre_cat
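As a quick, hypothetical sanity check of the chord matcher above (not part of the thesis code), a pitch vector with most of its energy on C, E, and G should correlate best with the C major template:

import msd_utils

# energy concentrated on C, E, and G (indices 0, 4, 7)
c_major_like = [0.9, 0.1, 0.1, 0.1, 0.8, 0.1, 0.1, 0.85, 0.1, 0.1, 0.1, 0.1]
print msd_utils.find_most_likely_chord(c_major_like)  # expected: (1, 0), i.e. major on C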

Bibliography

[1] Marck Bailey. Mark J. Butler publishes scholarly work on dance music. http://www.music.northwestern.edu/about/news/2012/mark-j-butler-publishes-scholarly-work-on-dance-music.html, March 2012.

[2] Kenneth Taylor. Ishkur's guide to EDM. http://techno.org/electronic-music-guide/.

[3] Deal further strengthens Spotify's music discovery expertise. http://the.echonest.com/pressreleases/spotify-acquires-echo-nest/, March 2014.

[4] Josh Constine. Inside the Spotify - Echo Nest skunkworks. http://techcrunch.com/2014/10/19/the-sonic-mad-scientists/, October 2014.

[5] The future of music genres is here. http://blog.echonest.com/post/73516217273/the-future-of-music-genres-is-here, January 2014.

[6] About the Music Genome Project. http://www.pandora.com/about/mgp.

[7] Joan Serrà, Álvaro Corral, Marián Boguñá, Martín Haro, and Josep Ll. Arcos. Measuring the evolution of contemporary western popular music. Sci. Rep., 2, July 2012.

[8] Matthias Mauch, Robert M. MacCallum, Mark Levy, and Armand M. Leroi. The evolution of popular music: USA 1960–2010. Royal Society Open Science, 2(5), 2015.

[9] Thierry Bertin-Mahieux, Daniel P.W. Ellis, Brian Whitman, and Paul Lamere. The Million Song Dataset. In Proceedings of the 12th International Conference on Music Information Retrieval (ISMIR 2011), 2011.

[10] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay. Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12:2825–2830, 2011.

[11] Lars Buitinck, Gilles Louppe, Mathieu Blondel, Fabian Pedregosa, Andreas Mueller, Olivier Grisel, Vlad Niculae, Peter Prettenhofer, Alexandre Gramfort, Jaques Grobler, Robert Layton, Jake VanderPlas, Arnaud Joly, Brian Holt, and Gaël Varoquaux. API design for machine learning software: experiences from the scikit-learn project. In ECML PKDD Workshop: Languages for Data Mining and Machine Learning, pages 108–122, 2013.

[12] Edwin Chen. Infinite mixture models with nonparametric Bayes and the Dirichlet process. http://blog.echen.me/2012/03/20/infinite-mixture-models-with-nonparametric-bayes-and-the-dirichlet-process/, March 2012.

[13] Graham Massey. Roland TR-808: The drum machine that changed music. http://www.bbc.com/news/entertainment-arts-26682781, March 2014.

[14] Jean-Michel Jarre and Agencja Artystyczna MTJ. Jean Michel Jarre. Disques Dreyfus, 1999.

[15] François Pachet, Jean-Julien Aucouturier, and Mark Sandler. The way it sounds: Timbre models for analysis and retrieval of music signals. IEEE Transactions on Multimedia, 7(6):1028–35, December 2005.

• Abstract
• Acknowledgements
• Contents
• List of Tables
• List of Figures
• 1 Introduction
  • 1.1 Background Information
  • 1.2 Literature Review
  • 1.3 The Dataset
• 2 Mathematical Modeling
  • 2.1 Determining Novelty of Songs
  • 2.2 Feature Selection
  • 2.3 Collecting Data and Preprocessing Selected Features
    • 2.3.1 Collecting the Data
    • 2.3.2 Pitch Preprocessing
    • 2.3.3 Timbre Preprocessing
• 3 Results
  • 3.1 Methodology
  • 3.2 Findings
    • 3.2.1 α = 0.05
    • 3.2.2 α = 0.1
    • 3.2.3 α = 0.2
  • 3.3 Analysis
• 4 Conclusion
  • 4.1 Design Flaws in Experiment
  • 4.2 Future Work
  • 4.3 Closing Remarks
• A Code
  • A.1 Pulling Data from the Million Song Dataset
  • A.2 Calculating Most Likely Chords and Timbre Categories
  • A.3 Code to Compute Timbre Categories
  • A.4 Helper Methods for Calculations
• Bibliography
Page 42: Silver,Matthew final thesis

Cluster Song Count Characteristic Sounds

0 6481 Minimalist industrial space sounds dissonant chords

1 5482 Soft New Age ethereal

2 2405 Defined sounds electronic and non-electronic instru-

ments played in standard rock rhythms

3 360 Very dense and complex synths slightly darker tone

4 4550 Heavily distorted rock and synthesizer

6 2854 Faster paced 80s synth rock acid house

8 798 Aggressive beats dense house music

9 1464 Ambient house trancelike strong beats mysterious

tone

11 1597 Melancholy tones New wave rock in 80s then starting

in 90s downtempo trip-hop nu-metal

Table 31 Song cluster descriptions for α = 005

322 α=01

A total of 14 clusters were formed (16 were formed but 2 clusters contained only

one song each I listened to both of these songs and they did not sound unique so

I discarded them from the clusters) Again the song distributions timbre and pitch

distributions and cluster descriptions are shown below

33

34

Figure 33 Song year distributions for α = 01

35

36

Figure 34 Timbre and pitch distributions for α = 01

37

Cluster   Song Count   Characteristic Sounds
0         1339         Instrumental and disco with 80s synth
1         2109         Simultaneous quarter-note and sixteenth-note rhythms
2         4048         Upbeat, chill; simultaneous quarter-note and eighth-note rhythms
3         1353         Strong repetitive beats, ambient
4         2446         Strong simultaneous beat and synths; synths defined but echo
5         2672         Calm, New Age
6         542          Hi-hat cymbals, dissonant chord progressions
7         2725         Aggressive punk and alternative rock
9         1647         Latin; rhythmic emphasis on first and third beats
11        835          Standard medium-fast rock instruments/chords
16        1152         Orchestral, especially violins
18        40           "Martian alien" sounds, no vocals
20        1590         Alternating strong kick and strong high-pitched clap
28        528          Roland TR-like beats; kick and clap stand out but fuzzy

Table 3.2: Song cluster descriptions for α = 0.1

3.2.3 α = 0.2

With α set to 0.2, a total of 22 clusters were formed. 3 of the clusters consisted of 1 song each, none of which was particularly unique-sounding, so I discarded them, leaving a total of 19 significant clusters. Again, the song distributions, the timbre and pitch distributions, and the cluster descriptions are shown below.

Figure 3.5: Song year distributions for α = 0.2

Figure 3.6: Timbre and pitch distributions for α = 0.2

Cluster   Song Count   Characteristic Sounds
0         4075         Nostalgic and sad-sounding synths and string instruments
1         2068         Intense, sad, cavernous (mix of industrial metal and ambient)
2         1546         Jazz/funk tones
3         1691         Orchestral with heavy 80s synths, atmospheric
4         343          Arpeggios
5         304          Electro, ambient
6         2405         Alien synths, eerie
7         1264         Punchy kicks and claps, 80s/90s tilt
8         1561         Medium tempo, 4/4 time signature, synths with intense guitar
9         1796         Disco rhythms and instruments
10        2158         Standard rock with few (if any) synths added on
12        791          Cavernous, minimalist ambient (non-electronic instruments)
14        765          Downtempo, classic guitar riffs, fewer synths
16        865          Classic acid house sounds and beats
17        682          Heavy Roland TR sounds
22        14           Fast ambient, classic orchestral
23        578          Acid house with funk tones
30        31           Very repetitive rhythms, one or two tones
34        88           Very dense sound (strong vocals and synths)

Table 3.3: Song cluster descriptions for α = 0.2

3.3 Analysis

Each of the different values of α revealed different insights about the Dirichlet Process applied to the Million Song Dataset, mainly which artists and songs were unique and how traditional groupings of EM genres could be thought of differently. Not surprisingly, the year distributions of the songs in most of the clusters were skewed to the left, because the distribution of all of the EM songs in the Million Song Dataset is left-skewed (see Figure 2.2). However, some of the distributions vary significantly for individual clusters, and these differences provide important insights into which music styles were popular at certain points in time and how unique the earliest artists and songs in the clusters were. For example, for α = 0.1, cluster 28's musical style (with sounds characteristic of the Roland TR-808 and TR-909, two programmable drum machines and synthesizers that became extremely popular in 1980s and 90s dance tracks [13]) coincides with when the instruments were first manufactured, in 1980. Not surprisingly, this cluster contained mostly songs from the 80s and 90s and declined slightly in the 2000s. However, there were a few songs in that cluster that came out before 1980. While these songs did not clearly use the Roland TR machines, they may have contained similar sounds that predated the machines and were truly novel.

First, looking at α = 0.05, we see that all of the clusters contain a significant number of songs, although clusters 3 and 8 are notably smaller. Cluster 3 has a heavier left tail, indicating a larger number of songs from the 70s, 80s, and 90s. Inside the cluster, the genres of music varied significantly from a traditional music lens. That is, the cluster contained some songs with nearly all traditional rock instruments, others with purely synths, and others somewhere in between, all of which would normally be classified as different EM genres. However, under the Dirichlet Process these songs were lumped together with the common theme of dense, melodic sounds (as opposed to minimalistic, repetitive, or dissonant sounds). The most prominent artists among the earlier songs are Ashra and Jon Hassell, who composed several melodic songs combining traditional instruments with synthesizers for a modern feel. The other small cluster, number 8, has a more normal year distribution relative to the entire MSD distribution and also consists of denser beats. Another artist, Cabaret Voltaire, leads this cluster. Cluster 9 sticks out significantly because it contains virtually no songs before 1990 but then increases rapidly in popularity. This cluster contains songs with hypnotically repetitive rhythm, strong and ethereal synths, and an equally strong drum-like beat. Given the emergence of trance in the 1990s, and the fact that house music in the 1980s contained more minimalistic synths than house music in the 1990s, this distribution of years makes sense. Looking at the earliest artists in this cluster, one that accurately predates the later music in the cluster is Jean-Michel Jarre, a French composer pioneering in ambient and electronic music [14]. One of his songs, Les Chants Magnétiques IV, contains very sharp and modulated synths along with a repetitive hi-hat cymbal rhythm and more drawn-out, ethereal synths. While the song sounds ambient at its normal speed, playing it at 1.5 times the normal speed resulted in a thumping, fast-paced 16th-note rhythm that, combined with the ethereal synths and their chord progressions, sounded very similar to trance music. In fact, I found that, stylistically, trance music was comparable to house and ambient music increased in speed. "Trance" was a term not used extensively until the early 1990s, but ambient and house music were already mainstream by the 1980s, so it would make sense that trance evolved in this manner. However, this insight could serve as an argument that trance is not an innovative genre in and of itself but is rather a clever combination of two older genres. (A sketch of this speed-up test appears below.)
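For anyone who wants to reproduce this listening test, here is a minimal sketch. It is an illustration rather than part of my pipeline: the file names are hypothetical, and it assumes the librosa and soundfile packages are installed.

# Minimal sketch of the 1.5x speed-up test (illustration only; the file
# names are hypothetical and librosa/soundfile are assumed installed).
import librosa
import soundfile as sf

y, sr = librosa.load('les_chants_magnetiques_4.wav', sr=None)  # native rate
# Writing the same samples with a 1.5x sample rate plays the track 1.5 times
# faster, with the pitch rising as on a sped-up record. For a pitch-preserving
# speed-up, librosa.effects.time_stretch(y, rate=1.5) could be used instead.
sf.write('les_chants_magnetiques_4_fast.wav', y, int(sr * 1.5))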

Lastly, we look at the timbre category and chord change distributions for each cluster. In theory, these clusters should have significantly different peaks of chord changes and timbre categories, reflecting different pitch arrangements and instruments in each cluster. The type 0 chord change corresponds to major → major with no note change; type 60, minor → minor with no note change; type 120, dominant 7th major → dominant 7th major with no note change; and type 180, dominant 7th minor → dominant 7th minor with no note change. It makes sense that type 0, 60, 120, and 180 chord changes are frequently observed, because it implies that chords occurring next to each other in a song remain in the same key for the majority of the song. (A small sketch of this encoding follows.)
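To make the encoding concrete, here is a self-contained restatement of the Appendix A.2 logic, with chords written as (quality, root) tuples (quality 1 through 4 for major, minor, dominant 7th major, and dominant 7th minor; root 0 through 11 for C through B). The assertions check the four "no note change" types just described.

# Restatement of the chord-change encoding from Appendix A.2; a chord is a
# (quality, root) tuple with quality 1..4 (major, minor, dominant 7th major,
# dominant 7th minor) and root 0..11 (C..B).
def chord_shift(c1, c2):
    # semitone distance from the first root up to the second, wrapped to 0..11
    if c1[1] == c2[1]:
        note_shift = 0
    elif c1[1] < c2[1]:
        note_shift = c2[1] - c1[1]
    else:
        note_shift = 12 - c1[1] + c2[1]
    key_shift = 4 * (c1[0] - 1) + c2[0]       # which of the 16 quality pairs
    return 12 * (key_shift - 1) + note_shift  # one of 192 change categories

# the four frequently observed "no note change" types
assert chord_shift((1, 0), (1, 0)) == 0    # major -> major
assert chord_shift((2, 0), (2, 0)) == 60   # minor -> minor
assert chord_shift((3, 0), (3, 0)) == 120  # dom. 7th major -> dom. 7th major
assert chord_shift((4, 0), (4, 0)) == 180  # dom. 7th minor -> dom. 7th minor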

The timbre categories, on the other hand, are more difficult to interpret intuitively. Mauch's study addresses this issue by sampling the songs and sounds that are closest to each timbre category and then playing the sounds and attaching user-based interpretations from several listeners [8]. While this strategy worked in Mauch's study, given the time and resources at my disposal it was not practical in mine. Instead, I compared my subjective summaries of each cluster against the charts to see whether certain peaks in the timbre categories corresponded to specific tones. Strangely, for α = 0.05 the timbre and chord change data is very similar for every cluster. This problem does not occur for α = 0.1 or 0.2, where the graphs vary significantly and correspond to some of the observed differences in the music. In summary, below are the most influential artists I found in the clusters formed, and the types of music they created that were novel for their time:

• Jean-Michel Jarre: ambient and house music, complicated synthesizer arrangements

• Cabaret Voltaire: orchestral electronic music

• Paul Horn: new age

• Brian Eno: ambient music

• Manuel Göttsching (Ashra): synth-heavy ambient music

• Killing Joke: industrial metal

• John Foxx: minimalist and dark electronic music

• Fad Gadget: house and industrial music

While these conclusions were formed mainly from the data used in the MSD and the resulting clusters, I checked outside sources and biographies of these artists to see whether they were groundbreaking contributors to electronic music. Some research revealed that these artists were indeed groundbreaking for their time, so my findings are consistent with existing literature. The difference between existing accounts and mine, however, is that, from a quantitatively computed perspective, I found some new connections (like observing that one of Jarre's works, when sped up, sounded very similar to trance music).

For larger values of α, it is worth not only looking at interesting phenomena in the clusters formed for that specific value but also comparing them to the clusters formed at other values of α. Since we are increasing the value of α, more clusters will be formed, and the distinctions between the clusters will be more nuanced. (A short sketch of this effect appears below.)
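Under a Dirichlet Process prior, the expected number of occupied clusters grows roughly like α log(1 + n/α) for n observations, which is why raising α opens more clusters. The sketch below is a hedged illustration, not my exact pipeline: current scikit-learn exposes the truncated Dirichlet Process mixture as BayesianGaussianMixture, and the toy data here is synthetic rather than the MSD features.

# Illustration of alpha's effect on cluster count with scikit-learn's
# truncated Dirichlet Process GMM; the data below is synthetic, not the MSD.
import numpy as np
from sklearn.mixture import BayesianGaussianMixture

rng = np.random.RandomState(0)
X = np.vstack([rng.normal(loc=c, scale=0.5, size=(300, 2)) for c in (0, 3, 6)])

for alpha in (0.05, 0.1, 0.2):
    dpgmm = BayesianGaussianMixture(
        n_components=30,  # truncation level: an upper bound on clusters
        weight_concentration_prior_type='dirichlet_process',
        weight_concentration_prior=alpha,
        max_iter=500, random_state=0).fit(X)
    n_used = np.unique(dpgmm.predict(X)).size  # clusters actually populated
    print('alpha = {0}: {1} clusters used'.format(alpha, n_used))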

With α = 0.1, the Dirichlet Process formed 16 clusters. 2 of these clusters consisted of only one song each, and upon listening, neither of those songs sounded particularly unique, so I threw those two clusters out and analyzed the remaining 14. Comparing these clusters to the ones formed with α = 0.05, I found that some of the clusters mapped over nicely, while others were more difficult to interpret. For example, cluster 3 in the α = 0.1 run contained a similar number of songs, and a similar distribution of release years, to cluster 9 in the α = 0.05 run. Both contain virtually no songs before the 1990s and then steadily rise in popularity through the 2000s. Both clusters also contain similar types of music: house beats and ethereal synths reminiscent of ambient or trance music. However, when I looked at the earliest artists in cluster 3 at α = 0.1, they were different from the earliest artists in cluster 9 at α = 0.05. One particular artist, Bill Nelson, stood out for having a particularly novel song, "Birds of Tin," for the year it was released (1980). This song features a sharp and twangy synth beat that, when sped up, sounded like minimalist acid house music. While the α = 0.05 run differentiated mostly on general moods and classes of instruments (like rock vs. non-electronic vs. electronic), the α = 0.1 run picked up more nuanced instrumentation and mood differences. For example, cluster 16 at α = 0.1 contained songs that featured orchestral string instruments, especially violin. The songs themselves varied significantly according to traditional genres, from Brian Eno arrangements with classical orchestra to a remix of a song by Linkin Park, a nu-metal band, which contained violin interludes. This clustering raises an interesting point: music that sounds very different based on traditional genres could be grouped together on certain instruments or sounds. Another cluster, 28 at α = 0.1, features 90s sounds characteristic of the Roland TR-909 drum machine (which would explain why the cluster's songs increase drastically starting in the 1990s and steadily decline through the 2000s). Yet another cluster, 6 at α = 0.1, has a particularly heavy left tail, indicating a style more popular in the 1980s, and its characteristic sound, hi-hat cymbals, is also a specialized instrument. This specialization does not match up particularly strongly with the clusters from α = 0.05. That is, a single cluster at α = 0.05 does not easily map to one or more clusters in the α = 0.1 run, although many of the clusters appear to share characteristics based on the qualitative descriptions in the tables. The timbre/chord change charts for each cluster appeared to at least somewhat corroborate the general characteristics I attached to each cluster. For example, the last timbre category is significantly pronounced for clusters 5 and 18, and especially so for 18. Cluster 18 was vocal-free, ethereal space-synth sounds, so it would make sense that cluster 5, which was mainly calm New Age, also contained vocal-free, ethereal, space-y sounds. It was also interesting to note that certain clusters, like 28, contained one timbre category that completely dominated all the others. Many songs in this cluster were marked by strong and repetitive beats reminiscent of the Roland TR drum machines, which matches the graph. Likewise, clusters 3, 7, 9, and 20, which appear to contain the same peak timbre category, were noted for containing strong and repetitive beats. From this run I added the following artists and their contributions to the general list of novel artists:

• Bill Nelson: minimalist house music

• Vangelis: orchestral compositions with electronic notes

• Rick Wakeman: rock compositions with spacy-sounding synths

• Kraftwerk: synth-based pop music

Finally, we look at α = 0.2. With this parameter value, the Dirichlet Process resulted in 22 clusters. 3 of these clusters contained only one song each; upon listening to each of these songs, I determined they were not particularly unique and discarded them, leaving a total of 19 remaining clusters. Unlike the previous two values of α, where the clusters were relatively easy to differentiate subjectively, this one was quite difficult. Slightly more than half of the clusters, 10 out of 19, contained under 1000 songs, and 3 contained under 100. Some of the clusters were easily mapped to clusters from the other two α values, like cluster 17 at α = 0.2, which contains Roland TR drum machine sounds and is comparable to cluster 28 at α = 0.1. However, many of the other classifications seemed more dubious. Not only did the songs within each cluster often vary significantly, but the differences between many clusters appeared nearly indistinguishable. The chord change and timbre charts also reflect the difficulty in distinguishing the clusters. The y-axes for all of the charts are quite small, implying that many of the timbre values averaged out because the songs in each cluster were quite different. Essentially, the observations are quite noisy and do not have features that stand out as saliently as those of, for example, cluster 28 at α = 0.1. The only exceptions were clusters 30 and 34, but there were so few songs in each of these clusters that they represent only a small amount of the dataset. Therefore, I concluded that the Dirichlet Process with α = 0.2 did an insufficient job of clustering the songs. Overall, the clusters formed when α = 0.1 were the most meaningful in terms of picking up nuanced moods and instruments without splitting hairs and producing clusters that a minimally trained ear could not differentiate. From this analysis, the most appropriate genre classifications of the electronic music from the MSD are the clusters described in the table for α = 0.1, and the most novel artists, along with their contributions, are summarized in the findings for α = 0.05 and α = 0.1.

Chapter 4

Conclusion

In this chapter, I first address weaknesses in my experiment and strategies to address those weaknesses; I then offer potential paths for researchers to build upon my experiment and close with final words regarding this thesis.

4.1 Design Flaws in Experiment

While I made every effort possible to ensure the integrity of this experiment, there were various complicating factors, some beyond my control and others within my control but unrealistic to address given the time and resources I had. The largest issue was the dataset I was working with. While the MSD contained roughly 23,000 electronic music songs according to my classifications, these songs did not come close to all of the electronic music that was available. From looking through the tracks, I did see many important artists, meaning that there was some credibility to the dataset. However, there were several other artists I was surprised to see missing, and the artists that were included had only a limited number of popular songs. Some traditionally defined genres, like dubstep, were missing entirely from the dataset, and the most recent songs came from the year 2010, which meant that the past 5 years of rapid expansion in EM were not accounted for. Building a sufficient corpus of EM data is very difficult, arguably more so than for other genres, because songs may be remixed by multiple artists, further blurring the line between original content and modifications. For this reason, I consider my thesis to be a proof of concept: although the data I used may not be ideal, I was able to show that the Dirichlet Process could be used with some amount of success to cluster songs based on their metadata.

With respect to how I implemented the Dirichlet Process and constructed the features, my methodology could have been more extensive with additional time and resources. Interpreting the sounds in each song and establishing common threads is a difficult task, and unlike Pandora, which used trained music theory experts to analyze each song, I relied on my own ears and anecdotal knowledge of EM. Given the lack of formal literature quantitatively analyzing EM and the resources I had, this was my best realistic option, but it was also not ideal. The second notable weakness, which was more controllable, was determining what exactly constitutes an EM song. My criteria involved iterating through every song and selecting those whose artist carried a tag that fell inside a list of predetermined EM genres. However, this strategy is not always effective, since some artists have only a small selection of EM songs and have produced much more music involving rock or other non-EM genres. To prevent these songs from appearing in the dataset, I would need to load another dataset, from a group called Last.fm, which contains user-generated tags at the song level. (A sketch of such a song-level filter appears below.)
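As a sketch of what that song-level filter could look like: this assumes the Last.fm dataset's per-track JSON files carry a 'tags' list of [tag, weight] pairs, and the path, tag list, and weight threshold here are all hypothetical placeholders.

# Hypothetical song-level tag filter over the MSD Last.fm dataset; the
# 'tags' field layout is assumed, and EM_TAGS/min_weight are placeholders.
import json

EM_TAGS = {'house', 'techno', 'trance', 'ambient', 'breakbeat', 'idm'}

def is_em_track(lastfm_json_path, min_weight=50):
    with open(lastfm_json_path) as f:
        track = json.load(f)
    return any(str(tag).lower() in EM_TAGS and int(weight) >= min_weight
               for tag, weight in track.get('tags', []))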

Another, more addressable weakness in my experiment was how I graphically analyzed the timbre categories. While the average chord changes were easy to interpret on the graphs for each cluster and had easy semantic interpretations, the timbre categories were never formally defined. That is, while I knew the Bayes Information Criterion was lowest when there were 46 categories (a sketch of this selection procedure follows), I did not associate each timbre category with a sound.
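The category count itself can be reproduced with an ordinary BIC sweep over Gaussian mixture sizes. The sketch below is schematic: the random matrix stands in for the sampled 12-dimensional timbre frames of Appendix A.3, and the range of candidate sizes is illustrative.

# Schematic BIC sweep for choosing the number of timbre categories;
# the random matrix stands in for the real 12-d timbre frames.
import numpy as np
from sklearn.mixture import GaussianMixture

timbre_frames = np.random.randn(5000, 12)  # placeholder data

bics = {}
for k in range(20, 61, 2):
    gmm = GaussianMixture(n_components=k, covariance_type='full',
                          random_state=0).fit(timbre_frames)
    bics[k] = gmm.bic(timbre_frames)
best_k = min(bics, key=bics.get)  # the mixture size with the lowest BIC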

Mauch's study addressed this issue by randomly selecting songs with sounds that fell in each timbre category and asking users to listen to the sounds and classify what they heard. Implementing this system would be an additional way of ensuring that the clusters formed were nontrivial: I could not only eyeball the timbre measurements on each graph, as I did in this thesis, but also use them to confirm the sounds I observed for each cluster. Finally, while my feature selection involved careful preprocessing, based on other studies, that normalized measurements between all songs, there are additional ways I could have improved the feature set. For example, one study looks at more advanced ways to isolate specific timbre segments in a song, identify repeating patterns, and compare songs to each other in terms of the similarity of their timbres [15]. More advanced methods like these would allow me to analyze more quantitatively how successfully the Dirichlet Process clusters songs into distinct categories. (A toy version of such a timbre-similarity comparison appears below.)
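For a flavor of what even a crude version of such a comparison looks like, the sketch below drastically simplifies the timbre models of [15]: each song is summarized by its mean 12-dimensional timbre vector (my own placeholder representation, not the models from that paper), and pairs of songs are scored with cosine similarity.

# Toy timbre-similarity measure (a drastic simplification of [15]):
# summarize each song by its mean timbre vector, then compare with cosine.
import numpy as np

def mean_timbre(segments_timbre):
    # segments_timbre: list of 12-d timbre vectors, one per time segment
    return np.asarray(segments_timbre).mean(axis=0)

def timbre_similarity(song_a_segments, song_b_segments):
    a, b = mean_timbre(song_a_segments), mean_timbre(song_b_segments)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))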

4.2 Future Work

Future work in this area, quantitatively analyzing EM metadata to determine what constitutes different genres and novel artists, would involve tighter definitions and procedures, evaluations of whether the clustering was effective, and closer musical scrutiny. All of the weaknesses mentioned in the previous section, barring perhaps the songs available in the Million Song Dataset, can be addressed with extensions and modifications to the code base I created. The greater issue of building an effective corpus of music data for the MSD and constantly updating it might be addressed by soliciting such data from an organization like Spotify, but such an endeavor is very ambitious and beyond the scope of any individual or small-group research project without extensive funding and influence. Once these problems are resolved, songs can be accessed from the dataset, and methods for comparing songs to each other are in place, the next steps would be to further analyze the results. How do the most unique artists for their time compare to the most popular artists? Is there considerable overlap? How long does it take for a style to grow in popularity, if it ever does? And lastly, how can these findings be used to compose new genres of music and envision who and what will become popular in the future? All of these questions may require supplementary information sources, regarding the popularity of songs and artists for example, and many of these additional pieces of information can be found on the website of the MSD.

4.3 Closing Remarks

While this thesis is an ambitious endeavor and can be improved in many respects, it does show that the methods implemented yield nontrivial results and could serve as a foundation for future quantitative analysis of electronic music. As data analytics grows further, and groups such as Spotify amass greater amounts of information and deeper insights on that information, this relatively new field of study will hopefully grow. EM is a dynamic, energizing, and incredibly expressive type of music, and understanding it from a quantitative perspective pays respect to what has, up until now, been mostly analyzed from a curious outsider's perspective: qualitatively described, but not examined as thoroughly from a mathematical angle.

Appendix A

Code

A.1 Pulling Data from the Million Song Dataset

from __future__ import division
import os
import sys
import re          # used to build the output file name below
import time
import glob
import numpy as np  # used by set_printoptions below
import hdf5_getters  # not on adroit

# prevents output from showing ellipses when printed
np.set_printoptions(threshold=np.nan)

'''This code pulls the pitch, timbre, and song metadata for every electronic
music song in the MSD and writes it out sorted chronologically.'''

basedir = '/scratch/network/mssilver/mssilver/msd_data_full/data/' + str(sys.argv[1])
ext = '.h5'

target_genres = ['house', 'techno', 'drum and bass', 'drum n bass',
                 "drum'n'bass", 'drumnbass', "drum 'n' bass", 'jungle',
                 'breakbeat', 'trance', 'dubstep', 'trap', 'downtempo',
                 'industrial', 'synthpop', 'idm',
                 'idm - intelligent dance music', '8-bit', 'ambient',
                 'dance and electronica', 'electronic']

# relevant metadata for all EM songs found in the MSD
all_song_data = {}
pitch_segs_data = []
count = 0
start_time = time.time()

for root, dirs, files in os.walk(basedir):
    files = glob.glob(os.path.join(root, '*' + ext))
    for f in files:
        h5 = hdf5_getters.open_h5_file_read(f)
        # if year unknown, throw out sample
        if hdf5_getters.get_year(h5) == 0:
            h5.close()
            continue
        if any(tag in str(hdf5_getters.get_artist_mbtags(h5)) for tag in target_genres):
            print 'found electronic music song at {0} seconds'.format(time.time() - start_time)
            count += 1
            print('song count: {0}'.format(count + 1))
            h5_subdict = dict()
            h5_subdict['title'] = hdf5_getters.get_title(h5).item()
            h5_subdict['artist_name'] = hdf5_getters.get_artist_name(h5).item()
            h5_subdict['year'] = hdf5_getters.get_year(h5).item()
            h5_subdict['duration'] = hdf5_getters.get_duration(h5).item()
            h5_subdict['timbre'] = hdf5_getters.get_segments_timbre(h5).tolist()
            h5_subdict['pitches'] = hdf5_getters.get_segments_pitches(h5).tolist()
            track_id = hdf5_getters.get_track_id(h5).item()
            all_song_data[track_id] = h5_subdict
            print('Song {0} finished processing. Total time elapsed: {1} seconds'.format(count, str(time.time() - start_time)))
        h5.close()

# sort the songs chronologically before writing them out
all_song_data_sorted = dict(sorted(all_song_data.items(), key=lambda k: k[1]['year']))
sortedpitchdata = '/scratch/network/mssilver/mssilver/msd_data/raw_' + re.sub(r'/', '', sys.argv[1]) + '.txt'
with open(sortedpitchdata, 'w') as text_file:
    text_file.write(str(all_song_data_sorted))

A.2 Calculating Most Likely Chords and Timbre Categories

from __future__ import division
import os
import sys
import re
import time
import json
import glob
import hdf5_getters  # not on adroit
import sklearn.mixture
import msd_utils  # not on adroit
import math
import numpy as np
import collections
import ast

# prevents output from showing ellipses when printed
np.set_printoptions(threshold=np.nan)

# mean of a list; used with zip(*segments) below for column-wise means
def mean(a):
    return sum(a) / len(a)

'''This code computes the frequency of chord changes in each electronic song
and the per-song timbre category counts.'''

basedir = '/scratch/network/mssilver/mssilver/'
input_file = basedir + 'msd_data/raw_' + str(sys.argv[1]) + '.txt'
output_file = basedir + 'msd_data/preprocessed_' + str(sys.argv[1]) + '.txt'

json_contents = open(input_file, 'r').read()

all_song_data = []
time_start = time.time()
count = 0
for json_object_str in re.finditer(r"\{'title'.*?\}", json_contents):
    json_object_str = str(json_object_str.group(0))
    json_object = ast.literal_eval(json_object_str)
    json_object_new = {}

    json_object_new['title'] = json_object['title']
    json_object_new['artist_name'] = json_object['artist_name']
    json_object_new['year'] = json_object['year']
    json_object_new['duration'] = json_object['duration']

    segments_pitches_old = json_object['pitches']
    segments_timbre_old = json_object['timbre']
    segments_pitches_old_smoothed = []
    segments_timbre_old_smoothed = []
    chord_changes = [0 for i in range(0, 192)]
    smoothing_factor = 5
    for i in range(0, int(math.floor(len(segments_pitches_old)) / smoothing_factor)):
        segments = segments_pitches_old[(smoothing_factor * i):(smoothing_factor * i + smoothing_factor)]
        # calculate mean frequency of each note over a block of 5 time segments
        segments_mean = map(mean, zip(*segments))
        segments_pitches_old_smoothed.append(segments_mean)
    most_likely_chords = [msd_utils.find_most_likely_chord(seg) for seg in segments_pitches_old_smoothed]
    print 'found most likely chords at {0} seconds'.format(time.time() - time_start)
    # calculate chord changes
    for i in range(0, len(most_likely_chords) - 1):
        c1 = most_likely_chords[i]
        c2 = most_likely_chords[i + 1]
        if (c1[1] == c2[1]):
            note_shift = 0
        elif (c1[1] < c2[1]):
            note_shift = c2[1] - c1[1]
        else:
            note_shift = 12 - c1[1] + c2[1]
        key_shift = 4 * (c1[0] - 1) + c2[0]
        # convert note_shift (0 through 11) and key_shift (1 to 16)
        # to one of 192 categories for a chord shift
        chord_shift = 12 * (key_shift - 1) + note_shift
        chord_changes[chord_shift] += 1
    json_object_new['chord_changes'] = [c / json_object['duration'] for c in chord_changes]
    print 'calculated chord changes at {0} seconds'.format(time.time() - time_start)

    for i in range(0, int(math.floor(len(segments_timbre_old)) / smoothing_factor)):
        segments = segments_timbre_old[(smoothing_factor * i):(smoothing_factor * i + smoothing_factor)]
        # calculate the mean of each timbre coefficient over a block of 5 time segments
        segments_mean = map(mean, zip(*segments))
        segments_timbre_old_smoothed.append(segments_mean)
    print 'found most likely timbre categories at {0} seconds'.format(time.time() - time_start)
    timbre_cats = [msd_utils.find_most_likely_timbre_category(seg) for seg in segments_timbre_old_smoothed]
    timbre_cat_counts = [timbre_cats.count(i) for i in xrange(0, 30)]
    json_object_new['timbre_cat_counts'] = [t / json_object['duration'] for t in timbre_cat_counts]
    all_song_data.append(json_object_new)
    count += 1

print 'preprocessing finished, writing results to file at time {0}'.format(time.time() - time_start)
with open(output_file, 'w') as f:
    f.write(str(all_song_data))

print 'file merging complete at time {0}'.format(time.time() - time_start)

A.3 Code to Compute Timbre Categories

from __future__ import division
import os
import sys
import re
import time
import json
import glob
import hdf5_getters  # not on adroit
import sklearn.mixture
import msd_utils  # not on adroit
import math
import numpy as np
import collections
from string import ascii_uppercase
import ast
import matplotlib.pyplot as plt
import operator
from collections import defaultdict
import random

timbre_all = []
# number of EM songs per release year, used for uniform per-year sampling
year_counts = {1956: 2, 1965: 4, 1968: 3, 1969: 5, 1970: 23, 1971: 25,
               1972: 26, 1973: 37, 1974: 35, 1975: 29, 1976: 28, 1977: 64,
               1978: 77, 1979: 111, 1980: 131, 1981: 171, 1982: 199,
               1983: 272, 1984: 190, 1985: 189, 1986: 200, 1987: 224,
               1988: 205, 1989: 272, 1990: 358, 1991: 348, 1992: 538,
               1993: 610, 1994: 658, 1995: 764, 1996: 809, 1997: 930,
               1998: 872, 1999: 983, 2000: 1031, 2001: 1230, 2002: 1323,
               2003: 1563, 2004: 1508, 2005: 1995, 2006: 1892, 2007: 2175,
               2008: 1950, 2009: 1782, 2010: 742}

time_start = time.time()
year_count = defaultdict(int)
orig_dir = '/scratch/network/mssilver/mssilver/'
orig_dir = ''  # overridden: run against files in the local directory
json_pattern = re.compile(r"\{'title'.*?\}", re.DOTALL)
N = 20  # number of songs to sample from each year
k = 20  # number of frames to select from each song
for l1 in ascii_uppercase:
    for l2 in ascii_uppercase:
        edm_textfile = orig_dir + 'msd_data/raw_' + l1 + l2 + '.txt'
        json_contents = open(edm_textfile, 'r').read()
        for json_object_str in re.findall(json_pattern, json_contents):
            json_object = ast.literal_eval(json_object_str)
            year = int(json_object['year'])
            # accept each song with probability N / (songs in its year)
            prob = 1.0 if 1.0 * N / year_counts[year] > 1.0 else 1.0 * N / year_counts[year]
            if random.random() < prob:
                print 'getting timbre frames for song in directory {0}, {1} seconds after start of program'.format(edm_textfile, time.time() - time_start)
                duration = float(json_object['duration'])
                timbre = [[t / duration for t in l] for l in json_object['timbre']]
                try:
                    indices = random.sample(xrange(0, len(timbre)), k)
                except:
                    # song shorter than k frames: keep every frame
                    indices = xrange(0, len(timbre))
                timbre_frames = [timbre[i] for i in indices]
                appended_timbre = [timbre_all.append(l) for l in timbre_frames]
        print 'finished file {0}, {1} seconds after start of program'.format(edm_textfile, time.time() - time_start)

with(open('timbre_frames_all.txt', 'w')) as f:
    f.write(str(timbre_all))

A.4 Helper Methods for Calculations

import os
import re
import json
import glob
import hdf5_getters
import time
import numpy as np

''' some static data used in conjunction with the helper methods '''

# Each 12-element vector corresponds to the 12 pitches, starting with C
# natural and going up to B natural; a 1 marks a pitch present in the chord.
def _rows(patterns):
    # expand '100010010000'-style pitch masks into 12-element 0/1 lists
    return [[int(ch) for ch in p] for p in patterns]

CHORD_TEMPLATE_MAJOR = _rows([
    '100010010000', '010001001000', '001000100100', '000100010010',
    '000010001001', '100001000100', '010000100010', '001000010001',
    '100100001000', '010010000100', '001001000010', '000100100001'])
CHORD_TEMPLATE_MINOR = _rows([
    '100100010000', '010010001000', '001001000100', '000100100010',
    '000010010001', '100001001000', '010000100100', '001000010010',
    '000100001001', '100010000100', '010001000010', '001000100001'])
CHORD_TEMPLATE_DOM7 = _rows([
    '100010010010', '010001001001', '101000100100', '010100010010',
    '001010001001', '100101000100', '010010100010', '001001010001',
    '100100101000', '010010010100', '001001001010', '000100100101'])
CHORD_TEMPLATE_MIN7 = _rows([
    '100100010010', '010010001001', '101001000100', '010100100010',
    '001010010001', '100101001000', '010010100100', '001001010010',
    '000100101001', '100010010100', '010001001010', '001000100101'])

CHORD_TEMPLATE_MAJOR_means = [np.mean(chord) for chord in CHORD_TEMPLATE_MAJOR]
CHORD_TEMPLATE_MINOR_means = [np.mean(chord) for chord in CHORD_TEMPLATE_MINOR]
CHORD_TEMPLATE_DOM7_means = [np.mean(chord) for chord in CHORD_TEMPLATE_DOM7]
CHORD_TEMPLATE_MIN7_means = [np.mean(chord) for chord in CHORD_TEMPLATE_MIN7]

CHORD_TEMPLATE_MAJOR_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_MAJOR]
CHORD_TEMPLATE_MINOR_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_MINOR]
CHORD_TEMPLATE_DOM7_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_DOM7]
CHORD_TEMPLATE_MIN7_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_MIN7]

# centroids of the timbre categories fit offline; one 12-d vector per category
TIMBRE_CLUSTERS = [
    [1.38679881e-01, 3.95702571e-02, 2.65410235e-02, 7.38301998e-03, -1.75014636e-02, -5.51147732e-02,
     8.71851698e-03, -1.17595855e-02, 1.07227900e-02, 8.75951680e-03, 5.40391877e-03, 6.17638908e-03],
    [3.14344510e+00, 1.17405599e-01, 4.08053561e+00, -1.77934450e+00, 2.93367968e+00, -1.35597928e+00,
     -1.55129489e+00, 7.75743158e-01, 6.42796685e-01, 1.40794256e-01, 3.37716831e-01, -3.27103815e-01],
    [3.56548165e-01, 2.73288705e+00, 1.94355982e+00, 1.06892477e+00, 9.89739475e-01, -8.97330631e-02,
     8.73234495e-01, -2.00747009e-03, 3.44488367e-01, 9.93117800e-02, -2.43471766e-01, -1.90521726e-01],
    [4.22442037e-01, 4.14115783e-01, 1.43926557e-01, -1.16143322e-01, -5.95186216e-02, -2.36927188e-01,
     -6.83151409e-02, 9.86816882e-02, 2.43219098e-02, 6.93558977e-02, 6.80121418e-03, 3.97485360e-02],
    [1.94727799e-01, -1.39027782e+00, -2.39875671e-01, -2.84583677e-01, 1.92334219e-01, -2.83421048e-01,
     2.15787541e-01, 1.14840341e-01, -2.15631833e-01, -4.09496877e-02, -6.90838017e-03, -7.24394810e-03],
    [1.96565167e-01, 4.98702717e-02, -3.43697282e-01, 2.54170701e-01, 1.12441266e-02, 1.54740401e-01,
     -4.70447408e-02, 8.10868802e-02, 3.03736697e-03, 1.43974944e-03, -2.75044913e-02, 1.48634678e-02],
    [2.21364497e-01, -2.96205105e-01, 1.57754028e-01, -5.57641279e-02, -9.25625566e-02, -6.15316168e-02,
     -1.38139882e-01, -5.54936599e-02, 1.66886836e-01, 6.46238260e-02, 1.24093863e-02, -2.09274345e-02],
    [2.12823455e-01, -9.32652720e-02, -4.39611467e-01, -2.02814479e-01, 4.98638770e-02, -1.26572488e-01,
     -1.11181799e-01, 3.25075635e-02, 2.01416694e-02, -5.69216463e-02, 2.61922912e-02, 8.30817468e-02],
    [1.62304042e-01, -7.34813956e-03, -2.02552550e-01, 1.80106705e-01, -5.72110826e-02, -9.17148244e-02,
     -6.20429191e-03, -6.08892354e-02, 1.02883628e-02, 3.84878478e-02, -8.72920419e-03, 2.37291230e-02],
    [1.69023095e-01, 6.81311168e-02, -3.71039856e-02, -2.13139780e-02, -4.18752028e-03, 1.36407740e-01,
     2.58515825e-02, -4.10328777e-04, 2.93149920e-02, -1.97874734e-02, 2.01177066e-02, 4.29260690e-03],
    [4.16829358e-01, -1.28384095e+00, 8.86081556e-01, 9.13717416e-02, -3.19420208e-01, -1.82003637e-01,
     -3.19865507e-02, -1.71517045e-02, 3.47472066e-02, -3.53047665e-02, 5.58354602e-02, -5.06222122e-02],
    [3.83948137e-01, 1.06020034e-01, 4.01191058e-01, 1.49470482e-01, -9.58422411e-02, -4.94473336e-02,
     2.27589858e-02, -5.67352733e-02, 3.84666644e-02, -2.15828055e-02, -1.67817151e-02, 1.15426241e-01],
    [9.07946444e-01, 3.26120397e+00, 2.98472002e+00, -1.42615404e-01, 1.29886103e+00, -4.53380431e-01,
     1.54008478e-01, -3.55297093e-02, -2.95809181e-01, 1.57037690e-01, -7.29692046e-02, 1.15180285e-01],
    [1.60870896e+00, -2.32038235e+00, -7.96211044e-01, 1.55058968e+00, -2.19377663e+00, 5.01030526e-01,
     -1.71767279e+00, -1.36642470e+00, -2.42837527e-01, -4.14275615e-01, -7.33148530e-01, -4.56676578e-01],
    [6.42870687e-01, 1.34486839e+00, 2.16026845e-01, -2.13180345e-01, 3.10866747e-01, -3.97754955e-01,
     -3.54439151e-01, -5.95938041e-04, 4.95054274e-03, 4.67013422e-02, -1.80823854e-02, 1.25808320e-01],
    [1.16780496e+00, 2.28141229e+00, -3.29418720e+00, -1.54239912e+00, 2.12372153e-01, 2.51116768e+00,
     1.84273560e+00, -4.06183916e-01, 1.19175125e+00, -9.24407446e-01, 6.85444429e-01, -6.38729005e-01],
    [2.39097414e-01, -1.13382447e-02, 3.06327342e-01, 4.68182987e-03, -1.03107607e-01, -3.17661969e-02,
     3.46533705e-02, 1.46440386e-02, 6.88291154e-02, 1.72580481e-02, -6.23970238e-03, -6.52822380e-03],
    [1.74850329e-01, -1.86077411e-01, 2.69285838e-01, 5.22452803e-02, -3.71708289e-02, -6.42874319e-02,
     -5.01920042e-03, -1.14565540e-02, -2.61300268e-03, -6.94872458e-03, 1.20157063e-02, 2.01341977e-02],
    [1.93220674e-01, 1.62738332e-01, 1.72794061e-02, 7.89933755e-02, 1.58494767e-01, 9.04541006e-04,
     -3.33177052e-02, -1.42411500e-01, -1.90471155e-02, -2.41622739e-02, -2.57382438e-02, 2.84895062e-02],
    [3.31179197e+00, -1.56765268e-01, 4.42446188e+00, 2.05496297e+00, 5.07031622e+00, -3.52663849e-02,
     -5.68337901e+00, -1.17825301e+00, 5.41756637e-01, -3.15541339e-02, -1.58404846e+00, 7.37887234e-01],
    [2.36033237e-01, -5.01380019e-01, -7.01568834e-02, -2.14474169e-01, 5.58739133e-01, -3.45340886e-01,
     2.36469930e-01, -2.51770230e-02, -4.41670143e-01, -1.73364633e-01, 9.92353986e-03, 1.01775476e-01],
    [3.13672832e+00, 1.55128891e+00, 4.60139512e+00, 9.82477544e-01, -3.87108002e-01, -1.34239667e+00,
     -3.00065797e+00, -4.41556909e-01, -7.77546208e-01, -6.59017029e-01, -1.42596356e-01, -9.78935498e-01],
    [8.50714148e-01, 2.28658856e-01, -3.65260753e+00, 2.70626948e+00, -1.90441544e-01, 5.66625676e+00,
     1.77531510e+00, 2.39978921e+00, 1.10965660e+00, 1.58484130e+00, -1.51579214e-02, 8.64324026e-01],
    [1.14302559e+00, 1.18602811e+00, -3.88130412e+00, 8.69833825e-01, -8.23003310e-01, -4.23867795e-01,
     8.56022598e-01, -1.08015106e+00, 1.74840192e-01, -1.35493558e-02, -1.17012561e+00, 1.68572940e-01],
    [3.54117814e+00, 6.12714769e-01, 7.67585243e+00, 2.50391333e+00, 1.81374399e+00, -1.46363231e+00,
     -1.74027236e+00, -5.72924078e-01, -1.20787368e+00, -4.13954661e-01, -4.62561948e-01, 6.78297871e-01],
    [8.31843044e-01, 4.41635485e-01, 7.00724425e-02, -4.72159900e-02, 3.08326493e-01, -4.47009822e-01,
     3.27806057e-01, 6.52370380e-01, 3.28490360e-01, 1.28628172e-01, -7.78065861e-02, 6.91343399e-02],
    [4.90082031e-01, -9.53180204e-01, 1.76970476e-01, 1.57256960e-01, -5.26196238e-02, -3.19264458e-01,
     3.91808304e-01, 2.19368239e-01, -2.06483291e-01, -6.25044005e-02, -1.05547224e-01, 3.18934196e-01],
    [1.49899454e+00, -4.30708817e-01, 2.43770498e+00, 7.03149621e-01, -2.28827845e+00, 2.70195855e+00,
     -4.71484280e+00, -1.18700075e+00, -1.77431396e+00, -2.23190236e+00, 8.20855264e-01, -2.35859902e-01],
    [1.20322544e-01, -3.66300816e-01, -1.25699953e-01, -1.21914056e-01, 6.93277338e-02, -1.31034684e-01,
     -1.54955924e-03, 2.48094288e-02, -3.09576314e-02, -1.66369415e-03, 1.48904987e-04, -1.42151992e-02],
    [6.52394765e-01, -6.81024464e-01, 6.36868117e-01, 3.04950208e-01, 2.62178992e-01, -3.20457080e-01,
     -1.98576098e-01, -3.02173163e-01, 2.04399765e-01, 4.44513847e-02, -9.50111498e-02, -1.14198739e-02],
    [2.06762180e-01, -2.08101829e-01, 2.61977630e-01, -1.71672300e-01, 5.61794250e-02, 2.13660185e-01,
     3.90259585e-02, 4.78176392e-02, 1.72812607e-02, 3.44052067e-02, 6.26899067e-03, 2.48544728e-02],
    [7.39717363e-01, 4.37786285e+00, 2.54995502e+00, 1.13151212e+00, -3.58509503e-01, 2.20806129e-01,
     -2.20500355e-01, -7.22409824e-02, -2.70534083e-01, 1.07942098e-03, 2.70174668e-01, 1.87279353e-01],
    [1.25593809e+00, 6.71054880e-02, 8.70352571e-01, -4.32607959e+00, 2.30652217e+00, 5.47476105e+00,
     -6.11052479e-01, 1.07955720e+00, -2.16225471e+00, -7.95770149e-01, -7.31804973e-01, 9.68935954e-01],
    [1.17233757e-01, -1.23897829e-01, -4.88625265e-01, 1.42036530e-01, -7.23286756e-02, -6.99808763e-02,
     -1.17525019e-02, 5.70221674e-02, -7.67796123e-03, 4.17505873e-02, -2.33375716e-02, 1.94121001e-02],
    [1.67511025e+00, -2.75436700e+00, 1.45345593e+00, 1.32408871e+00, -1.66172505e+00, 1.00560074e+00,
     -8.82308160e-01, -5.95708043e-01, -7.27283590e-01, -1.03975499e+00, -1.86653334e-02, 1.39449745e+00],
    [3.20587677e+00, -2.84451104e+00, 8.54849957e+00, -4.44001235e-01, 1.04202144e+00, 7.35333682e-01,
     -2.48763292e+00, 7.38931361e-01, -1.74185596e+00, -1.07581842e+00, 2.05759299e-01, -8.20483513e-01],
    [3.31279737e+00, -5.08655734e-01, 6.61530870e+00, 1.16518280e+00, 4.74499155e+00, -2.31536191e+00,
     -1.34016130e+00, -7.15381712e-01, 2.78890594e+00, 2.04189275e+00, -3.80003033e-01, 1.16034914e+00],
    [1.79522019e+00, -8.13534697e-02, 4.37167420e-01, 2.26517020e+00, 8.85377295e-01, 1.07481514e+00,
     -7.25322296e-01, -2.19309506e+00, -7.59468916e-01, -1.37191387e+00, 2.60097913e-01, 9.34596450e-01],
    [3.50400906e-01, 8.17891485e-01, -8.63487084e-01, -7.31760701e-01, 9.70320805e-02, -3.60023996e-01,
     -2.91753495e-01, -8.03073817e-02, 6.65930095e-02, 1.60093340e-01, -1.29158086e-01, -5.18806100e-02],
    [2.25922929e-01, 2.78461593e-01, 5.39661393e-02, -2.37662670e-02, -2.70343295e-02, -1.23485570e-01,
     2.31027499e-03, 5.87465112e-05, 1.86127188e-02, 2.83074747e-02, -1.87198676e-04, 1.24761782e-02],
    [4.53615634e-01, 3.18976020e+00, -8.35029351e-01, 7.84124578e+00, -4.43906795e-01, -1.78945492e+00,
     -1.14521031e+00, 1.00044304e+00, -4.04084981e-01, -4.86030348e-01, 1.05412721e-01, 5.63666445e-02],
    [3.93714086e-01, -3.07226477e-01, -4.87366619e-01, -4.57481697e-01, -2.91133171e-04, -2.39881719e-01,
     -2.15591352e-01, -1.21332941e-01, 1.42245002e-01, 5.02984582e-02, -8.05878851e-03, 1.95534173e-01],
    [1.86913010e-01, -1.61000977e-01, 5.95612425e-01, 1.87804293e-01, 2.22064227e-01, -1.09008289e-01,
     7.83845058e-02, 5.15228647e-02, -8.18113578e-02, -2.37860551e-02, 3.41013800e-03, 3.64680417e-02],
    [3.32919314e+00, -2.14341251e+00, 7.20913997e+00, 1.76143734e+00, 1.64091808e+00, -2.66887649e+00,
     -9.26748006e-01, -2.78599285e-01, -7.39434005e-01, -3.87363085e-01, 8.00557250e-01, 1.15628886e+00],
    [4.76496444e-01, -1.19334793e-01, 3.09037235e-01, -3.45545294e-01, 1.30114716e-01, 5.06895559e-01,
     2.12176840e-01, -4.14296750e-03, 4.52439064e-02, -1.62163990e-02, 6.93683152e-02, -5.77607592e-03],
    [3.00019324e-01, 5.43432074e-02, -7.72732930e-01, 1.47263806e+00, -2.79012581e-02, -2.47864869e-01,
     -2.10011388e-01, 2.78202425e-01, 6.16957205e-02, -1.66924986e-01, -1.80102286e-01, -3.78872162e-03]]

TIMBRE_MEANS = [np.mean(t) for t in TIMBRE_CLUSTERS]
TIMBRE_STDEVS = [np.std(t) for t in TIMBRE_CLUSTERS]

'''helper methods to process raw msd data'''

def normalize_pitches(h5):
    # transpose every segment's pitch vector so all songs share a common key
    key = int(hdf5_getters.get_key(h5))
    segments_pitches = hdf5_getters.get_segments_pitches(h5)
    segments_pitches_new = [transpose_by_key(pitch_seg, key) for pitch_seg in segments_pitches]
    return segments_pitches_new

def transpose_by_key(pitch_seg, key):
    pitch_seg_new = []
    for i in range(0, 12):
        idx = (i + key) % 12
        pitch_seg_new.append(pitch_seg[idx])
    return pitch_seg_new

''' given a time segment with distributions of the 12 pitches, find the most
likely chord played '''
def find_most_likely_chord(pitch_vector):
    rho_max = 0.0
    most_likely_chord = (1, 1)
    # correlate the pitch vector against each template family; the first tuple
    # element indexes the chord quality (1 major, 2 minor, 3 dom7, 4 min7)
    template_families = [
        (1, CHORD_TEMPLATE_MAJOR, CHORD_TEMPLATE_MAJOR_means, CHORD_TEMPLATE_MAJOR_stdevs),
        (2, CHORD_TEMPLATE_MINOR, CHORD_TEMPLATE_MINOR_means, CHORD_TEMPLATE_MINOR_stdevs),
        (3, CHORD_TEMPLATE_DOM7, CHORD_TEMPLATE_DOM7_means, CHORD_TEMPLATE_DOM7_stdevs),
        (4, CHORD_TEMPLATE_MIN7, CHORD_TEMPLATE_MIN7_means, CHORD_TEMPLATE_MIN7_stdevs)]
    for quality, templates, means, stdevs in template_families:
        for idx, (chord, mean, stdev) in enumerate(zip(templates, means, stdevs)):
            rho = 0.0
            for i in range(0, 12):
                rho += (chord[i] - mean) * (pitch_vector[i] - np.mean(pitch_vector)) / \
                       ((stdev + 0.01) * (np.std(pitch_vector) + 0.01))
            if (abs(rho) > abs(rho_max)):
                rho_max = rho
                most_likely_chord = (quality, idx)
    return most_likely_chord

def find_most_likely_timbre_category(timbre_vector):
    most_likely_timbre_cat = 0
    rho_max = 0.0
    for idx, (seg, mean, stdev) in enumerate(zip(TIMBRE_CLUSTERS, TIMBRE_MEANS, TIMBRE_STDEVS)):
        rho = 0.0
        for i in range(0, 12):
            rho += (seg[i] - mean) * (timbre_vector[i] - np.mean(seg)) / \
                   ((stdev + 0.01) * (np.std(timbre_vector) + 0.01))
        if (abs(rho) > abs(rho_max)):
            rho_max = rho
            most_likely_timbre_cat = idx
    return most_likely_timbre_cat

Bibliography

[1] Marck Bailey. Mark J. Butler publishes scholarly work on dance music. http://www.music.northwestern.edu/about/news/2012/mark-j-butler-publishes-scholarly-work-on-dance-music.html, mar 2012.

[2] Kenneth Taylor. Ishkur's guide to EDM. http://techno.org/electronic-music-guide/.

[3] Deal further strengthens Spotify's music discovery expertise. http://the.echonest.com/pressreleases/spotify-acquires-echo-nest/, mar 2014.

[4] Josh Constine. Inside the Spotify - Echo Nest skunkworks. http://techcrunch.com/2014/10/19/the-sonic-mad-scientists/, oct 2014.

[5] The future of music genres is here. http://blog.echonest.com/post/73516217273/the-future-of-music-genres-is-here, jan 2014.

[6] About the Music Genome Project. http://www.pandora.com/about/mgp.

[7] Joan Serrà, Álvaro Corral, Marián Boguñá, Martín Haro, and Josep Ll. Arcos. Measuring the evolution of contemporary western popular music. Sci. Rep., 2, jul 2012.

[8] Matthias Mauch, Robert M. MacCallum, Mark Levy, and Armand M. Leroi. The evolution of popular music: USA 1960–2010. Royal Society Open Science, 2(5), 2015.

[9] Thierry Bertin-Mahieux, Daniel P.W. Ellis, Brian Whitman, and Paul Lamere. The Million Song Dataset. In Proceedings of the 12th International Conference on Music Information Retrieval (ISMIR 2011), 2011.

[10] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay. Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12:2825–2830, 2011.

[11] Lars Buitinck, Gilles Louppe, Mathieu Blondel, Fabian Pedregosa, Andreas Mueller, Olivier Grisel, Vlad Niculae, Peter Prettenhofer, Alexandre Gramfort, Jaques Grobler, Robert Layton, Jake VanderPlas, Arnaud Joly, Brian Holt, and Gaël Varoquaux. API design for machine learning software: experiences from the scikit-learn project. In ECML PKDD Workshop: Languages for Data Mining and Machine Learning, pages 108–122, 2013.

[12] Edwin Chen. Infinite mixture models with nonparametric bayes and the dirichlet process. http://blog.echen.me/2012/03/20/infinite-mixture-models-with-nonparametric-bayes-and-the-dirichlet-process/, mar 2012.

[13] Graham Massey. Roland TR-808: The drum machine that changed music. http://www.bbc.com/news/entertainment-arts-26682781, mar 2014.

[14] Jean-Michel Jarre and Agencja Artystyczna MTJ. Jean Michel Jarre. Disques Dreyfus, 1999.

[15] Francois Pachet, Jean-Julien Aucouturier, and Mark Sandler. The way it sounds: Timbre models for analysis and retrieval of music signals. IEEE Transactions on Multimedia, 7(6):1028–35, dec 2005.

  • Abstract
  • Acknowledgements
  • Contents
  • List of Tables
  • List of Figures
  • 1 Introduction
    • 11 Background Information
    • 12 Literature Review
    • 13 The Dataset
      • 2 Mathematical Modeling
        • 21 Determining Novelty of Songs
        • 22 Feature Selection
        • 23 Collecting Data and Preprocessing Selected Features
          • 231 Collecting the Data
          • 232 Pitch Preprocessing
          • 233 Timbre Preprocessing
              • 3 Results
                • 31 Methodology
                • 32 Findings
                  • 321 =005
                  • 322 =01
                  • 323 =02
                    • 33 Analysis
                      • 4 Conclusion
                        • 41 Design Flaws in Experiment
                        • 42 Future Work
                        • 43 Closing Remarks
                          • A Code
                            • A1 Pulling Data from the Million Song Dataset
                            • A2 Calculating Most Likely Chords and Timbre Categories
                            • A3 Code to Compute Timbre Categories
                            • A4 Helper Methods for Calculations
                              • Bibliography
Page 43: Silver,Matthew final thesis

34

Figure 33 Song year distributions for α = 01

35

36

Figure 34 Timbre and pitch distributions for α = 01

37

Cluster Song Count Characteristic Sounds

0 1339 Instrumental and disco with 80s synth

1 2109 Simultaneous quarter-note and sixteenth note rhythms

2 4048 Upbeat chill simultaneous quarter-note and eighth

note rhythms

3 1353 Strong repetitive beats ambient

4 2446 Strong simultaneous beat and synths synths defined but

echo

5 2672 Calm New Age

6 542 Hi-hat cymbals dissonant chord progressions

7 2725 Aggressive punk and alternative rock

9 1647 Latin rhythmic emphasis on first and third beats

11 835 Standard medium-fast rock instrumentschords

16 1152 Orchestral especially violins

18 40 ldquoMartian alienrdquo sounds no vocals

20 1590 Alternating strong kick and strong high-pitched clap

28 528 Roland TR-like beats kick and clap stand out but fuzzy

Table 32 Song cluster descriptions for α = 01

323 α=02

With α set to 02 there were a total of 22 clusters formed 3 of the clusters consisted

of 1 song each none of which were particularly unique-sounding so I discarded them

for a total of 19 significant clusters Again the song distributions timbre and pitch

distributions and cluster descriptions are shown below

38

39

40

Figure 35 Song year distributions for α = 02

41

42

43

Figure 36 Timbre and pitch distributions for α = 02

44

Cluster Song Count Characteristic Sounds

0 4075 Nostalgic and sad-sounding synths and string instru-

ments

1 2068 Intense sad cavernous (mix of industrial metal and am-

bient)

2 1546 Jazzfunk tones

3 1691 Orchestral with heavy 80s synths atmospheric

4 343 Arpeggios

5 304 Electro ambient

6 2405 Alien synths eery

7 1264 Punchy kicks and claps 80s90s tilt

8 1561 Medium tempo 44 time signature synths with intense

guitar

9 1796 Disco rhythms and instruments

10 2158 Standard rock with few (if any) synths added on

12 791 Cavernous minimalist ambient (non-electronic instru-

ments)

14 765 Downtempo classic guitar riffs fewer synths

16 865 Classic acid house sounds and beats

17 682 Heavy Roland TR sounds

22 14 Fast ambient classic orchestral

23 578 Acid house with funk tones

30 31 Very repetitive rhythms one or two tones

34 88 Very dense sound (strong vocals and synths)

Table 33 Song cluster descriptions for α = 02

45

33 Analysis

Each of the different values of α revealed different insights about the Dirichlet Process

applied to the Million Song Dataset mainly which artists and songs were unique

and how traditional groupings of EM genres could be thought of differently Not

surprisingly the distributions of the years of songs in most of the clusters were skewed

to the left because the distribution of all of the EM songs in the Million Song Dataset

is left-skewed (see Figure 22) However some of the distributions vary significantly

for individual clusters and these differences provide important insights into which

types of music styles were popular at certain points in time and how unique the

earliest artists and songs in the clusters were For example for α = 01 Cluster 28rsquos

musical style (with sounds characteristic of the Roland TR-808 and TR-909 a two

programmable drum machines and synthesizers that became extremely popular in

1980s and 90s dance tracks) [13] coincides with the when the instruments were first

manufactured in 1980 Not surprisingly this cluster contained mostly songs from

the 80s and 90s and declined slightly in the 2000s However there were a few songs

in that cluster that came out before 1980 While these songs did not clearly use the

Roland TR machines they may have contained similar sounds that predated the

machines and were truly novel

First looking at α = 005 we see that all of the clusters contain a significant

number of songs although cluster 3 and 8 are notably smaller Cluster 3 contains

a heavier left tail indicating a larger number of songs from the 70s 80s and 90s

Inside the cluster the genres of music varied significantly from a traditional music

lens That is the cluster contained some songs with nearly all traditional rock

instruments others with purely synths and others somewhere in between all which

would normally be classified as different EM genres However under the Dirichlet

Process these songs were lumped together with the common themes of dense melodic

46

melodies (as opposed to minimalistic repetitive or dissonant sounds) The most

prominent artists from the earlier songs are Ashra and John Hassell who composed

several melodic songs combining traditional instruments with synthesizers for a

modern feel The other small cluster number 8 contains a more normal year

distribution relative to the entire MSD distribution and also consists of denser beats

Another artist Cabaret Voltaire leads this cluster Cluster 9 sticks out significantly

because it contains virtually no songs before 1990 but increases rapidly in popularity

This cluster contains songs with hypnotically repetitive rhythm strong and ethereal

synths and an equally strong drum-like beat Given the emergence of trance in

the 1990s and the fact that house music in the 1980s contained more minimalistic

synths than house music in the 1990s this distribution of years makes sense Looking

at the earliest artists in this cluster one that accurately predates the later music

in the cluster is Jean-Michel Jarre A French composer pioneering in ambient and

electronic music [14] one of his songs Les Chants Magneacutetiques IV contains very

sharp and modulated synths along with a repetitive hi-hat cymbal rhythm and more

drawn-out and ethereal synths While the song sounds ambient at its normal speed

playing the song 15 times the normal speed resulted in a thumping fast-paced 16th

note rhythm that combined with the ethereal synths that contain certain chord

progressions sounded very similar to trance music In fact I found that stylistically

trance music was comparable to house and ambient music increased in speed Trance

music was a term not used extensively until the early 1990s but ambient and house

music were already mainstream by the 1980s so it would make sense that trance

evolved in this manner However this insight could serve as an argument that trance

is not an innovative genre in and of itself but is rather a clever combination of two

older genres Lastly we look at the timbre category and chord change distributions

for each cluster In theory these clusters should have significantly different peaks

of chord changes and timbre categories reflecting different pitch arrangements and

47

instruments in each cluster The type 0 chord change corresponds to major rarr

major with no note change type 60 minor rarr minor with no note change type

120 dominant 7th major rarr dominant 7th major with no note change and type 180

dominant 7th minor rarr dominant 7th minor with no note change It makes sense that

type 0 60 120 and 180 chord changes are frequently observed because it implies

that chords in the song occurring next to each other are remaining in the same key

for the majority of the song The timbre categories on the other hand are more

difficult to intuitively interpret Mauchrsquos study addresses this issue by sampling

songs and sounds that are the closest to each timbre category and then playing the

sounds and attaching user-based interpretations based on several listeners [8] While

this strategy worked in Mauchrsquos study given the time and resources at my disposal

this strategy was not practical in my study I ended up comparing my subjective

summaries of each cluster and comparing the charts to see whether certain peaks

in the timbre categories corresponded to specific tones Strangely for α = 005 the

timbre and chord change data is very similar for each cluster This problem does not

occur for when α = 01 or 02 where the graphs vary significantly and correspond

to some of the observed differences in the music In summary below are the most

influential artists I found in the clusters formed and the types of music they created

that were novel for their time

bull Jean-Michel Jarre ambient and house music complicated synthesizer arrange-

ments

bull Cabaret Voltaire orchestral electronic music

bull Paul Horn new age

bull Brian Eno ambient music

bull Manuel Goumlttsching (Ashra) synth-heavy ambient music

48

bull Killing Joke industrial metal

bull John Foxx minimalist and dark electronic music

bull Fad Gadget house and industrial music

While these conclusions were formed mainly from the data in the MSD and the resulting clusters, I checked outside sources and biographies of these artists to see whether they were groundbreaking contributors to electronic music. Some research revealed that these artists were indeed groundbreaking for their time, so my findings are consistent with existing literature. The difference, however, between existing accounts and mine is that, from a quantitatively computed perspective, I found some new connections (like observing that one of Jarre's works, when sped up, sounded very similar to trance music).

For larger values of α, it is worth not only looking at interesting phenomena in the clusters formed for that specific value, but also comparing those clusters to the ones formed for other values of α. Since we are increasing the value of α, more clusters will be formed, and the distinctions between each cluster will be more nuanced.
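As an aside on mechanics: the thesis's clustering was run with scikit-learn's Dirichlet Process Gaussian Mixture Model; in current scikit-learn the equivalent estimator is BayesianGaussianMixture. The sketch below, on synthetic stand-in features (not the thesis's actual feature matrix), shows how the concentration parameter α is passed in and how larger values tend to leave more components in active use:

    import numpy as np
    from sklearn.mixture import BayesianGaussianMixture

    rng = np.random.RandomState(0)
    # synthetic stand-in for the per-song feature matrix (in the thesis,
    # duration-normalized chord-change and timbre-category frequencies)
    X = np.vstack([rng.normal(loc=c, scale=0.5, size=(200, 8))
                   for c in (-3.0, 0.0, 3.0)])

    for alpha in (0.05, 0.1, 0.2):
        dpgmm = BayesianGaussianMixture(
            n_components=40,  # truncation level: an upper bound on clusters
            weight_concentration_prior_type='dirichlet_process',
            weight_concentration_prior=alpha,
            max_iter=500, random_state=0).fit(X)
        # count the components that actually receive points
        print(alpha, np.unique(dpgmm.predict(X)).size)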

With α = 0.1, the Dirichlet Process formed 16 clusters. Two of these clusters consisted of only one song each, and upon listening, neither song sounded particularly unique, so I threw those two clusters out and analyzed the remaining 14. Comparing these clusters to the ones formed with α = 0.05, I found that some mapped over nicely while others were more difficult to interpret. For example, cluster 3_{0.1} (cluster 3 when α = 0.1) contained a similar number of songs, and a similar distribution of release years, to cluster 9_{0.05}. Both contain virtually no songs before the 1990s and then steadily rise in popularity through the 2000s. Both clusters also contain similar types of music: house beats and ethereal synths reminiscent of ambient or trance music. However, when I looked at the earliest artists in cluster 3_{0.1}, they were different from the earliest artists in cluster 9_{0.05}.

One particular artist, Bill Nelson, stood out for having a particularly novel song, "Birds of Tin," for the year it was released (1980). This song features a sharp and twangy synth beat that, when sped up, sounded like minimalist acid house music.

While the α = 0.05 group differentiated mostly on general moods and classes of instruments (like rock vs. non-electronic vs. electronic), the α = 0.1 group picked up more nuanced instrumentation and mood differences. For example, cluster 16_{0.1} contained songs that featured orchestral string instruments, especially violin. The songs themselves varied significantly according to traditional genres, from Brian Eno arrangements with classical orchestra to a remix of a song by Linkin Park, a nu-metal band, which contained violin interludes. This clustering raises an interesting point: music that sounds very different based on traditional genres could be grouped together on certain instruments or sounds. Another cluster, 28_{0.1}, features 90s sounds characteristic of the Roland TR-909 drum machine (which would explain why the cluster's songs increase drastically starting in the 1990s and steadily decline through the 2000s). Yet another cluster, 6_{0.1}, contains a particularly heavy left tail, indicating a style more popular in the 1980s, and its characteristic sound, hi-hat cymbals, is also a specialized instrument. This specialization does not match up particularly strongly with the clusters when α = 0.05. That is, a single cluster with α = 0.05 does not easily map to one or more clusters in the α = 0.1 run, although many of the clusters appear to share characteristics based on the qualitative descriptions in the tables. The timbre/chord-change charts for each cluster appeared to at least somewhat corroborate the general characteristics I attached to each cluster. For example, the last timbre category is significantly pronounced for clusters 5 and 18, and especially so for 18. Cluster 18 was vocal-free, ethereal space-synth sounds, so it would make sense that cluster 5, which was mainly calm New Age, also contained vocal-free, ethereal, space-y sounds. It was also interesting to note that certain clusters, like 28, contained one timbre category that completely dominated all the others. Many songs in this cluster were marked by strong, repetitive beats reminiscent of the Roland TR synth drum machines, which matches the graph. Likewise, clusters 3, 7, 9, and 20, which appear to contain the same peak timbre category, were noted for containing strong and repetitive beats. For this α value, I added the following artists and their contributions to the general list of novel artists:

• Bill Nelson: minimalist house music

• Vangelis: orchestral compositions with electronic notes

• Rick Wakeman: rock compositions with spacy-sounding synths

• Kraftwerk: synth-based pop music

Finally, we look at α = 0.2. With this parameter value, the Dirichlet Process resulted in 22 clusters. Three of these clusters contained only one song each, and upon listening to each of these songs, I determined they were not particularly unique and discarded them, for a total of 19 remaining clusters. Unlike the previous two values of α, where the clusters were relatively easy to differentiate subjectively, this one was quite difficult. Slightly more than half of the clusters, 10 out of 19, contained under 1,000 songs, and 3 contained under 100. Some of the clusters were easily mapped to clusters in the other two α runs, like cluster 17_{0.2}, which contains Roland TR drum machine sounds and is comparable to cluster 28_{0.1}. However, many of the other classifications seemed more dubious. Not only did the songs within each cluster often vary significantly, but the differences between many clusters appeared nearly indistinguishable. The chord-change and timbre charts also reflect the difficulty in distinguishing different clusters: the y-axis ranges are quite small, implying that many of the timbre values averaged out because the songs were quite different in each cluster. Essentially, the observations are quite noisy and do not have features that stand out as saliently as those of cluster 28_{0.1}, for example. The only exceptions were clusters 30 and 34, but there were so few songs in each of these clusters that they represent only a small amount of the dataset. Therefore, I concluded that the Dirichlet Process with α = 0.2 did an insufficient job of clustering the songs. Overall, the clusters formed when α = 0.1 were the most meaningful in terms of picking up nuanced moods and instruments without splitting hairs and producing clusters a minimally trained ear could not differentiate. From this analysis, the most appropriate genre classifications of the electronic music from the MSD are the clusters described in the table where α = 0.1, and the most novel artists, along with their contributions, are summarized in the findings where α = 0.05 and α = 0.1.


Chapter 4

Conclusion

In this chapter, I first address weaknesses in my experiment and strategies for mitigating them; I then offer potential paths for researchers to build upon my experiment and give closing words regarding this thesis.

4.1 Design Flaws in Experiment

While I made every effort possible to ensure the integrity of this experiment, there were various confounding factors, some beyond my control and others within my control but unrealistic to address given the time and resources I had. The largest issue was the dataset I was working with. While the MSD contained roughly 23,000 electronic music songs according to my classifications, these songs did not come close to all of the electronic music that was available. From looking through the tracks, I did see many important artists, meaning that there was some credibility to the dataset. However, there were several other artists I was surprised to see missing, and the artists included contained only a limited number of popular songs. Some traditionally defined genres, like dubstep, were missing entirely from the dataset, and the most recent songs came from the year 2010, which meant that the past 5 years of rapid expansion in EM were not accounted for. Building a sufficient corpus of EM data is very difficult, arguably more so than for other genres, because songs may be remixed by multiple artists, further blurring the line between original content and modifications. For this reason, I considered my thesis to be a proof of concept. Although the data I used may not be ideal, I was able to show that the Dirichlet Process could be used with some amount of success to cluster songs based on their metadata.

With respect to how I implemented the Dirichlet Process and constructed the features, my methodology could have been more extensive given additional time and resources. Interpreting the sounds in each song and establishing common threads is a difficult task, and unlike Pandora, which used trained music theory experts to analyze each song, I relied on my own ears and anecdotal knowledge of EM. Given the lack of formal literature quantitatively analyzing EM and the resources I had, this was my best realistic option, but it was also not ideal. The second notable weakness, which was more controllable, was determining what exactly constitutes an EM song. My criterion involved iterating through every song and selecting those whose artist carried a tag that fell inside a list of predetermined EM genres. However, this strategy is not always effective, since some artists have only a small selection of EM songs and have produced much more music involving rock or other non-EM genres. To prevent these songs from appearing in the dataset, I would need to load another dataset from a group called Last.fm, which contains user-generated tags at the song level.
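A sketch of that song-level filter, assuming the Last.fm companion dataset's layout of one JSON file per track with a 'tags' list of [tag, weight] pairs (the path and track ID below are illustrative, not taken from the thesis code):

    import json
    import os

    def is_em_track(json_path, em_genres):
        # read one per-track JSON file and test its user-generated tags
        with open(json_path) as f:
            track = json.load(f)
        tags = [t[0].lower() for t in track.get('tags', [])]
        return any(g in tags for g in em_genres)

    em_genres = {'house', 'techno', 'trance', 'ambient', 'breakbeat'}
    path = os.path.join('lastfm_train', 'A', 'A', 'A', 'TRAAAAW128F429D538.json')
    if os.path.exists(path):
        print(is_em_track(path, em_genres))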

Another, more addressable weakness in my experiment was graphically analyzing the timbre categories. While the average chord changes were easy to interpret on the graphs for each cluster and had easy semantic interpretations, the timbre categories were never formally defined. That is, while I knew the Bayes Information Criterion was lowest when there were 46 categories, I did not associate each timbre category with a sound. Mauch's study addressed this issue by randomly selecting songs with sounds that fell in each timbre category and asking users to listen to the sounds and classify what they heard. Implementing this system would be an additional way of ensuring that the clusters formed were nontrivial: I could not only eyeball the measurements on each graph for timbre, like I did in this thesis, but also use them to confirm the sounds I observed for each cluster.
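One way to approximate Mauch's listening procedure with the pieces already here would be to take the timbre frames sampled in Appendix A.3 and, for each of the 46 centroids in Appendix A.4, surface the songs whose frames lie closest to that centroid for a human to audition. A sketch, with illustrative function and variable names:

    import numpy as np

    def exemplar_songs(timbre_frames, frame_songs, centroids, k=5):
        # timbre_frames: (n_frames, 12) duration-normalized timbre vectors;
        # frame_songs: the song each frame came from (a parallel list)
        frames = np.asarray(timbre_frames, dtype=float)
        picks = {}
        for idx, c in enumerate(np.asarray(centroids, dtype=float)):
            d = np.linalg.norm(frames - c, axis=1)  # distance to centroid
            picks[idx] = [frame_songs[i] for i in np.argsort(d)[:k]]
        return picks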

Finally, while my feature selection involved careful preprocessing, based on other studies, that normalized measurements across all songs, there are additional ways I could have improved the feature set. For example, one study looks at more advanced ways to isolate specific timbre segments in a song, identify repeating patterns, and compare songs to each other in terms of the similarity of their timbres [15]. More advanced methods like these would let me analyze more quantitatively how successfully the Dirichlet Process clusters songs into distinct categories.
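As a baseline for such song-to-song comparison, even the existing duration-normalized timbre_cat_counts vectors from Appendix A.2 admit a simple similarity measure; the sketch below uses plain cosine similarity, whereas [15] builds far richer timbre models:

    import numpy as np

    def timbre_similarity(counts_a, counts_b):
        # cosine similarity between two songs' timbre-category profiles
        a = np.asarray(counts_a, dtype=float)
        b = np.asarray(counts_b, dtype=float)
        denom = np.linalg.norm(a) * np.linalg.norm(b)
        return float(np.dot(a, b) / denom) if denom else 0.0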

4.2 Future Work

Future work in this area, quantitatively analyzing EM metadata to determine what constitutes different genres and novel artists, would involve tighter definitions, procedures, evaluations of whether clustering was effective, and musical scrutiny. All of the weaknesses mentioned in the previous section, barring perhaps the songs available in the Million Song Dataset, can be addressed with extensions and modifications to the code base I created. The greater issue of building an effective corpus of music data for the MSD, and constantly updating it, might be addressed by soliciting such data from an organization like Spotify, but such an endeavor is very ambitious and beyond the scope of any individual or small-group research project without extensive funding and influence. Once these problems are resolved, and the dataset, the songs accessed from it, and methods for comparing songs to each other are in place, the next steps would be to further analyze the results. How do the most unique artists for their time compare to the most popular artists? Is there considerable overlap? How long does it take for a style to grow in popularity, if it even does? And lastly, how can these findings be used to compose new genres of music and envision who and what will become popular in the future? All of these questions may require supplementary information sources, with respect to the popularity of songs and artists, for example, and many of these additional pieces of information can be found on the website of the MSD.

4.3 Closing Remarks

While this thesis is an ambitious endeavor and can be improved in many respects, it does show that the methods implemented yield nontrivial results and could serve as a foundation for future quantitative analysis of electronic music. As data analytics grows even more, and groups such as Spotify amass greater amounts of information and deeper insights on that information, this relatively new field of study will hopefully grow. EM is a dynamic, energizing, and incredibly expressive type of music, and understanding it from a quantitative perspective pays respect to what has, up until now, mostly been analyzed from a curious outsider's perspective: qualitatively described, but not examined as thoroughly from a mathematical angle.

Appendix A

Code

A.1 Pulling Data from the Million Song Dataset

from __future__ import division
import os
import sys
import re
import time
import glob
import collections
import numpy as np
import hdf5_getters  # not on adroit

# prevents output from showing ellipses when printed
np.set_printoptions(threshold=np.nan)

'''This code pulls the relevant metadata (title, artist, year, duration,
pitch and timbre segments) for every electronic music song in the raw
MSD files and writes it out in chronological order.'''

# note: path separators were lost in extraction; the directory layout
# below is a reconstruction
basedir = '/scratch/network/mssilver/mssilver/msd_data_full/data/' + str(sys.argv[1])
ext = '.h5'

target_genres = ['house', 'techno', 'drum and bass', 'drum n bass',
                 "drum'n'bass", 'drumnbass', "drum 'n' bass", 'jungle',
                 'breakbeat', 'trance', 'dubstep', 'trap', 'downtempo',
                 'industrial', 'synthpop', 'idm',
                 'idm - intelligent dance music', '8-bit', 'ambient',
                 'dance and electronica', 'electronic']

# relevant metadata for all EM songs found in the MSD
all_song_data = {}
pitch_segs_data = []
count = 0
start_time = time.time()

for root, dirs, files in os.walk(basedir):
    files = glob.glob(os.path.join(root, '*' + ext))
    for f in files:
        h5 = hdf5_getters.open_h5_file_read(f)
        # if year unknown, throw out sample
        if hdf5_getters.get_year(h5) == 0:
            h5.close()
            continue
        if any(tag in str(hdf5_getters.get_artist_mbtags(h5)) for tag in target_genres):
            print 'found electronic music song at {0} seconds'.format(time.time() - start_time)
            count += 1
            print 'song count {0}'.format(count)
            h5_subdict = dict()
            h5_subdict['title'] = hdf5_getters.get_title(h5).item()
            h5_subdict['artist_name'] = hdf5_getters.get_artist_name(h5).item()
            h5_subdict['year'] = hdf5_getters.get_year(h5).item()
            h5_subdict['duration'] = hdf5_getters.get_duration(h5).item()
            h5_subdict['timbre'] = hdf5_getters.get_segments_timbre(h5).tolist()
            h5_subdict['pitches'] = hdf5_getters.get_segments_pitches(h5).tolist()
            track_id = hdf5_getters.get_track_id(h5).item()
            all_song_data[track_id] = h5_subdict
            print 'Song {0} finished processing. Total time elapsed: {1} seconds'.format(count, str(time.time() - start_time))
        h5.close()

# sort chronologically; an OrderedDict (rather than a plain dict, which
# would discard the ordering) preserves the sorted order when written out
all_song_data_sorted = collections.OrderedDict(
    sorted(all_song_data.items(), key=lambda k: k[1]['year']))
# strip the slash from the shard argument for the output filename
sortedpitchdata = '/scratch/network/mssilver/mssilver/msd_data/raw_' + re.sub('/', '', str(sys.argv[1])) + '.txt'
with open(sortedpitchdata, 'w') as text_file:
    text_file.write(str(all_song_data_sorted))
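For context: the script appears to take a single MSD shard as its argument, in a two-letter form such as 'A/B' (which the re.sub call flattens to 'AB' for the output filename, matching the raw_XX.txt files read in A.3), so shards can be processed in parallel on the cluster. A hypothetical invocation, with the script's filename not recorded in the appendix, would be: python pull_msd_data.py A/B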

A.2 Calculating Most Likely Chords and Timbre Categories

from __future__ import division
import os
import sys
import re
import time
import json
import glob
import hdf5_getters  # not on adroit
import sklearn.mixture
import msd_utils  # not on adroit
import math
import numpy as np
import collections
import ast

# prevents output from showing ellipses when printed
np.set_printoptions(threshold=np.nan)

# column-wise mean of a list of lists
def mean(a):
    return sum(a) / len(a)

'''This code computes the duration-normalized frequency of chord changes
and timbre categories in each electronic song.'''

basedir = '/scratch/network/mssilver/mssilver/'
input_file = basedir + 'msd_data/raw_' + str(sys.argv[1]) + '.txt'
output_file = basedir + 'msd_data/preprocessed_' + str(sys.argv[1]) + '.txt'

json_contents = open(input_file, 'r').read()

all_song_data = []
time_start = time.time()
count = 0
# the regex below (reconstructed; braces were lost in extraction) grabs
# each song's dict literal from the raw text file
for json_object_str in re.finditer(r"\{'title'.*?\}", json_contents):
    json_object_str = str(json_object_str.group(0))
    json_object = ast.literal_eval(json_object_str)
    json_object_new = {}

    json_object_new['title'] = json_object['title']
    json_object_new['artist_name'] = json_object['artist_name']
    json_object_new['year'] = json_object['year']
    json_object_new['duration'] = json_object['duration']

    segments_pitches_old = json_object['pitches']
    segments_timbre_old = json_object['timbre']
    segments_pitches_old_smoothed = []
    segments_timbre_old_smoothed = []
    chord_changes = [0 for i in range(0, 192)]
    smoothing_factor = 5
    for i in range(0, int(math.floor(len(segments_pitches_old) / smoothing_factor))):
        segments = segments_pitches_old[(smoothing_factor*i):(smoothing_factor*i + smoothing_factor)]
        # calculate mean frequency of each note over a block of 5 time segments
        segments_mean = map(mean, zip(*segments))
        segments_pitches_old_smoothed.append(segments_mean)
    most_likely_chords = [msd_utils.find_most_likely_chord(seg) for seg in segments_pitches_old_smoothed]
    print 'found most likely chords at {0} seconds'.format(time.time() - time_start)
    # calculate chord changes
    for i in range(0, len(most_likely_chords) - 1):
        c1 = most_likely_chords[i]
        c2 = most_likely_chords[i+1]
        if c1[1] == c2[1]:
            note_shift = 0
        elif c1[1] < c2[1]:
            note_shift = c2[1] - c1[1]
        else:
            note_shift = 12 - c1[1] + c2[1]
        key_shift = 4*(c1[0] - 1) + c2[0]
        # convert note_shift (0 through 11) and key_shift (1 to 16)
        # to one of 192 categories for a chord shift
        chord_shift = 12*(key_shift - 1) + note_shift
        chord_changes[chord_shift] += 1
    json_object_new['chord_changes'] = [c / json_object['duration'] for c in chord_changes]
    print 'calculated chord changes at {0} seconds'.format(time.time() - time_start)

    for i in range(0, int(math.floor(len(segments_timbre_old) / smoothing_factor))):
        segments = segments_timbre_old[(smoothing_factor*i):(smoothing_factor*i + smoothing_factor)]
        # calculate mean of each timbre dimension over a block of 5 time segments
        segments_mean = map(mean, zip(*segments))
        segments_timbre_old_smoothed.append(segments_mean)
    print 'found most likely timbre categories at {0} seconds'.format(time.time() - time_start)
    timbre_cats = [msd_utils.find_most_likely_timbre_category(seg) for seg in segments_timbre_old_smoothed]
    # count hits for each of the 46 BIC-selected timbre categories (the
    # extracted source read xrange(0, 30), which would silently drop
    # categories 30 through 45)
    timbre_cat_counts = [timbre_cats.count(i) for i in xrange(0, 46)]
    json_object_new['timbre_cat_counts'] = [t / json_object['duration'] for t in timbre_cat_counts]
    all_song_data.append(json_object_new)
    count += 1

print 'preprocessing finished, writing results to file at time {0}'.format(time.time() - time_start)
with open(output_file, 'w') as f:
    f.write(str(all_song_data))

print 'file merging complete at time {0}'.format(time.time() - time_start)

A.3 Code to Compute Timbre Categories

from __future__ import division
import os
import sys
import re
import time
import json
import glob
import hdf5_getters  # not on adroit
import sklearn.mixture
import msd_utils  # not on adroit
import math
import numpy as np
import collections
from string import ascii_uppercase
import ast
import matplotlib.pyplot as plt
import operator
from collections import defaultdict
import random

timbre_all = []
N = 20  # number of samples to get from each year
year_counts = {1956: 2, 1965: 4, 1968: 3, 1969: 5, 1970: 23, 1971: 25,
               1972: 26, 1973: 37, 1974: 35, 1975: 29, 1976: 28, 1977: 64,
               1978: 77, 1979: 111, 1980: 131, 1981: 171, 1982: 199,
               1983: 272, 1984: 190, 1985: 189, 1986: 200, 1987: 224,
               1988: 205, 1989: 272, 1990: 358, 1991: 348, 1992: 538,
               1993: 610, 1994: 658, 1995: 764, 1996: 809, 1997: 930,
               1998: 872, 1999: 983, 2000: 1031, 2001: 1230, 2002: 1323,
               2003: 1563, 2004: 1508, 2005: 1995, 2006: 1892, 2007: 2175,
               2008: 1950, 2009: 1782, 2010: 742}

time_start = time.time()
year_count = defaultdict(int)
orig_dir = '/scratch/network/mssilver/mssilver/'
orig_dir = ''  # override for local runs
# the regex below (reconstructed; braces were lost in extraction) grabs
# each song's dict literal from the raw text file
json_pattern = re.compile(r"\{'title'.*?\}", re.DOTALL)
N = 20  # number of songs to sample from each year
k = 20  # number of frames to select from each song
for l1 in ascii_uppercase:
    for l2 in ascii_uppercase:
        edm_textfile = orig_dir + 'msd_data/raw_' + l1 + l2 + '.txt'
        json_contents = open(edm_textfile, 'r').read()
        for json_object_str in re.findall(json_pattern, json_contents):
            json_object = ast.literal_eval(json_object_str)
            year = int(json_object['year'])
            # sample each year roughly uniformly: keep a song with
            # probability N / (songs in that year), capped at 1
            prob = 1.0 if 1.0*N/year_counts[year] > 1.0 else 1.0*N/year_counts[year]
            if random.random() < prob:
                print 'getting timbre frames for song in directory {0}, {1} seconds after start of program'.format(edm_textfile, time.time() - time_start)
                duration = float(json_object['duration'])
                timbre = [[t / duration for t in l] for l in json_object['timbre']]
                try:
                    indices = random.sample(xrange(0, len(timbre)), k)
                except:
                    # song has fewer than k frames: take them all
                    indices = xrange(0, len(timbre))
                timbre_frames = [timbre[i] for i in indices]
                timbre_all.extend(timbre_frames)
        print 'finished file {0}, {1} seconds after start of program'.format(edm_textfile, time.time() - time_start)

with open('timbre_frames_all.txt', 'w') as f:
    f.write(str(timbre_all))

A.4 Helper Methods for Calculations

import os
import re
import json
import glob
import hdf5_getters
import time
import numpy as np

''' some static data used in conjunction with the helper methods '''

# each 12-element vector corresponds to the 12 pitches, starting with C
# natural and going up to B natural; row k is the template with root k

def _transpose_template(template, k):
    # rotate a root-position template up by k semitones
    return [template[(i - k) % 12] for i in range(12)]

# the four binary chord templates (major, minor, dominant 7th major,
# dominant 7th minor), each transposed to all 12 roots; expanding these
# comprehensions reproduces the 12 x 12 binary matrices of the original
# listing exactly
CHORD_TEMPLATE_MAJOR = [_transpose_template([1,0,0,0,1,0,0,1,0,0,0,0], k) for k in range(12)]
CHORD_TEMPLATE_MINOR = [_transpose_template([1,0,0,1,0,0,0,1,0,0,0,0], k) for k in range(12)]
CHORD_TEMPLATE_DOM7 = [_transpose_template([1,0,0,0,1,0,0,1,0,0,1,0], k) for k in range(12)]
CHORD_TEMPLATE_MIN7 = [_transpose_template([1,0,0,1,0,0,0,1,0,0,1,0], k) for k in range(12)]

CHORD_TEMPLATE_MAJOR_means = [np.mean(chord) for chord in CHORD_TEMPLATE_MAJOR]
CHORD_TEMPLATE_MINOR_means = [np.mean(chord) for chord in CHORD_TEMPLATE_MINOR]
CHORD_TEMPLATE_DOM7_means = [np.mean(chord) for chord in CHORD_TEMPLATE_DOM7]
CHORD_TEMPLATE_MIN7_means = [np.mean(chord) for chord in CHORD_TEMPLATE_MIN7]

CHORD_TEMPLATE_MAJOR_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_MAJOR]
CHORD_TEMPLATE_MINOR_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_MINOR]
CHORD_TEMPLATE_DOM7_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_DOM7]
CHORD_TEMPLATE_MIN7_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_MIN7]

# the 46 BIC-selected timbre category centroids (decimal points, lost in
# extraction, restored after the leading digit of each mantissa)
TIMBRE_CLUSTERS = [
    [1.38679881e-01, 3.95702571e-02, 2.65410235e-02, 7.38301998e-03, -1.75014636e-02, -5.51147732e-02,
     8.71851698e-03, -1.17595855e-02, 1.07227900e-02, 8.75951680e-03, 5.40391877e-03, 6.17638908e-03],
    [3.14344510e+00, 1.17405599e-01, 4.08053561e+00, -1.77934450e+00, 2.93367968e+00, -1.35597928e+00,
     -1.55129489e+00, 7.75743158e-01, 6.42796685e-01, 1.40794256e-01, 3.37716831e-01, -3.27103815e-01],
    [3.56548165e-01, 2.73288705e+00, 1.94355982e+00, 1.06892477e+00, 9.89739475e-01, -8.97330631e-02,
     8.73234495e-01, -2.00747009e-03, 3.44488367e-01, 9.93117800e-02, -2.43471766e-01, -1.90521726e-01],
    [4.22442037e-01, 4.14115783e-01, 1.43926557e-01, -1.16143322e-01, -5.95186216e-02, -2.36927188e-01,
     -6.83151409e-02, 9.86816882e-02, 2.43219098e-02, 6.93558977e-02, 6.80121418e-03, 3.97485360e-02],
    [1.94727799e-01, -1.39027782e+00, -2.39875671e-01, -2.84583677e-01, 1.92334219e-01, -2.83421048e-01,
     2.15787541e-01, 1.14840341e-01, -2.15631833e-01, -4.09496877e-02, -6.90838017e-03, -7.24394810e-03],
    [1.96565167e-01, 4.98702717e-02, -3.43697282e-01, 2.54170701e-01, 1.12441266e-02, 1.54740401e-01,
     -4.70447408e-02, 8.10868802e-02, 3.03736697e-03, 1.43974944e-03, -2.75044913e-02, 1.48634678e-02],
    [2.21364497e-01, -2.96205105e-01, 1.57754028e-01, -5.57641279e-02, -9.25625566e-02, -6.15316168e-02,
     -1.38139882e-01, -5.54936599e-02, 1.66886836e-01, 6.46238260e-02, 1.24093863e-02, -2.09274345e-02],
    [2.12823455e-01, -9.32652720e-02, -4.39611467e-01, -2.02814479e-01, 4.98638770e-02, -1.26572488e-01,
     -1.11181799e-01, 3.25075635e-02, 2.01416694e-02, -5.69216463e-02, 2.61922912e-02, 8.30817468e-02],
    [1.62304042e-01, -7.34813956e-03, -2.02552550e-01, 1.80106705e-01, -5.72110826e-02, -9.17148244e-02,
     -6.20429191e-03, -6.08892354e-02, 1.02883628e-02, 3.84878478e-02, -8.72920419e-03, 2.37291230e-02],
    [1.69023095e-01, 6.81311168e-02, -3.71039856e-02, -2.13139780e-02, -4.18752028e-03, 1.36407740e-01,
     2.58515825e-02, -4.10328777e-04, 2.93149920e-02, -1.97874734e-02, 2.01177066e-02, 4.29260690e-03],
    [4.16829358e-01, -1.28384095e+00, 8.86081556e-01, 9.13717416e-02, -3.19420208e-01, -1.82003637e-01,
     -3.19865507e-02, -1.71517045e-02, 3.47472066e-02, -3.53047665e-02, 5.58354602e-02, -5.06222122e-02],
    [3.83948137e-01, 1.06020034e-01, 4.01191058e-01, 1.49470482e-01, -9.58422411e-02, -4.94473336e-02,
     2.27589858e-02, -5.67352733e-02, 3.84666644e-02, -2.15828055e-02, -1.67817151e-02, 1.15426241e-01],
    [9.07946444e-01, 3.26120397e+00, 2.98472002e+00, -1.42615404e-01, 1.29886103e+00, -4.53380431e-01,
     1.54008478e-01, -3.55297093e-02, -2.95809181e-01, 1.57037690e-01, -7.29692046e-02, 1.15180285e-01],
    [1.60870896e+00, -2.32038235e+00, -7.96211044e-01, 1.55058968e+00, -2.19377663e+00, 5.01030526e-01,
     -1.71767279e+00, -1.36642470e+00, -2.42837527e-01, -4.14275615e-01, -7.33148530e-01, -4.56676578e-01],
    [6.42870687e-01, 1.34486839e+00, 2.16026845e-01, -2.13180345e-01, 3.10866747e-01, -3.97754955e-01,
     -3.54439151e-01, -5.95938041e-04, 4.95054274e-03, 4.67013422e-02, -1.80823854e-02, 1.25808320e-01],
    [1.16780496e+00, 2.28141229e+00, -3.29418720e+00, -1.54239912e+00, 2.12372153e-01, 2.51116768e+00,
     1.84273560e+00, -4.06183916e-01, 1.19175125e+00, -9.24407446e-01, 6.85444429e-01, -6.38729005e-01],
    [2.39097414e-01, -1.13382447e-02, 3.06327342e-01, 4.68182987e-03, -1.03107607e-01, -3.17661969e-02,
     3.46533705e-02, 1.46440386e-02, 6.88291154e-02, 1.72580481e-02, -6.23970238e-03, -6.52822380e-03],
    [1.74850329e-01, -1.86077411e-01, 2.69285838e-01, 5.22452803e-02, -3.71708289e-02, -6.42874319e-02,
     -5.01920042e-03, -1.14565540e-02, -2.61300268e-03, -6.94872458e-03, 1.20157063e-02, 2.01341977e-02],
    [1.93220674e-01, 1.62738332e-01, 1.72794061e-02, 7.89933755e-02, 1.58494767e-01, 9.04541006e-04,
     -3.33177052e-02, -1.42411500e-01, -1.90471155e-02, -2.41622739e-02, -2.57382438e-02, 2.84895062e-02],
    [3.31179197e+00, -1.56765268e-01, 4.42446188e+00, 2.05496297e+00, 5.07031622e+00, -3.52663849e-02,
     -5.68337901e+00, -1.17825301e+00, 5.41756637e-01, -3.15541339e-02, -1.58404846e+00, 7.37887234e-01],
    [2.36033237e-01, -5.01380019e-01, -7.01568834e-02, -2.14474169e-01, 5.58739133e-01, -3.45340886e-01,
     2.36469930e-01, -2.51770230e-02, -4.41670143e-01, -1.73364633e-01, 9.92353986e-03, 1.01775476e-01],
    [3.13672832e+00, 1.55128891e+00, 4.60139512e+00, 9.82477544e-01, -3.87108002e-01, -1.34239667e+00,
     -3.00065797e+00, -4.41556909e-01, -7.77546208e-01, -6.59017029e-01, -1.42596356e-01, -9.78935498e-01],
    [8.50714148e-01, 2.28658856e-01, -3.65260753e+00, 2.70626948e+00, -1.90441544e-01, 5.66625676e+00,
     1.77531510e+00, 2.39978921e+00, 1.10965660e+00, 1.58484130e+00, -1.51579214e-02, 8.64324026e-01],
    [1.14302559e+00, 1.18602811e+00, -3.88130412e+00, 8.69833825e-01, -8.23003310e-01, -4.23867795e-01,
     8.56022598e-01, -1.08015106e+00, 1.74840192e-01, -1.35493558e-02, -1.17012561e+00, 1.68572940e-01],
    [3.54117814e+00, 6.12714769e-01, 7.67585243e+00, 2.50391333e+00, 1.81374399e+00, -1.46363231e+00,
     -1.74027236e+00, -5.72924078e-01, -1.20787368e+00, -4.13954661e-01, -4.62561948e-01, 6.78297871e-01],
    [8.31843044e-01, 4.41635485e-01, 7.00724425e-02, -4.72159900e-02, 3.08326493e-01, -4.47009822e-01,
     3.27806057e-01, 6.52370380e-01, 3.28490360e-01, 1.28628172e-01, -7.78065861e-02, 6.91343399e-02],
    [4.90082031e-01, -9.53180204e-01, 1.76970476e-01, 1.57256960e-01, -5.26196238e-02, -3.19264458e-01,
     3.91808304e-01, 2.19368239e-01, -2.06483291e-01, -6.25044005e-02, -1.05547224e-01, 3.18934196e-01],
    [1.49899454e+00, -4.30708817e-01, 2.43770498e+00, 7.03149621e-01, -2.28827845e+00, 2.70195855e+00,
     -4.71484280e+00, -1.18700075e+00, -1.77431396e+00, -2.23190236e+00, 8.20855264e-01, -2.35859902e-01],
    [1.20322544e-01, -3.66300816e-01, -1.25699953e-01, -1.21914056e-01, 6.93277338e-02, -1.31034684e-01,
     -1.54955924e-03, 2.48094288e-02, -3.09576314e-02, -1.66369415e-03, 1.48904987e-04, -1.42151992e-02],
    [6.52394765e-01, -6.81024464e-01, 6.36868117e-01, 3.04950208e-01, 2.62178992e-01, -3.20457080e-01,
     -1.98576098e-01, -3.02173163e-01, 2.04399765e-01, 4.44513847e-02, -9.50111498e-03, -1.14198739e-02],
    [2.06762180e-01, -2.08101829e-01, 2.61977630e-01, -1.71672300e-01, 5.61794250e-02, 2.13660185e-01,
     3.90259585e-02, 4.78176392e-02, 1.72812607e-02, 3.44052067e-02, 6.26899067e-03, 2.48544728e-02],
    [7.39717363e-01, 4.37786285e+00, 2.54995502e+00, 1.13151212e+00, -3.58509503e-01, 2.20806129e-01,
     -2.20500355e-01, -7.22409824e-02, -2.70534083e-01, 1.07942098e-03, 2.70174668e-01, 1.87279353e-01],
    [1.25593809e+00, 6.71054880e-02, 8.70352571e-01, -4.32607959e+00, 2.30652217e+00, 5.47476105e+00,
     -6.11052479e-01, 1.07955720e+00, -2.16225471e+00, -7.95770149e-01, -7.31804973e-01, 9.68935954e-01],
    [1.17233757e-01, -1.23897829e-01, -4.88625265e-01, 1.42036530e-01, -7.23286756e-02, -6.99808763e-02,
     -1.17525019e-02, 5.70221674e-02, -7.67796123e-03, 4.17505873e-02, -2.33375716e-02, 1.94121001e-02],
    [1.67511025e+00, -2.75436700e+00, 1.45345593e+00, 1.32408871e+00, -1.66172505e+00, 1.00560074e+00,
     -8.82308160e-01, -5.95708043e-01, -7.27283590e-01, -1.03975499e+00, -1.86653334e-02, 1.39449745e+00],
    [3.20587677e+00, -2.84451104e+00, 8.54849957e+00, -4.44001235e-01, 1.04202144e+00, 7.35333682e-01,
     -2.48763292e+00, 7.38931361e-01, -1.74185596e+00, -1.07581842e+00, 2.05759299e-01, -8.20483513e-01],
    [3.31279737e+00, -5.08655734e-01, 6.61530870e+00, 1.16518280e+00, 4.74499155e+00, -2.31536191e+00,
     -1.34016130e+00, -7.15381712e-01, 2.78890594e+00, 2.04189275e+00, -3.80003033e-01, 1.16034914e+00],
    [1.79522019e+00, -8.13534697e-02, 4.37167420e-01, 2.26517020e+00, 8.85377295e-01, 1.07481514e+00,
     -7.25322296e-01, -2.19309506e+00, -7.59468916e-01, -1.37191387e+00, 2.60097913e-01, 9.34596450e-01],
    [3.50400906e-01, 8.17891485e-01, -8.63487084e-01, -7.31760701e-01, 9.70320805e-02, -3.60023996e-01,
     -2.91753495e-01, -8.03073817e-02, 6.65930095e-02, 1.60093340e-01, -1.29158086e-01, -5.18806100e-02],
    [2.25922929e-01, 2.78461593e-01, 5.39661393e-02, -2.37662670e-02, -2.70343295e-02, -1.23485570e-01,
     2.31027499e-03, 5.87465112e-05, 1.86127188e-02, 2.83074747e-02, -1.87198676e-04, 1.24761782e-02],
    [4.53615634e-01, 3.18976020e+00, -8.35029351e-01, 7.84124578e+00, -4.43906795e-01, -1.78945492e+00,
     -1.14521031e+00, 1.00044304e+00, -4.04084981e-01, -4.86030348e-01, 1.05412721e-01, 5.63666445e-02],
    [3.93714086e-01, -3.07226477e-01, -4.87366619e-01, -4.57481697e-01, -2.91133171e-04, -2.39881719e-01,
     -2.15591352e-01, -1.21332941e-01, 1.42245002e-01, 5.02984582e-02, -8.05878851e-03, 1.95534173e-01],
    [1.86913010e-01, -1.61000977e-01, 5.95612425e-01, 1.87804293e-01, 2.22064227e-01, -1.09008289e-01,
     7.83845058e-02, 5.15228647e-02, -8.18113578e-02, -2.37860551e-02, 3.41013800e-03, 3.64680417e-02],
    [3.32919314e+00, -2.14341251e+00, 7.20913997e+00, 1.76143734e+00, 1.64091808e+00, -2.66887649e+00,
     -9.26748006e-01, -2.78599285e-01, -7.39434005e-01, -3.87363085e-01, 8.00557250e-01, 1.15628886e+00],
    [4.76496444e-01, -1.19334793e-01, 3.09037235e-01, -3.45545294e-01, 1.30114716e-01, 5.06895559e-01,
     2.12176840e-01, -4.14296750e-03, 4.52439064e-02, -1.62163990e-02, 6.93683152e-02, -5.77607592e-03],
    [3.00019324e-01, 5.43432074e-02, -7.72732930e-01, 1.47263806e+00, -2.79012581e-02, -2.47864869e-01,
     -2.10011388e-01, 2.78202425e-01, 6.16957205e-02, -1.66924986e-01, -1.80102286e-01, -3.78872162e-03]]

TIMBRE_MEANS = [np.mean(t) for t in TIMBRE_CLUSTERS]
TIMBRE_STDEVS = [np.std(t) for t in TIMBRE_CLUSTERS]

'''helper methods to process raw msd data'''

def normalize_pitches(h5):
    # transpose every pitch segment so all songs share a common key
    key = int(hdf5_getters.get_key(h5))
    segments_pitches = hdf5_getters.get_segments_pitches(h5)
    segments_pitches_new = [transpose_by_key(pitch_seg, key) for pitch_seg in segments_pitches]
    return segments_pitches_new

def transpose_by_key(pitch_seg, key):
    pitch_seg_new = []
    for i in range(0, 12):
        idx = (i + key) % 12
        pitch_seg_new.append(pitch_seg[idx])
    return pitch_seg_new

''' given a time segment with distributions of the 12 pitches, find the most likely chord played '''
def find_most_likely_chord(pitch_vector):
    rho_max = 0.0
    # index each chord as a (quality, root) pair; quality 1 = major,
    # 2 = minor, 3 = dominant 7th major, 4 = dominant 7th minor
    most_likely_chord = (1, 1)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_MAJOR, CHORD_TEMPLATE_MAJOR_means, CHORD_TEMPLATE_MAJOR_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            # correlation between template and observed pitch vector
            # (the 0.01 terms guard against division by zero)
            rho += (chord[i] - mean)*(pitch_vector[i] - np.mean(pitch_vector))/((stdev + 0.01)*(np.std(pitch_vector) + 0.01))
        if abs(rho) > abs(rho_max):
            rho_max = rho
            most_likely_chord = (1, idx)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_MINOR, CHORD_TEMPLATE_MINOR_means, CHORD_TEMPLATE_MINOR_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean)*(pitch_vector[i] - np.mean(pitch_vector))/((stdev + 0.01)*(np.std(pitch_vector) + 0.01))
        if abs(rho) > abs(rho_max):
            rho_max = rho
            most_likely_chord = (2, idx)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_DOM7, CHORD_TEMPLATE_DOM7_means, CHORD_TEMPLATE_DOM7_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean)*(pitch_vector[i] - np.mean(pitch_vector))/((stdev + 0.01)*(np.std(pitch_vector) + 0.01))
        if abs(rho) > abs(rho_max):
            rho_max = rho
            most_likely_chord = (3, idx)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_MIN7, CHORD_TEMPLATE_MIN7_means, CHORD_TEMPLATE_MIN7_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean)*(pitch_vector[i] - np.mean(pitch_vector))/((stdev + 0.01)*(np.std(pitch_vector) + 0.01))
        if abs(rho) > abs(rho_max):
            rho_max = rho
            most_likely_chord = (4, idx)
    return most_likely_chord

def find_most_likely_timbre_category(timbre_vector):
    most_likely_timbre_cat = 0
    rho_max = 0.0
    for idx, (seg, mean, stdev) in enumerate(zip(TIMBRE_CLUSTERS, TIMBRE_MEANS, TIMBRE_STDEVS)):
        rho = 0.0
        for i in range(0, 12):
            # note: as in the original listing, the frame's deviation is
            # measured about the cluster mean (np.mean(seg)) rather than
            # the frame's own mean
            rho += (seg[i] - mean)*(timbre_vector[i] - np.mean(seg))/((stdev + 0.01)*(np.std(timbre_vector) + 0.01))
        if abs(rho) > abs(rho_max):
            rho_max = rho
            most_likely_timbre_cat = idx
    return most_likely_timbre_cat

Bibliography

[1] Marck Bailey. Mark J. Butler publishes scholarly work on dance music. http://www.music.northwestern.edu/about/news/2012/mark-j-butler-publishes-scholarly-work-on-dance-music.html, March 2012.

[2] Kenneth Taylor. Ishkur's guide to EDM. http://techno.org/electronic-music-guide.

[3] Deal further strengthens Spotify's music discovery expertise. http://the.echonest.com/pressreleases/spotify-acquires-echo-nest, March 2014.

[4] Josh Constine. Inside the Spotify-Echo Nest skunkworks. http://techcrunch.com/2014/10/19/the-sonic-mad-scientists, October 2014.

[5] The future of music genres is here. http://blog.echonest.com/post/73516217273/the-future-of-music-genres-is-here, January 2014.

[6] About the Music Genome Project. http://www.pandora.com/about/mgp.

[7] Joan Serrà, Álvaro Corral, Marián Boguñá, Martín Haro, and Josep Ll. Arcos. Measuring the evolution of contemporary western popular music. Scientific Reports, 2, July 2012.

[8] Matthias Mauch, Robert M. MacCallum, Mark Levy, and Armand M. Leroi. The evolution of popular music: USA 1960-2010. Royal Society Open Science, 2(5), 2015.

[9] Thierry Bertin-Mahieux, Daniel P. W. Ellis, Brian Whitman, and Paul Lamere. The Million Song Dataset. In Proceedings of the 12th International Conference on Music Information Retrieval (ISMIR 2011), 2011.

[10] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay. Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12:2825-2830, 2011.

[11] Lars Buitinck, Gilles Louppe, Mathieu Blondel, Fabian Pedregosa, Andreas Mueller, Olivier Grisel, Vlad Niculae, Peter Prettenhofer, Alexandre Gramfort, Jaques Grobler, Robert Layton, Jake VanderPlas, Arnaud Joly, Brian Holt, and Gaël Varoquaux. API design for machine learning software: experiences from the scikit-learn project. In ECML PKDD Workshop: Languages for Data Mining and Machine Learning, pages 108-122, 2013.

[12] Edwin Chen. Infinite mixture models with nonparametric Bayes and the Dirichlet Process. http://blog.echen.me/2012/03/20/infinite-mixture-models-with-nonparametric-bayes-and-the-dirichlet-process/, March 2012.

[13] Graham Massey. Roland TR-808: The drum machine that changed music. http://www.bbc.com/news/entertainment-arts-26682781, March 2014.

[14] Jean-Michel Jarre and Agencja Artystyczna MTJ. Jean Michel Jarre. Disques Dreyfus, 1999.

[15] Francois Pachet, Jean-Julien Aucouturier, and Mark Sandler. The way it sounds: Timbre models for analysis and retrieval of music signals. IEEE Transactions on Multimedia, 7(6):1028-1035, December 2005.

• Abstract
• Acknowledgements
• Contents
• List of Tables
• List of Figures
• 1 Introduction
  • 1.1 Background Information
  • 1.2 Literature Review
  • 1.3 The Dataset
• 2 Mathematical Modeling
  • 2.1 Determining Novelty of Songs
  • 2.2 Feature Selection
  • 2.3 Collecting Data and Preprocessing Selected Features
    • 2.3.1 Collecting the Data
    • 2.3.2 Pitch Preprocessing
    • 2.3.3 Timbre Preprocessing
• 3 Results
  • 3.1 Methodology
  • 3.2 Findings
    • 3.2.1 α = 0.05
    • 3.2.2 α = 0.1
    • 3.2.3 α = 0.2
  • 3.3 Analysis
• 4 Conclusion
  • 4.1 Design Flaws in Experiment
  • 4.2 Future Work
  • 4.3 Closing Remarks
• A Code
  • A.1 Pulling Data from the Million Song Dataset
  • A.2 Calculating Most Likely Chords and Timbre Categories
  • A.3 Code to Compute Timbre Categories
  • A.4 Helper Methods for Calculations
• Bibliography
Page 44: Silver,Matthew final thesis

Figure 33 Song year distributions for α = 01

35

36

Figure 34 Timbre and pitch distributions for α = 01

37

Cluster Song Count Characteristic Sounds

0 1339 Instrumental and disco with 80s synth

1 2109 Simultaneous quarter-note and sixteenth note rhythms

2 4048 Upbeat chill simultaneous quarter-note and eighth

note rhythms

3 1353 Strong repetitive beats ambient

4 2446 Strong simultaneous beat and synths synths defined but

echo

5 2672 Calm New Age

6 542 Hi-hat cymbals dissonant chord progressions

7 2725 Aggressive punk and alternative rock

9 1647 Latin rhythmic emphasis on first and third beats

11 835 Standard medium-fast rock instrumentschords

16 1152 Orchestral especially violins

18 40 ldquoMartian alienrdquo sounds no vocals

20 1590 Alternating strong kick and strong high-pitched clap

28 528 Roland TR-like beats kick and clap stand out but fuzzy

Table 32 Song cluster descriptions for α = 01

323 α=02

With α set to 02 there were a total of 22 clusters formed 3 of the clusters consisted

of 1 song each none of which were particularly unique-sounding so I discarded them

for a total of 19 significant clusters Again the song distributions timbre and pitch

distributions and cluster descriptions are shown below

38

39

40

Figure 35 Song year distributions for α = 02

41

42

43

Figure 36 Timbre and pitch distributions for α = 02

44

Cluster Song Count Characteristic Sounds

0 4075 Nostalgic and sad-sounding synths and string instru-

ments

1 2068 Intense sad cavernous (mix of industrial metal and am-

bient)

2 1546 Jazzfunk tones

3 1691 Orchestral with heavy 80s synths atmospheric

4 343 Arpeggios

5 304 Electro ambient

6 2405 Alien synths eery

7 1264 Punchy kicks and claps 80s90s tilt

8 1561 Medium tempo 44 time signature synths with intense

guitar

9 1796 Disco rhythms and instruments

10 2158 Standard rock with few (if any) synths added on

12 791 Cavernous minimalist ambient (non-electronic instru-

ments)

14 765 Downtempo classic guitar riffs fewer synths

16 865 Classic acid house sounds and beats

17 682 Heavy Roland TR sounds

22 14 Fast ambient classic orchestral

23 578 Acid house with funk tones

30 31 Very repetitive rhythms one or two tones

34 88 Very dense sound (strong vocals and synths)

Table 33 Song cluster descriptions for α = 02

45

33 Analysis

Each of the different values of α revealed different insights about the Dirichlet Process

applied to the Million Song Dataset mainly which artists and songs were unique

and how traditional groupings of EM genres could be thought of differently Not

surprisingly the distributions of the years of songs in most of the clusters were skewed

to the left because the distribution of all of the EM songs in the Million Song Dataset

is left-skewed (see Figure 22) However some of the distributions vary significantly

for individual clusters and these differences provide important insights into which

types of music styles were popular at certain points in time and how unique the

earliest artists and songs in the clusters were For example for α = 01 Cluster 28rsquos

musical style (with sounds characteristic of the Roland TR-808 and TR-909 a two

programmable drum machines and synthesizers that became extremely popular in

1980s and 90s dance tracks) [13] coincides with the when the instruments were first

manufactured in 1980 Not surprisingly this cluster contained mostly songs from

the 80s and 90s and declined slightly in the 2000s However there were a few songs

in that cluster that came out before 1980 While these songs did not clearly use the

Roland TR machines they may have contained similar sounds that predated the

machines and were truly novel

First looking at α = 005 we see that all of the clusters contain a significant

number of songs although cluster 3 and 8 are notably smaller Cluster 3 contains

a heavier left tail indicating a larger number of songs from the 70s 80s and 90s

Inside the cluster the genres of music varied significantly from a traditional music

lens That is the cluster contained some songs with nearly all traditional rock

instruments others with purely synths and others somewhere in between all which

would normally be classified as different EM genres However under the Dirichlet

Process these songs were lumped together with the common themes of dense melodic

46

melodies (as opposed to minimalistic repetitive or dissonant sounds) The most

prominent artists from the earlier songs are Ashra and John Hassell who composed

several melodic songs combining traditional instruments with synthesizers for a

modern feel The other small cluster number 8 contains a more normal year

distribution relative to the entire MSD distribution and also consists of denser beats

Another artist Cabaret Voltaire leads this cluster Cluster 9 sticks out significantly

because it contains virtually no songs before 1990 but increases rapidly in popularity

This cluster contains songs with hypnotically repetitive rhythm strong and ethereal

synths and an equally strong drum-like beat Given the emergence of trance in

the 1990s and the fact that house music in the 1980s contained more minimalistic

synths than house music in the 1990s this distribution of years makes sense Looking

at the earliest artists in this cluster one that accurately predates the later music

in the cluster is Jean-Michel Jarre A French composer pioneering in ambient and

electronic music [14] one of his songs Les Chants Magneacutetiques IV contains very

sharp and modulated synths along with a repetitive hi-hat cymbal rhythm and more

drawn-out and ethereal synths While the song sounds ambient at its normal speed

playing the song 15 times the normal speed resulted in a thumping fast-paced 16th

note rhythm that combined with the ethereal synths that contain certain chord

progressions sounded very similar to trance music In fact I found that stylistically

trance music was comparable to house and ambient music increased in speed Trance

music was a term not used extensively until the early 1990s but ambient and house

music were already mainstream by the 1980s so it would make sense that trance

evolved in this manner However this insight could serve as an argument that trance

is not an innovative genre in and of itself but is rather a clever combination of two

older genres Lastly we look at the timbre category and chord change distributions

for each cluster In theory these clusters should have significantly different peaks

of chord changes and timbre categories reflecting different pitch arrangements and

47

instruments in each cluster The type 0 chord change corresponds to major rarr

major with no note change type 60 minor rarr minor with no note change type

120 dominant 7th major rarr dominant 7th major with no note change and type 180

dominant 7th minor rarr dominant 7th minor with no note change It makes sense that

type 0 60 120 and 180 chord changes are frequently observed because it implies

that chords in the song occurring next to each other are remaining in the same key

for the majority of the song The timbre categories on the other hand are more

difficult to intuitively interpret Mauchrsquos study addresses this issue by sampling

songs and sounds that are the closest to each timbre category and then playing the

sounds and attaching user-based interpretations based on several listeners [8] While

this strategy worked in Mauchrsquos study given the time and resources at my disposal

this strategy was not practical in my study I ended up comparing my subjective

summaries of each cluster and comparing the charts to see whether certain peaks

in the timbre categories corresponded to specific tones Strangely for α = 005 the

timbre and chord change data is very similar for each cluster This problem does not

occur for when α = 01 or 02 where the graphs vary significantly and correspond

to some of the observed differences in the music In summary below are the most

influential artists I found in the clusters formed and the types of music they created

that were novel for their time

bull Jean-Michel Jarre ambient and house music complicated synthesizer arrange-

ments

bull Cabaret Voltaire orchestral electronic music

bull Paul Horn new age

bull Brian Eno ambient music

bull Manuel Goumlttsching (Ashra) synth-heavy ambient music

48

bull Killing Joke industrial metal

bull John Foxx minimalist and dark electronic music

bull Fad Gadget house and industrial music

While these conclusions were formed mainly from the data used in the MSD and the

resulting clusters I checked outside sources and biographies of these artists to see

whether they were groundbreaking contributors to electronic music Some research

revealed that these artists were indeed groundbreaking for their time so my findings

are consistent with existing literature The difference however between existing

accounts and mine is that from a quantitatively computed perspective I found some

new connections (like observing that one of Jarrersquos works when sped up sounded

very similar to trance music)

For larger values of α it is not only worth looking at interesting phenomena in

the clusters formed for that specific value but also comparing the clusters formed

to other values of α Since we are increasing the value of α more clusters will

be formed and the distinctions between each cluster will be more nuanced With

α = 01 the Dirichlet Process formed 16 clusters 2 of these clusters consisted of

only one song each and upon listening neither of these songs sounded particularly

unique so I threw those two clusters out and analyzed the remaining 14 Comparing

these clusters to the ones formed with α = 005 I found that some of the clusters

mapped over nicely while others were more difficult to interpret For example cluster

301 (cluster 3 when α = 01) contained a similar number of songs and a similar

distribution of the years the songs were released to cluster 9005 Both contain vitually

no songs before the 1990s and then steadily rise in popularity through the 2000s

Both clusters also contain similar types of music house beats and ethreal synths

reminiscent of ambient or trance music However when I looked at the earliest

49

artists in cluster 301 they were different from the earliest artists in cluster 9005

One particular artist Bill Nelson stood out for having a particularly novel song

ldquoBirds of Tinrdquo for the year it was released (1980) This song features a sharp and

twangy synth beat that when sped up sounded like minimalist acid house music

While the α = 005 group differentiated mostly on general moods and classes of

instruments (like rock vs non-electronic vs electronic) the α = 01 group picked

up more nuanced instrumentation and mood differences For example cluster 1601

contained songs that featured orchestral string instruments especially violin The

songs themselves varied significantly according to traditional genres from Brian Eno

arrangements with classical orchestra to a remix of a song by Linkin Park a nu-metal

band which contained violin interludes This clustering raises an interesting point

that music that sounds very different based on traditional genres could be grouped

together on certain instruments or sounds Another cluster 2801 features 90s sounds

characteristic of the Roland TR-909 drum machine (which would explain why the

clusterrsquos songs increase dratistically starting in the 1990s and steadily decline through

the 2000s) Yet another cluster 601 contains a particularly heavy left tail indicating

a style more popular in the 1980s and the characteristic sound high-hat cymbals

is also a specialized instrument This specialization does not match up particularly

strongly with the clusters when α = 005 That is a single cluster with α = 005

does not easily map to one or more clusters in the α = 01 run although many of

the clusters appear to share characteristics based on the qualitative descriptions in

the tables The timbrechord change charts for each cluster appeared to at least

somewhat corroborate the general characteristics I attached to each cluster For

example the last timbre category is significantly pronounced for clusters 5 and 18

and especially so for 18 Cluster 18 was vocal-free ethereal space-synth sounds

so it would make sense that cluster 5 which was mainly calm New World also

contained vocal-free ethereal and space-y sounds It was also interesting to note

50

that certain clusters like 28 contained one timbre category that completely dominated

all the others Many songs in this cluster were marked by strong and repetitive

beats reminiscent of the Roland TR synth drum machine which matches the graph

Likewise clusters 3 7 9 and 20 which appear to contain the same peak timbre

category were noted for containing strong and repetitive beats For this cluster I

added the following artists and their contributions to the general list of novel artists

bull Bill Nelson Minimalist house music

bull Vangelis Orchestral compositions with electronic notes

bull Rick Wakeman Rock compositions with spacy-sounding synths

bull Kraftwerk synth-based pop music

Finally we look at α = 02 With this parameter value the Dirichlet Process resulted

in 22 clusters formed 3 of these clusters contained only one song each and upon

listening to each of these songs I determined they were not particularly unique and

discarded them for a total of 19 remaining clusters Unlike the previous two values of

α where the clusters were relatively easy to subjectively differentiate this one was

quite difficult Slightly more than half of the clusters 10 out of 19 contained under

1000 songs and 3 contained under 100 Some of the clusters were easily mapped

to clusters in the other two α values like cluster 1702 which contains Roland TR

drum machine sounds and is comparable to cluster 2801 However many of the other

classifications seemed more dubious Not only did the songs within each cluster seem

to often vary significantly but the differences between many clusters appeared nearly

indistinguishable The chord change and timbre charts also support the difficulty

in distinguishing different clusters The y-axes for all of the years are quite small

implying that many of the timbre values averaged out because the songs were quite

different in each cluster Essentially the observations are quite noisy and do not have

51

features that stand out as saliently as cluster 2801 for example The only exceptions

to these numbers were clusters 30 and 34 but there were so few songs in each of

these clusters that they represent only a small amount of the dataset Therefore I

concluded that the Dirichlet Process with α = 02 performed an insufficient job of

adequately clustering the songs Overall the clusters formed when α = 01 were

the most meaningful in terms of picking up nuanced moods and instruments without

splitting hairs and resulting in clusters a minimally trained ear could not differentiate

From this analysis the most appropriate genre classifications of the electronic music

from the MSD are the clusters described in the table where α = 01 and the most

novel artists along with their contributions are summarized in the finidings where

α = 005 and α = 01

52

Chapter 4

Conclusion

In this chapter I first address weakness in my experiment and strategies to address

those weaknesses then I offer potential paths for researchers to build upon my ex-

periment and offer closing words regarding this thesis

41 Design Flaws in Experiment

While I made every effort possible to ensure the integrity of this experiment there

were various factors some beyond my control and others within my control but

unrealistic given the time and resources I had The largest issue was the dataset I

was working with While the MSD contained roughly 23000 electronic music songs

according to my classifications these songs did not come close to all of the electronic

music that was available From looking through the tracks I did see many important

artists meaning that there was some credibility to the dataset However there were

several other artists I was surprised to see missing and the artists included contained

only a limited number of popular songs Some traditionally defined genres like

dubstep were missing entirely from the dataset and the most recent songs came

from the year 2010 which meant that the past 5 years of rapid expansions in EM

were not accounted for Building a sufficient corpus of EM data is very difficult

53

arguably more than for other genres because songs may be remixed by multiple

artists further blurring the line between original content and modifications For this

reason I considered my thesis to be a proof of concept Although the data I used

may not be ideal I was able to show that the Dirichlet Process could be used with

some amount of success to cluster songs based on their metadata

With respect to how I implemented the Dirichlet Process and constructed the features, my methodology could have been more extensive given additional time and resources. Interpreting the sounds in each song and establishing common threads is a difficult task, and unlike Pandora, which used trained music theory experts to analyze each song, I relied on my own ears and anecdotal knowledge of EM. Given the lack of formal literature quantitatively analyzing EM and the resources I had, this was my best realistic option, but it was also not ideal. The second notable weakness, which was more controllable, was determining what exactly constitutes an EM song. My criterion involved iterating through every song and selecting those whose artist carried a tag that fell inside a list of predetermined EM genres. However, this strategy is not always effective, since some artists have only a small selection of EM songs and have produced much more music involving rock or other non-EM genres. To prevent these songs from appearing in the dataset, I would need to load another dataset, from a group called Last.fm, which contains user-generated tags at the song level.
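That song-level filter could look like the sketch below. It assumes the Last.fm tags have already been joined to MSD track IDs and stored as a simple mapping; the file name lastfm_tags_by_track.json and the structure of tags_by_track are hypothetical stand-ins for however that join is saved, and the genre list mirrors the target_genres list in Appendix A.1.

    import json

    # genre whitelist, mirroring target_genres in Appendix A.1
    EM_GENRES = set(['house', 'techno', 'trance', 'jungle', 'breakbeat',
                     'dubstep', 'downtempo', 'industrial', 'synthpop',
                     'idm', 'ambient', 'electronic'])

    # hypothetical precomputed mapping: track_id -> list of song-level tags
    # taken from the Last.fm dataset (rather than artist-level mbtags)
    with open('lastfm_tags_by_track.json') as f:
        tags_by_track = json.load(f)

    def is_em_song(track_id):
        # keep a song only if its own tags, not just its artist's tags,
        # include at least one EM genre
        tags = [t.lower() for t in tags_by_track.get(track_id, [])]
        return any(tag in EM_GENRES for tag in tags)

    em_track_ids = [tid for tid in tags_by_track if is_em_song(tid)]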

Another, more addressable weakness in my experiment was how I graphically analyzed the timbre categories. While the average chord changes were easy to interpret on the graphs for each cluster and had straightforward semantic interpretations, the timbre categories were never formally defined. That is, while I knew the Bayes Information Criterion was lowest when there were 46 categories, I did not associate each timbre category with a sound. Mauch's study addressed this issue by randomly selecting songs with sounds that fell in each timbre category and asking users to listen to the sounds and classify what they heard. Implementing this system would be an additional way of ensuring that the clusters formed for each song were nontrivial: I could not only eyeball the measurements on each graph for timbre, as I did in this thesis, but also use them to confirm the sounds I observed for each cluster.
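A simple version of that listening test could be automated as follows: for each timbre category, surface the few songs whose smoothed timbre frames fall closest to the category's centroid, then play excerpts of those songs to listeners. The song_timbre_frames structure below is an assumption about how the intermediate data would be stored; the centroids are the TIMBRE_CLUSTERS vectors from Appendix A.4.

    import numpy as np

    # TIMBRE_CLUSTERS: the 46 centroid vectors from Appendix A.4
    # song_timbre_frames: hypothetical dict mapping track_id -> array of
    # smoothed 12-dimensional timbre frames for that song

    def closest_songs_to_category(cat_idx, song_timbre_frames, timbre_clusters, n=5):
        # rank songs by the distance of their nearest frame to the centroid
        # of timbre category cat_idx, and return the top n track IDs
        centroid = np.asarray(timbre_clusters[cat_idx])
        scores = {}
        for track_id, frames in song_timbre_frames.items():
            dists = np.linalg.norm(np.asarray(frames) - centroid, axis=1)
            scores[track_id] = dists.min()
        return sorted(scores, key=scores.get)[:n]

    # listeners would then audition the returned songs and supply a
    # human-readable description for the category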

Finally, while my feature selection involved careful preprocessing, based on other studies, that normalized measurements across all songs, there are additional ways I could have improved the feature set. For example, one study looks at more advanced ways to isolate specific timbre segments in a song, identify repeating patterns, and compare songs to each other in terms of the similarity of their timbres [15]. More advanced methods like these would allow me to analyze more quantitatively how successful the Dirichlet Process is at clustering songs into distinct categories.
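One quantitative check that requires no human listening at all is an internal validity score computed directly from the features and the cluster labels. The snippet below is a minimal sketch, assuming a feature matrix X and the cluster assignments labels from the Dirichlet Process fit are in memory; the silhouette score is a standard clustering metric, not a procedure used in this thesis.

    from sklearn.metrics import silhouette_score

    # X: song feature matrix; labels: cluster assignments from the DP fit.
    # Silhouette lies in [-1, 1]; higher means songs sit closer to their own
    # cluster than to neighboring clusters, giving a single number with which
    # to compare runs such as alpha = 0.05, 0.1, and 0.2.
    score = silhouette_score(X, labels, metric='euclidean')
    print('mean silhouette: {0:.3f}'.format(score))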

4.2 Future Work

Future work in this area (quantitatively analyzing EM metadata to determine what constitutes different genres and novel artists) would involve tighter definitions and procedures, evaluations of whether the clustering was effective, and closer musical scrutiny. All of the weaknesses mentioned in the previous section, barring perhaps the songs available in the Million Song Dataset, can be addressed with extensions and modifications to the code base I created. The greater issue of building an effective corpus of music data for the MSD and constantly updating it might be addressed by soliciting such data from an organization like Spotify, but such an endeavor is very ambitious and beyond the scope of any individual or small research group without extensive funding and influence. Once these problems are resolved, the songs accessed from the dataset, and the methods for comparing songs to each other implemented, the next steps would be to analyze the results further. How do the most unique artists for their time compare to the most popular artists? Is there considerable overlap? How long does it take for a style to grow in popularity, if it ever does? And lastly, how can these findings be used to compose new genres of music and envision who and what will become popular in the future? All of these questions may require supplementary information sources, for example with respect to the popularity of songs and artists, and many of these additional pieces of information can be found on the website of the MSD.

4.3 Closing Remarks

While this thesis is an ambitious endeavor and can be improved in many respects, it does show that the methods implemented yield nontrivial results and could serve as a foundation for future quantitative analysis of electronic music. As data analytics grows and groups such as Spotify amass greater amounts of information and draw deeper insights from it, this relatively new field of study will hopefully grow as well. EM is a dynamic, energizing, and incredibly expressive type of music, and understanding it from a quantitative perspective pays respect to what has, up until now, mostly been analyzed from a curious outsider's perspective: qualitatively described, but not examined as thoroughly from a mathematical angle.


Appendix A

Code

A.1 Pulling Data from the Million Song Dataset

from __future__ import division
import os
import re                # needed for re.sub on the output path below
import sys
import time
import glob
import numpy as np       # needed for np.set_printoptions below
import hdf5_getters      # not on adroit

# prevents output from showing ellipses when printed
np.set_printoptions(threshold=np.nan)

'''This code pulls the metadata of every electronic song out of the raw MSD
HDF5 files so that chord changes and timbre categories can be computed later.'''

basedir = '/scratch/network/mssilver/mssilver/msd_data_full/data/' + str(sys.argv[1])
ext = '.h5'

target_genres = ['house', 'techno', 'drum and bass', 'drum n bass', "drum'n'bass",
                 'drumnbass', "drum 'n' bass", 'jungle', 'breakbeat', 'trance',
                 'dubstep', 'trap', 'downtempo', 'industrial', 'synthpop', 'idm',
                 'idm - intelligent dance music', '8-bit', 'ambient',
                 'dance and electronica', 'electronic']

# relevant metadata for all EM songs found in the MSD
all_song_data = {}
pitch_segs_data = []
count = 0
start_time = time.time()

for root, dirs, files in os.walk(basedir):
    files = glob.glob(os.path.join(root, '*' + ext))
    for f in files:
        h5 = hdf5_getters.open_h5_file_read(f)
        # if year unknown, throw out sample
        if hdf5_getters.get_year(h5) == 0:
            h5.close()
            continue
        if any(tag in str(hdf5_getters.get_artist_mbtags(h5)) for tag in target_genres):
            print 'found electronic music song at {0} seconds'.format(time.time() - start_time)
            count += 1
            print 'song count: {0}'.format(count)
            h5_subdict = dict()
            h5_subdict['title'] = hdf5_getters.get_title(h5).item()
            h5_subdict['artist_name'] = hdf5_getters.get_artist_name(h5).item()
            h5_subdict['year'] = hdf5_getters.get_year(h5).item()
            h5_subdict['duration'] = hdf5_getters.get_duration(h5).item()
            h5_subdict['timbre'] = hdf5_getters.get_segments_timbre(h5).tolist()
            h5_subdict['pitches'] = hdf5_getters.get_segments_pitches(h5).tolist()
            track_id = hdf5_getters.get_track_id(h5).item()
            all_song_data[track_id] = h5_subdict
            print 'Song {0} finished processing. Total time elapsed: {1} seconds'.format(count, str(time.time() - start_time))
        h5.close()

# sort the songs chronologically before writing them out
all_song_data_sorted = dict(sorted(all_song_data.items(), key=lambda k: k[1]['year']))
sortedpitchdata = '/scratch/network/mssilver/mssilver/msd_data/raw_' + re.sub('/', '', sys.argv[1]) + '.txt'
with open(sortedpitchdata, 'w') as text_file:
    text_file.write(str(all_song_data_sorted))

A.2 Calculating Most Likely Chords and Timbre Categories

from __future__ import division
import os
import sys
import re
import time
import json
import glob
import hdf5_getters  # not on adroit
import sklearn.mixture
import msd_utils  # not on adroit
import math
import numpy as np
import collections
import ast

# prevents output from showing ellipses when printed
np.set_printoptions(threshold=np.nan)

# column-wise mean of a list of lists (used with zip(*...) below)
def mean(a):
    return sum(a) / len(a)

'''This code computes the frequency of chord changes in each electronic song
so that the Dirichlet Process can later be run on it.'''

basedir = '/scratch/network/mssilver/mssilver/'
input_file = basedir + 'msd_data/raw_' + str(sys.argv[1]) + '.txt'
output_file = basedir + 'msd_data/preprocessed_' + str(sys.argv[1]) + '.txt'

json_contents = open(input_file, 'r').read()

all_song_data = []
time_start = time.time()
count = 0
for json_object_str in re.finditer(r"{'title'.*?}", json_contents):
    json_object_str = str(json_object_str.group(0))
    json_object = ast.literal_eval(json_object_str)
    json_object_new = {}

    json_object_new['title'] = json_object['title']
    json_object_new['artist_name'] = json_object['artist_name']
    json_object_new['year'] = json_object['year']
    json_object_new['duration'] = json_object['duration']

    segments_pitches_old = json_object['pitches']
    segments_timbre_old = json_object['timbre']
    segments_pitches_old_smoothed = []
    segments_timbre_old_smoothed = []
    chord_changes = [0 for i in range(0, 192)]
    smoothing_factor = 5
    for i in range(0, int(math.floor(len(segments_pitches_old) / smoothing_factor))):
        segments = segments_pitches_old[(smoothing_factor*i):(smoothing_factor*i + smoothing_factor)]
        # calculate mean frequency of each note over a block of 5 time segments
        segments_mean = map(mean, zip(*segments))
        segments_pitches_old_smoothed.append(segments_mean)
    most_likely_chords = [msd_utils.find_most_likely_chord(seg) for seg in segments_pitches_old_smoothed]
    print 'found most likely chords at {0} seconds'.format(time.time() - time_start)
    # calculate chord changes
    for i in range(0, len(most_likely_chords) - 1):
        c1 = most_likely_chords[i]
        c2 = most_likely_chords[i+1]
        if (c1[1] == c2[1]):
            note_shift = 0
        elif (c1[1] < c2[1]):
            note_shift = c2[1] - c1[1]
        else:
            note_shift = 12 - c1[1] + c2[1]
        key_shift = 4*(c1[0] - 1) + c2[0]
        # convert note_shift (0 through 11) and key_shift (1 to 16)
        # to one of 192 categories for a chord shift
        chord_shift = 12*(key_shift - 1) + note_shift
        chord_changes[chord_shift] += 1
    json_object_new['chord_changes'] = [c/json_object['duration'] for c in chord_changes]
    print 'calculated chord changes at {0} seconds'.format(time.time() - time_start)

    for i in range(0, int(math.floor(len(segments_timbre_old) / smoothing_factor))):
        segments = segments_timbre_old[(smoothing_factor*i):(smoothing_factor*i + smoothing_factor)]
        # calculate mean timbre over a block of 5 time segments
        segments_mean = map(mean, zip(*segments))
        segments_timbre_old_smoothed.append(segments_mean)
    print 'found most likely timbre categories at {0} seconds'.format(time.time() - time_start)
    timbre_cats = [msd_utils.find_most_likely_timbre_category(seg) for seg in segments_timbre_old_smoothed]
    timbre_cat_counts = [timbre_cats.count(i) for i in xrange(0, 30)]
    json_object_new['timbre_cat_counts'] = [t/json_object['duration'] for t in timbre_cat_counts]
    all_song_data.append(json_object_new)
    count += 1

print 'preprocessing finished, writing results to file at time {0}'.format(time.time() - time_start)
with open(output_file, 'w') as f:
    f.write(str(all_song_data))

print 'file merging complete at time {0}'.format(time.time() - time_start)

A.3 Code to Compute Timbre Categories

from __future__ import division
import os
import sys
import re
import time
import json
import glob
import hdf5_getters  # not on adroit
import sklearn.mixture
import msd_utils  # not on adroit
import math
import numpy as np
import collections
from string import ascii_uppercase
import ast
import matplotlib.pyplot as plt
import operator
from collections import defaultdict
import random

timbre_all = []
N = 20  # number of samples to get from each year
year_counts = dict({1956: 2, 1965: 4, 1968: 3, 1969: 5, 1970: 23, 1971: 25,
                    1972: 26, 1973: 37, 1974: 35, 1975: 29, 1976: 28, 1977: 64,
                    1978: 77, 1979: 111, 1980: 131, 1981: 171, 1982: 199,
                    1983: 272, 1984: 190, 1985: 189, 1986: 200, 1987: 224,
                    1988: 205, 1989: 272, 1990: 358, 1991: 348, 1992: 538,
                    1993: 610, 1994: 658, 1995: 764, 1996: 809, 1997: 930,
                    1998: 872, 1999: 983, 2000: 1031, 2001: 1230, 2002: 1323,
                    2003: 1563, 2004: 1508, 2005: 1995, 2006: 1892, 2007: 2175,
                    2008: 1950, 2009: 1782, 2010: 742})

time_start = time.time()
year_count = defaultdict(int)
orig_dir = '/scratch/network/mssilver/mssilver/'
orig_dir = ''  # override: read the raw files from the current directory
json_pattern = re.compile(r"{'title'.*?}", re.DOTALL)
N = 20  # number of songs to sample from each year
k = 20  # number of frames to select from each song
for l1 in ascii_uppercase:
    for l2 in ascii_uppercase:
        edm_textfile = orig_dir + 'msd_data/raw_' + l1 + l2 + '.txt'
        json_contents = open(edm_textfile, 'r').read()
        for json_object_str in re.findall(json_pattern, json_contents):
            json_object = ast.literal_eval(json_object_str)
            year = int(json_object['year'])
            prob = 1.0 if 1.0*N/year_counts[year] > 1.0 else 1.0*N/year_counts[year]
            if random.random() < prob:
                print 'getting timbre frames for song in directory {0}, {1} seconds after start of program'.format(edm_textfile, time.time() - time_start)
                duration = float(json_object['duration'])
                timbre = [[t/duration for t in l] for l in json_object['timbre']]
                try:
                    indices = random.sample(xrange(0, len(timbre)), k)
                except:
                    # fewer than k frames in this song: keep them all
                    indices = xrange(0, len(timbre))
                timbre_frames = [timbre[i] for i in indices]
                appended_timbre = [timbre_all.append(l) for l in timbre_frames]
        print 'finished file {0}, {1} seconds after start of program'.format(edm_textfile, time.time() - time_start)

with(open('timbre_frames_all.txt', 'w')) as f:
    f.write(str(timbre_all))

A.4 Helper Methods for Calculations

import os
import re
import json
import glob
import hdf5_getters
import time
import numpy as np

''' some static data used in conjunction with the helper methods '''

# each 12-element vector corresponds to the 12 pitches, starting with C
# natural and going up to B natural

CHORD_TEMPLATE_MAJOR = [[1,0,0,0,1,0,0,1,0,0,0,0], [0,1,0,0,0,1,0,0,1,0,0,0],
                        [0,0,1,0,0,0,1,0,0,1,0,0], [0,0,0,1,0,0,0,1,0,0,1,0],
                        [0,0,0,0,1,0,0,0,1,0,0,1], [1,0,0,0,0,1,0,0,0,1,0,0],
                        [0,1,0,0,0,0,1,0,0,0,1,0], [0,0,1,0,0,0,0,1,0,0,0,1],
                        [1,0,0,1,0,0,0,0,1,0,0,0], [0,1,0,0,1,0,0,0,0,1,0,0],
                        [0,0,1,0,0,1,0,0,0,0,1,0], [0,0,0,1,0,0,1,0,0,0,0,1]]
CHORD_TEMPLATE_MINOR = [[1,0,0,1,0,0,0,1,0,0,0,0], [0,1,0,0,1,0,0,0,1,0,0,0],
                        [0,0,1,0,0,1,0,0,0,1,0,0], [0,0,0,1,0,0,1,0,0,0,1,0],
                        [0,0,0,0,1,0,0,1,0,0,0,1], [1,0,0,0,0,1,0,0,1,0,0,0],
                        [0,1,0,0,0,0,1,0,0,1,0,0], [0,0,1,0,0,0,0,1,0,0,1,0],
                        [0,0,0,1,0,0,0,0,1,0,0,1], [1,0,0,0,1,0,0,0,0,1,0,0],
                        [0,1,0,0,0,1,0,0,0,0,1,0], [0,0,1,0,0,0,1,0,0,0,0,1]]
CHORD_TEMPLATE_DOM7 = [[1,0,0,0,1,0,0,1,0,0,1,0], [0,1,0,0,0,1,0,0,1,0,0,1],
                       [1,0,1,0,0,0,1,0,0,1,0,0], [0,1,0,1,0,0,0,1,0,0,1,0],
                       [0,0,1,0,1,0,0,0,1,0,0,1], [1,0,0,1,0,1,0,0,0,1,0,0],
                       [0,1,0,0,1,0,1,0,0,0,1,0], [0,0,1,0,0,1,0,1,0,0,0,1],
                       [1,0,0,1,0,0,1,0,1,0,0,0], [0,1,0,0,1,0,0,1,0,1,0,0],
                       [0,0,1,0,0,1,0,0,1,0,1,0], [0,0,0,1,0,0,1,0,0,1,0,1]]
CHORD_TEMPLATE_MIN7 = [[1,0,0,1,0,0,0,1,0,0,1,0], [0,1,0,0,1,0,0,0,1,0,0,1],
                       [1,0,1,0,0,1,0,0,0,1,0,0], [0,1,0,1,0,0,1,0,0,0,1,0],
                       [0,0,1,0,1,0,0,1,0,0,0,1], [1,0,0,1,0,1,0,0,1,0,0,0],
                       [0,1,0,0,1,0,1,0,0,1,0,0], [0,0,1,0,0,1,0,1,0,0,1,0],
                       [0,0,0,1,0,0,1,0,1,0,0,1], [1,0,0,0,1,0,0,1,0,1,0,0],
                       [0,1,0,0,0,1,0,0,1,0,1,0], [0,0,1,0,0,0,1,0,0,1,0,1]]

CHORD_TEMPLATE_MAJOR_means = [np.mean(chord) for chord in CHORD_TEMPLATE_MAJOR]
CHORD_TEMPLATE_MINOR_means = [np.mean(chord) for chord in CHORD_TEMPLATE_MINOR]
CHORD_TEMPLATE_DOM7_means = [np.mean(chord) for chord in CHORD_TEMPLATE_DOM7]
CHORD_TEMPLATE_MIN7_means = [np.mean(chord) for chord in CHORD_TEMPLATE_MIN7]

CHORD_TEMPLATE_MAJOR_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_MAJOR]
CHORD_TEMPLATE_MINOR_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_MINOR]
CHORD_TEMPLATE_DOM7_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_DOM7]
CHORD_TEMPLATE_MIN7_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_MIN7]

# centroids of the timbre categories (one 12-dimensional vector each)
TIMBRE_CLUSTERS = [
    [1.38679881e-01, 3.95702571e-02, 2.65410235e-02, 7.38301998e-03, -1.75014636e-02, -5.51147732e-02,
     8.71851698e-03, -1.17595855e-02, 1.07227900e-02, 8.75951680e-03, 5.40391877e-03, 6.17638908e-03],
    [3.14344510e+00, 1.17405599e-01, 4.08053561e+00, -1.77934450e+00, 2.93367968e+00, -1.35597928e+00,
     -1.55129489e+00, 7.75743158e-01, 6.42796685e-01, 1.40794256e-01, 3.37716831e-01, -3.27103815e-01],
    [3.56548165e-01, 2.73288705e+00, 1.94355982e+00, 1.06892477e+00, 9.89739475e-01, -8.97330631e-02,
     8.73234495e-01, -2.00747009e-03, 3.44488367e-01, 9.93117800e-02, -2.43471766e-01, -1.90521726e-01],
    [4.22442037e-01, 4.14115783e-01, 1.43926557e-01, -1.16143322e-01, -5.95186216e-02, -2.36927188e-01,
     -6.83151409e-02, 9.86816882e-02, 2.43219098e-02, 6.93558977e-02, 6.80121418e-03, 3.97485360e-02],
    [1.94727799e-01, -1.39027782e+00, -2.39875671e-01, -2.84583677e-01, 1.92334219e-01, -2.83421048e-01,
     2.15787541e-01, 1.14840341e-01, -2.15631833e-01, -4.09496877e-02, -6.90838017e-03, -7.24394810e-03],
    [1.96565167e-01, 4.98702717e-02, -3.43697282e-01, 2.54170701e-01, 1.12441266e-02, 1.54740401e-01,
     -4.70447408e-02, 8.10868802e-02, 3.03736697e-03, 1.43974944e-03, -2.75044913e-02, 1.48634678e-02],
    [2.21364497e-01, -2.96205105e-01, 1.57754028e-01, -5.57641279e-02, -9.25625566e-02, -6.15316168e-02,
     -1.38139882e-01, -5.54936599e-02, 1.66886836e-01, 6.46238260e-02, 1.24093863e-02, -2.09274345e-02],
    [2.12823455e-01, -9.32652720e-02, -4.39611467e-01, -2.02814479e-01, 4.98638770e-02, -1.26572488e-01,
     -1.11181799e-01, 3.25075635e-02, 2.01416694e-02, -5.69216463e-02, 2.61922912e-02, 8.30817468e-02],
    [1.62304042e-01, -7.34813956e-03, -2.02552550e-01, 1.80106705e-01, -5.72110826e-02, -9.17148244e-02,
     -6.20429191e-03, -6.08892354e-02, 1.02883628e-02, 3.84878478e-02, -8.72920419e-03, 2.37291230e-02],
    [1.69023095e-01, 6.81311168e-02, -3.71039856e-02, -2.13139780e-02, -4.18752028e-03, 1.36407740e-01,
     2.58515825e-02, -4.10328777e-04, 2.93149920e-02, -1.97874734e-02, 2.01177066e-02, 4.29260690e-03],
    [4.16829358e-01, -1.28384095e+00, 8.86081556e-01, 9.13717416e-02, -3.19420208e-01, -1.82003637e-01,
     -3.19865507e-02, -1.71517045e-02, 3.47472066e-02, -3.53047665e-02, 5.58354602e-02, -5.06222122e-02],
    [3.83948137e-01, 1.06020034e-01, 4.01191058e-01, 1.49470482e-01, -9.58422411e-02, -4.94473336e-02,
     2.27589858e-02, -5.67352733e-02, 3.84666644e-02, -2.15828055e-02, -1.67817151e-02, 1.15426241e-01],
    [9.07946444e-01, 3.26120397e+00, 2.98472002e+00, -1.42615404e-01, 1.29886103e+00, -4.53380431e-01,
     1.54008478e-01, -3.55297093e-02, -2.95809181e-01, 1.57037690e-01, -7.29692046e-02, 1.15180285e-01],
    [1.60870896e+00, -2.32038235e+00, -7.96211044e-01, 1.55058968e+00, -2.19377663e+00, 5.01030526e-01,
     -1.71767279e+00, -1.36642470e+00, -2.42837527e-01, -4.14275615e-01, -7.33148530e-01, -4.56676578e-01],
    [6.42870687e-01, 1.34486839e+00, 2.16026845e-01, -2.13180345e-01, 3.10866747e-01, -3.97754955e-01,
     -3.54439151e-01, -5.95938041e-04, 4.95054274e-03, 4.67013422e-02, -1.80823854e-02, 1.25808320e-01],
    [1.16780496e+00, 2.28141229e+00, -3.29418720e+00, -1.54239912e+00, 2.12372153e-01, 2.51116768e+00,
     1.84273560e+00, -4.06183916e-01, 1.19175125e+00, -9.24407446e-01, 6.85444429e-01, -6.38729005e-01],
    [2.39097414e-01, -1.13382447e-02, 3.06327342e-01, 4.68182987e-03, -1.03107607e-01, -3.17661969e-02,
     3.46533705e-02, 1.46440386e-02, 6.88291154e-02, 1.72580481e-02, -6.23970238e-03, -6.52822380e-03],
    [1.74850329e-01, -1.86077411e-01, 2.69285838e-01, 5.22452803e-02, -3.71708289e-02, -6.42874319e-02,
     -5.01920042e-03, -1.14565540e-02, -2.61300268e-03, -6.94872458e-03, 1.20157063e-02, 2.01341977e-02],
    [1.93220674e-01, 1.62738332e-01, 1.72794061e-02, 7.89933755e-02, 1.58494767e-01, 9.04541006e-04,
     -3.33177052e-02, -1.42411500e-01, -1.90471155e-02, -2.41622739e-02, -2.57382438e-02, 2.84895062e-02],
    [3.31179197e+00, -1.56765268e-01, 4.42446188e+00, 2.05496297e+00, 5.07031622e+00, -3.52663849e-02,
     -5.68337901e+00, -1.17825301e+00, 5.41756637e-01, -3.15541339e-02, -1.58404846e+00, 7.37887234e-01],
    [2.36033237e-01, -5.01380019e-01, -7.01568834e-02, -2.14474169e-01, 5.58739133e-01, -3.45340886e-01,
     2.36469930e-01, -2.51770230e-02, -4.41670143e-01, -1.73364633e-01, 9.92353986e-03, 1.01775476e-01],
    [3.13672832e+00, 1.55128891e+00, 4.60139512e+00, 9.82477544e-01, -3.87108002e-01, -1.34239667e+00,
     -3.00065797e+00, -4.41556909e-01, -7.77546208e-01, -6.59017029e-01, -1.42596356e-01, -9.78935498e-01],
    [8.50714148e-01, 2.28658856e-01, -3.65260753e+00, 2.70626948e+00, -1.90441544e-01, 5.66625676e+00,
     1.77531510e+00, 2.39978921e+00, 1.10965660e+00, 1.58484130e+00, -1.51579214e-02, 8.64324026e-01],
    [1.14302559e+00, 1.18602811e+00, -3.88130412e+00, 8.69833825e-01, -8.23003310e-01, -4.23867795e-01,
     8.56022598e-01, -1.08015106e+00, 1.74840192e-01, -1.35493558e-02, -1.17012561e+00, 1.68572940e-01],
    [3.54117814e+00, 6.12714769e-01, 7.67585243e+00, 2.50391333e+00, 1.81374399e+00, -1.46363231e+00,
     -1.74027236e+00, -5.72924078e-01, -1.20787368e+00, -4.13954661e-01, -4.62561948e-01, 6.78297871e-01],
    [8.31843044e-01, 4.41635485e-01, 7.00724425e-02, -4.72159900e-02, 3.08326493e-01, -4.47009822e-01,
     3.27806057e-01, 6.52370380e-01, 3.28490360e-01, 1.28628172e-01, -7.78065861e-02, 6.91343399e-02],
    [4.90082031e-01, -9.53180204e-01, 1.76970476e-01, 1.57256960e-01, -5.26196238e-02, -3.19264458e-01,
     3.91808304e-01, 2.19368239e-01, -2.06483291e-01, -6.25044005e-02, -1.05547224e-01, 3.18934196e-01],
    [1.49899454e+00, -4.30708817e-01, 2.43770498e+00, 7.03149621e-01, -2.28827845e+00, 2.70195855e+00,
     -4.71484280e+00, -1.18700075e+00, -1.77431396e+00, -2.23190236e+00, 8.20855264e-01, -2.35859902e-01],
    [1.20322544e-01, -3.66300816e-01, -1.25699953e-01, -1.21914056e-01, 6.93277338e-02, -1.31034684e-01,
     -1.54955924e-03, 2.48094288e-02, -3.09576314e-02, -1.66369415e-03, 1.48904987e-04, -1.42151992e-02],
    [6.52394765e-01, -6.81024464e-01, 6.36868117e-01, 3.04950208e-01, 2.62178992e-01, -3.20457080e-01,
     -1.98576098e-01, -3.02173163e-01, 2.04399765e-01, 4.44513847e-02, -9.50111498e-03, -1.14198739e-02],
    [2.06762180e-01, -2.08101829e-01, 2.61977630e-01, -1.71672300e-01, 5.61794250e-02, 2.13660185e-01,
     3.90259585e-02, 4.78176392e-02, 1.72812607e-02, 3.44052067e-02, 6.26899067e-03, 2.48544728e-02],
    [7.39717363e-01, 4.37786285e+00, 2.54995502e+00, 1.13151212e+00, -3.58509503e-01, 2.20806129e-01,
     -2.20500355e-01, -7.22409824e-02, -2.70534083e-01, 1.07942098e-03, 2.70174668e-01, 1.87279353e-01],
    [1.25593809e+00, 6.71054880e-02, 8.70352571e-01, -4.32607959e+00, 2.30652217e+00, 5.47476105e+00,
     -6.11052479e-01, 1.07955720e+00, -2.16225471e+00, -7.95770149e-01, -7.31804973e-01, 9.68935954e-01],
    [1.17233757e-01, -1.23897829e-01, -4.88625265e-01, 1.42036530e-01, -7.23286756e-02, -6.99808763e-02,
     -1.17525019e-02, 5.70221674e-02, -7.67796123e-03, 4.17505873e-02, -2.33375716e-02, 1.94121001e-02],
    [1.67511025e+00, -2.75436700e+00, 1.45345593e+00, 1.32408871e+00, -1.66172505e+00, 1.00560074e+00,
     -8.82308160e-01, -5.95708043e-01, -7.27283590e-01, -1.03975499e+00, -1.86653334e-02, 1.39449745e+00],
    [3.20587677e+00, -2.84451104e+00, 8.54849957e+00, -4.44001235e-01, 1.04202144e+00, 7.35333682e-01,
     -2.48763292e+00, 7.38931361e-01, -1.74185596e+00, -1.07581842e+00, 2.05759299e-01, -8.20483513e-01],
    [3.31279737e+00, -5.08655734e-01, 6.61530870e+00, 1.16518280e+00, 4.74499155e+00, -2.31536191e+00,
     -1.34016130e+00, -7.15381712e-01, 2.78890594e+00, 2.04189275e+00, -3.80003033e-01, 1.16034914e+00],
    [1.79522019e+00, -8.13534697e-02, 4.37167420e-01, 2.26517020e+00, 8.85377295e-01, 1.07481514e+00,
     -7.25322296e-01, -2.19309506e+00, -7.59468916e-01, -1.37191387e+00, 2.60097913e-01, 9.34596450e-01],
    [3.50400906e-01, 8.17891485e-01, -8.63487084e-01, -7.31760701e-01, 9.70320805e-02, -3.60023996e-01,
     -2.91753495e-01, -8.03073817e-02, 6.65930095e-02, 1.60093340e-01, -1.29158086e-01, -5.18806100e-02],
    [2.25922929e-01, 2.78461593e-01, 5.39661393e-02, -2.37662670e-02, -2.70343295e-02, -1.23485570e-01,
     2.31027499e-03, 5.87465112e-05, 1.86127188e-02, 2.83074747e-02, -1.87198676e-04, 1.24761782e-02],
    [4.53615634e-01, 3.18976020e+00, -8.35029351e-01, 7.84124578e+00, -4.43906795e-01, -1.78945492e+00,
     -1.14521031e+00, 1.00044304e+00, -4.04084981e-01, -4.86030348e-01, 1.05412721e-01, 5.63666445e-02],
    [3.93714086e-01, -3.07226477e-01, -4.87366619e-01, -4.57481697e-01, -2.91133171e-04, -2.39881719e-01,
     -2.15591352e-01, -1.21332941e-01, 1.42245002e-01, 5.02984582e-02, -8.05878851e-03, 1.95534173e-01],
    [1.86913010e-01, -1.61000977e-01, 5.95612425e-01, 1.87804293e-01, 2.22064227e-01, -1.09008289e-01,
     7.83845058e-02, 5.15228647e-02, -8.18113578e-02, -2.37860551e-02, 3.41013800e-03, 3.64680417e-02],
    [3.32919314e+00, -2.14341251e+00, 7.20913997e+00, 1.76143734e+00, 1.64091808e+00, -2.66887649e+00,
     -9.26748006e-01, -2.78599285e-01, -7.39434005e-01, -3.87363085e-01, 8.00557250e-01, 1.15628886e+00],
    [4.76496444e-01, -1.19334793e-01, 3.09037235e-01, -3.45545294e-01, 1.30114716e-01, 5.06895559e-01,
     2.12176840e-01, -4.14296750e-03, 4.52439064e-02, -1.62163990e-02, 6.93683152e-03, -5.77607592e-03],
    [3.00019324e-01, 5.43432074e-02, -7.72732930e-01, 1.47263806e+00, -2.79012581e-02, -2.47864869e-01,
     -2.10011388e-01, 2.78202425e-01, 6.16957205e-02, -1.66924986e-01, -1.80102286e-01, -3.78872162e-03]]

TIMBRE_MEANS = [np.mean(t) for t in TIMBRE_CLUSTERS]
TIMBRE_STDEVS = [np.std(t) for t in TIMBRE_CLUSTERS]

'''helper methods to process raw msd data'''

def normalize_pitches(h5):
    key = int(hdf5_getters.get_key(h5))
    segments_pitches = hdf5_getters.get_segments_pitches(h5)
    segments_pitches_new = [transpose_by_key(pitch_seg, key) for pitch_seg in segments_pitches]
    return segments_pitches_new

def transpose_by_key(pitch_seg, key):
    pitch_seg_new = []
    for i in range(0, 12):
        idx = (i + key) % 12
        pitch_seg_new.append(pitch_seg[idx])
    return pitch_seg_new

''' given a time segment with distributions of the 12 pitches, find the most
likely chord played'''
def find_most_likely_chord(pitch_vector):
    rho_max = 0.0
    # index each chord as (chord type, root index)
    most_likely_chord = (1, 1)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_MAJOR, CHORD_TEMPLATE_MAJOR_means, CHORD_TEMPLATE_MAJOR_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean)*(pitch_vector[i] - np.mean(pitch_vector))/((stdev + 0.01)*(np.std(pitch_vector) + 0.01))
        if (abs(rho) > abs(rho_max)):
            rho_max = rho
            most_likely_chord = (1, idx)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_MINOR, CHORD_TEMPLATE_MINOR_means, CHORD_TEMPLATE_MINOR_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean)*(pitch_vector[i] - np.mean(pitch_vector))/((stdev + 0.01)*(np.std(pitch_vector) + 0.01))
        if (abs(rho) > abs(rho_max)):
            rho_max = rho
            most_likely_chord = (2, idx)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_DOM7, CHORD_TEMPLATE_DOM7_means, CHORD_TEMPLATE_DOM7_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean)*(pitch_vector[i] - np.mean(pitch_vector))/((stdev + 0.01)*(np.std(pitch_vector) + 0.01))
        if (abs(rho) > abs(rho_max)):
            rho_max = rho
            most_likely_chord = (3, idx)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_MIN7, CHORD_TEMPLATE_MIN7_means, CHORD_TEMPLATE_MIN7_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean)*(pitch_vector[i] - np.mean(pitch_vector))/((stdev + 0.01)*(np.std(pitch_vector) + 0.01))
        if (abs(rho) > abs(rho_max)):
            rho_max = rho
            most_likely_chord = (4, idx)
    return most_likely_chord

def find_most_likely_timbre_category(timbre_vector):
    most_likely_timbre_cat = 0
    rho_max = 0.0
    for idx, (seg, mean, stdev) in enumerate(zip(TIMBRE_CLUSTERS, TIMBRE_MEANS, TIMBRE_STDEVS)):
        rho = 0.0
        for i in range(0, 12):
            rho += (seg[i] - mean)*(timbre_vector[i] - np.mean(seg))/((stdev + 0.01)*(np.std(timbre_vector) + 0.01))
        if (abs(rho) > abs(rho_max)):
            rho_max = rho
            most_likely_timbre_cat = idx
    return most_likely_timbre_cat


Bibliography

[1] Marck Bailey. Mark J. Butler publishes scholarly work on dance music. http://www.music.northwestern.edu/about/news/2012/mark-j-butler-publishes-scholarly-work-on-dance-music.html, March 2012.

[2] Kenneth Taylor. Ishkur's guide to EDM. http://techno.org/electronic-music-guide.

[3] Deal further strengthens Spotify's music discovery expertise. http://the.echonest.com/pressreleases/spotify-acquires-echo-nest, March 2014.

[4] Josh Constine. Inside the Spotify–Echo Nest skunkworks. http://techcrunch.com/2014/10/19/the-sonic-mad-scientists, October 2014.

[5] The future of music genres is here. http://blog.echonest.com/post/73516217273/the-future-of-music-genres-is-here, January 2014.

[6] About the Music Genome Project. http://www.pandora.com/about/mgp.

[7] Joan Serrà, Álvaro Corral, Marián Boguñá, Martín Haro, and Josep Ll. Arcos. Measuring the evolution of contemporary western popular music. Scientific Reports, 2, July 2012.

[8] Matthias Mauch, Robert M. MacCallum, Mark Levy, and Armand M. Leroi. The evolution of popular music: USA 1960–2010. Royal Society Open Science, 2(5), 2015.

[9] Thierry Bertin-Mahieux, Daniel P. W. Ellis, Brian Whitman, and Paul Lamere. The Million Song Dataset. In Proceedings of the 12th International Conference on Music Information Retrieval (ISMIR 2011), 2011.

[10] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay. Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12:2825–2830, 2011.

[11] Lars Buitinck, Gilles Louppe, Mathieu Blondel, Fabian Pedregosa, Andreas Mueller, Olivier Grisel, Vlad Niculae, Peter Prettenhofer, Alexandre Gramfort, Jaques Grobler, Robert Layton, Jake VanderPlas, Arnaud Joly, Brian Holt, and Gaël Varoquaux. API design for machine learning software: experiences from the scikit-learn project. In ECML PKDD Workshop: Languages for Data Mining and Machine Learning, pages 108–122, 2013.

[12] Edwin Chen. Infinite mixture models with nonparametric Bayes and the Dirichlet Process. http://blog.echen.me/2012/03/20/infinite-mixture-models-with-nonparametric-bayes-and-the-dirichlet-process, March 2012.

[13] Graham Massey. Roland TR-808: The drum machine that changed music. http://www.bbc.com/news/entertainment-arts-26682781, March 2014.

[14] Jean-Michel Jarre and Agencja Artystyczna MTJ. Jean Michel Jarre. Disques Dreyfus, 1999.

[15] Francois Pachet, Jean-Julien Aucouturier, and Mark Sandler. The way it sounds: Timbre models for analysis and retrieval of music signals. IEEE Transactions on Multimedia, 7(6):1028–1035, December 2005.


• Abstract
• Acknowledgements
• Contents
• List of Tables
• List of Figures
• 1 Introduction
  • 1.1 Background Information
  • 1.2 Literature Review
  • 1.3 The Dataset
• 2 Mathematical Modeling
  • 2.1 Determining Novelty of Songs
  • 2.2 Feature Selection
  • 2.3 Collecting Data and Preprocessing Selected Features
    • 2.3.1 Collecting the Data
    • 2.3.2 Pitch Preprocessing
    • 2.3.3 Timbre Preprocessing
• 3 Results
  • 3.1 Methodology
  • 3.2 Findings
    • 3.2.1 α = 0.05
    • 3.2.2 α = 0.1
    • 3.2.3 α = 0.2
  • 3.3 Analysis
• 4 Conclusion
  • 4.1 Design Flaws in Experiment
  • 4.2 Future Work
  • 4.3 Closing Remarks
• A Code
  • A.1 Pulling Data from the Million Song Dataset
  • A.2 Calculating Most Likely Chords and Timbre Categories
  • A.3 Code to Compute Timbre Categories
  • A.4 Helper Methods for Calculations
• Bibliography
Page 45: Silver,Matthew final thesis

36

Figure 34 Timbre and pitch distributions for α = 01

37

Cluster Song Count Characteristic Sounds

0 1339 Instrumental and disco with 80s synth

1 2109 Simultaneous quarter-note and sixteenth note rhythms

2 4048 Upbeat chill simultaneous quarter-note and eighth

note rhythms

3 1353 Strong repetitive beats ambient

4 2446 Strong simultaneous beat and synths synths defined but

echo

5 2672 Calm New Age

6 542 Hi-hat cymbals dissonant chord progressions

7 2725 Aggressive punk and alternative rock

9 1647 Latin rhythmic emphasis on first and third beats

11 835 Standard medium-fast rock instrumentschords

16 1152 Orchestral especially violins

18 40 ldquoMartian alienrdquo sounds no vocals

20 1590 Alternating strong kick and strong high-pitched clap

28 528 Roland TR-like beats kick and clap stand out but fuzzy

Table 32 Song cluster descriptions for α = 01

323 α=02

With α set to 02 there were a total of 22 clusters formed 3 of the clusters consisted

of 1 song each none of which were particularly unique-sounding so I discarded them

for a total of 19 significant clusters Again the song distributions timbre and pitch

distributions and cluster descriptions are shown below

38

39

40

Figure 35 Song year distributions for α = 02

41

42

43

Figure 36 Timbre and pitch distributions for α = 02

44

Cluster Song Count Characteristic Sounds

0 4075 Nostalgic and sad-sounding synths and string instru-

ments

1 2068 Intense sad cavernous (mix of industrial metal and am-

bient)

2 1546 Jazzfunk tones

3 1691 Orchestral with heavy 80s synths atmospheric

4 343 Arpeggios

5 304 Electro ambient

6 2405 Alien synths eery

7 1264 Punchy kicks and claps 80s90s tilt

8 1561 Medium tempo 44 time signature synths with intense

guitar

9 1796 Disco rhythms and instruments

10 2158 Standard rock with few (if any) synths added on

12 791 Cavernous minimalist ambient (non-electronic instru-

ments)

14 765 Downtempo classic guitar riffs fewer synths

16 865 Classic acid house sounds and beats

17 682 Heavy Roland TR sounds

22 14 Fast ambient classic orchestral

23 578 Acid house with funk tones

30 31 Very repetitive rhythms one or two tones

34 88 Very dense sound (strong vocals and synths)

Table 33 Song cluster descriptions for α = 02

45

33 Analysis

Each of the different values of α revealed different insights about the Dirichlet Process

applied to the Million Song Dataset mainly which artists and songs were unique

and how traditional groupings of EM genres could be thought of differently Not

surprisingly the distributions of the years of songs in most of the clusters were skewed

to the left because the distribution of all of the EM songs in the Million Song Dataset

is left-skewed (see Figure 22) However some of the distributions vary significantly

for individual clusters and these differences provide important insights into which

types of music styles were popular at certain points in time and how unique the

earliest artists and songs in the clusters were For example for α = 01 Cluster 28rsquos

musical style (with sounds characteristic of the Roland TR-808 and TR-909 a two

programmable drum machines and synthesizers that became extremely popular in

1980s and 90s dance tracks) [13] coincides with the when the instruments were first

manufactured in 1980 Not surprisingly this cluster contained mostly songs from

the 80s and 90s and declined slightly in the 2000s However there were a few songs

in that cluster that came out before 1980 While these songs did not clearly use the

Roland TR machines they may have contained similar sounds that predated the

machines and were truly novel

First looking at α = 005 we see that all of the clusters contain a significant

number of songs although cluster 3 and 8 are notably smaller Cluster 3 contains

a heavier left tail indicating a larger number of songs from the 70s 80s and 90s

Inside the cluster the genres of music varied significantly from a traditional music

lens That is the cluster contained some songs with nearly all traditional rock

instruments others with purely synths and others somewhere in between all which

would normally be classified as different EM genres However under the Dirichlet

Process these songs were lumped together with the common themes of dense melodic

46

melodies (as opposed to minimalistic repetitive or dissonant sounds) The most

prominent artists from the earlier songs are Ashra and John Hassell who composed

several melodic songs combining traditional instruments with synthesizers for a

modern feel The other small cluster number 8 contains a more normal year

distribution relative to the entire MSD distribution and also consists of denser beats

Another artist Cabaret Voltaire leads this cluster Cluster 9 sticks out significantly

because it contains virtually no songs before 1990 but increases rapidly in popularity

This cluster contains songs with hypnotically repetitive rhythm strong and ethereal

synths and an equally strong drum-like beat Given the emergence of trance in

the 1990s and the fact that house music in the 1980s contained more minimalistic

synths than house music in the 1990s this distribution of years makes sense Looking

at the earliest artists in this cluster one that accurately predates the later music

in the cluster is Jean-Michel Jarre A French composer pioneering in ambient and

electronic music [14] one of his songs Les Chants Magneacutetiques IV contains very

sharp and modulated synths along with a repetitive hi-hat cymbal rhythm and more

drawn-out and ethereal synths While the song sounds ambient at its normal speed

playing the song 15 times the normal speed resulted in a thumping fast-paced 16th

note rhythm that combined with the ethereal synths that contain certain chord

progressions sounded very similar to trance music In fact I found that stylistically

trance music was comparable to house and ambient music increased in speed Trance

music was a term not used extensively until the early 1990s but ambient and house

music were already mainstream by the 1980s so it would make sense that trance

evolved in this manner However this insight could serve as an argument that trance

is not an innovative genre in and of itself but is rather a clever combination of two

older genres Lastly we look at the timbre category and chord change distributions

for each cluster In theory these clusters should have significantly different peaks

of chord changes and timbre categories reflecting different pitch arrangements and

47

instruments in each cluster The type 0 chord change corresponds to major rarr

major with no note change type 60 minor rarr minor with no note change type

120 dominant 7th major rarr dominant 7th major with no note change and type 180

dominant 7th minor rarr dominant 7th minor with no note change It makes sense that

type 0 60 120 and 180 chord changes are frequently observed because it implies

that chords in the song occurring next to each other are remaining in the same key

for the majority of the song The timbre categories on the other hand are more

difficult to intuitively interpret Mauchrsquos study addresses this issue by sampling

songs and sounds that are the closest to each timbre category and then playing the

sounds and attaching user-based interpretations based on several listeners [8] While

this strategy worked in Mauchrsquos study given the time and resources at my disposal

this strategy was not practical in my study I ended up comparing my subjective

summaries of each cluster and comparing the charts to see whether certain peaks

in the timbre categories corresponded to specific tones Strangely for α = 005 the

timbre and chord change data is very similar for each cluster This problem does not

occur for when α = 01 or 02 where the graphs vary significantly and correspond

to some of the observed differences in the music In summary below are the most

influential artists I found in the clusters formed and the types of music they created

that were novel for their time

bull Jean-Michel Jarre ambient and house music complicated synthesizer arrange-

ments

bull Cabaret Voltaire orchestral electronic music

bull Paul Horn new age

bull Brian Eno ambient music

bull Manuel Goumlttsching (Ashra) synth-heavy ambient music

48

bull Killing Joke industrial metal

bull John Foxx minimalist and dark electronic music

bull Fad Gadget house and industrial music

While these conclusions were formed mainly from the data used in the MSD and the

resulting clusters I checked outside sources and biographies of these artists to see

whether they were groundbreaking contributors to electronic music Some research

revealed that these artists were indeed groundbreaking for their time so my findings

are consistent with existing literature The difference however between existing

accounts and mine is that from a quantitatively computed perspective I found some

new connections (like observing that one of Jarrersquos works when sped up sounded

very similar to trance music)

For larger values of α it is not only worth looking at interesting phenomena in

the clusters formed for that specific value but also comparing the clusters formed

to other values of α Since we are increasing the value of α more clusters will

be formed and the distinctions between each cluster will be more nuanced With

α = 01 the Dirichlet Process formed 16 clusters 2 of these clusters consisted of

only one song each and upon listening neither of these songs sounded particularly

unique so I threw those two clusters out and analyzed the remaining 14 Comparing

these clusters to the ones formed with α = 005 I found that some of the clusters

mapped over nicely while others were more difficult to interpret For example cluster

301 (cluster 3 when α = 01) contained a similar number of songs and a similar

distribution of the years the songs were released to cluster 9005 Both contain vitually

no songs before the 1990s and then steadily rise in popularity through the 2000s

Both clusters also contain similar types of music house beats and ethreal synths

reminiscent of ambient or trance music However when I looked at the earliest

49

artists in cluster 301 they were different from the earliest artists in cluster 9005

One particular artist Bill Nelson stood out for having a particularly novel song

ldquoBirds of Tinrdquo for the year it was released (1980) This song features a sharp and

twangy synth beat that when sped up sounded like minimalist acid house music

While the α = 005 group differentiated mostly on general moods and classes of

instruments (like rock vs non-electronic vs electronic) the α = 01 group picked

up more nuanced instrumentation and mood differences For example cluster 1601

contained songs that featured orchestral string instruments especially violin The

songs themselves varied significantly according to traditional genres from Brian Eno

arrangements with classical orchestra to a remix of a song by Linkin Park a nu-metal

band which contained violin interludes This clustering raises an interesting point

that music that sounds very different based on traditional genres could be grouped

together on certain instruments or sounds Another cluster 2801 features 90s sounds

characteristic of the Roland TR-909 drum machine (which would explain why the

clusterrsquos songs increase dratistically starting in the 1990s and steadily decline through

the 2000s) Yet another cluster 601 contains a particularly heavy left tail indicating

a style more popular in the 1980s and the characteristic sound high-hat cymbals

is also a specialized instrument This specialization does not match up particularly

strongly with the clusters when α = 005 That is a single cluster with α = 005

does not easily map to one or more clusters in the α = 01 run although many of

the clusters appear to share characteristics based on the qualitative descriptions in

the tables The timbrechord change charts for each cluster appeared to at least

somewhat corroborate the general characteristics I attached to each cluster For

example the last timbre category is significantly pronounced for clusters 5 and 18

and especially so for 18 Cluster 18 was vocal-free ethereal space-synth sounds

so it would make sense that cluster 5 which was mainly calm New World also

contained vocal-free ethereal and space-y sounds It was also interesting to note

50

that certain clusters like 28 contained one timbre category that completely dominated

all the others Many songs in this cluster were marked by strong and repetitive

beats reminiscent of the Roland TR synth drum machine which matches the graph

Likewise clusters 3 7 9 and 20 which appear to contain the same peak timbre

category were noted for containing strong and repetitive beats For this cluster I

added the following artists and their contributions to the general list of novel artists

bull Bill Nelson Minimalist house music

bull Vangelis Orchestral compositions with electronic notes

bull Rick Wakeman Rock compositions with spacy-sounding synths

bull Kraftwerk synth-based pop music

Finally we look at α = 02 With this parameter value the Dirichlet Process resulted

in 22 clusters formed 3 of these clusters contained only one song each and upon

listening to each of these songs I determined they were not particularly unique and

discarded them for a total of 19 remaining clusters Unlike the previous two values of

α where the clusters were relatively easy to subjectively differentiate this one was

quite difficult Slightly more than half of the clusters 10 out of 19 contained under

1000 songs and 3 contained under 100 Some of the clusters were easily mapped

to clusters in the other two α values like cluster 1702 which contains Roland TR

drum machine sounds and is comparable to cluster 2801 However many of the other

classifications seemed more dubious Not only did the songs within each cluster seem

to often vary significantly but the differences between many clusters appeared nearly

indistinguishable The chord change and timbre charts also support the difficulty

in distinguishing different clusters The y-axes for all of the years are quite small

implying that many of the timbre values averaged out because the songs were quite

different in each cluster Essentially the observations are quite noisy and do not have

51

features that stand out as saliently as cluster 2801 for example The only exceptions

to these numbers were clusters 30 and 34 but there were so few songs in each of

these clusters that they represent only a small amount of the dataset Therefore I

concluded that the Dirichlet Process with α = 02 performed an insufficient job of

adequately clustering the songs Overall the clusters formed when α = 01 were

the most meaningful in terms of picking up nuanced moods and instruments without

splitting hairs and resulting in clusters a minimally trained ear could not differentiate

From this analysis the most appropriate genre classifications of the electronic music

from the MSD are the clusters described in the table where α = 01 and the most

novel artists along with their contributions are summarized in the finidings where

α = 005 and α = 01

52

Chapter 4

Conclusion

In this chapter I first address weakness in my experiment and strategies to address

those weaknesses then I offer potential paths for researchers to build upon my ex-

periment and offer closing words regarding this thesis

41 Design Flaws in Experiment

While I made every effort possible to ensure the integrity of this experiment there

were various factors some beyond my control and others within my control but

unrealistic given the time and resources I had The largest issue was the dataset I

was working with While the MSD contained roughly 23000 electronic music songs

according to my classifications these songs did not come close to all of the electronic

music that was available From looking through the tracks I did see many important

artists meaning that there was some credibility to the dataset However there were

several other artists I was surprised to see missing and the artists included contained

only a limited number of popular songs Some traditionally defined genres like

dubstep were missing entirely from the dataset and the most recent songs came

from the year 2010 which meant that the past 5 years of rapid expansions in EM

were not accounted for Building a sufficient corpus of EM data is very difficult

53

arguably more than for other genres because songs may be remixed by multiple

artists further blurring the line between original content and modifications For this

reason I considered my thesis to be a proof of concept Although the data I used

may not be ideal I was able to show that the Dirichlet Process could be used with

some amount of success to cluster songs based on their metadata

With respect to how I implemented the Dirichlet Process and constructed the

features my methodology could have been more extensive with additional time and

resources Interpreting the sounds in each song and establishing common threads is a

difficult task and unlike Pandora which used trained music theory experts to analyze

each song I relied on my own ears and anecdotal knowledge of EM Given the lack of

formal literature quantitatively analyzing EM and the resources I had this was my

best realistic option but was also not ideal The second notable weakness which was

more controllable was determining what exactly constitutes an EM song My criteria

involved iterating through every song and selecting those whose artist contained a

tag that fell inside a list of predetermined EM genres However this strategy is not

always effective since some artists contain only a small selection of EM songs and

have produced much more music involving rock or other non-EM genres To prevent

these songs from appearing in the dataset I would need to load another dataset

from a group called Lastfm which contains user-generated tags at the song level

Another more addressable weakness in my experiment was graphically analyzing the

timbre categories While the average chord changes were easy to interpret on the

graphs for each cluster and had easy semantic interpretations the timbre categories

were never formally defined That is while I knew the Bayes Information Criterion

was lowest when there were 46 categories I did not associate each timbre category

with a sound Mauchrsquos study addressed this issue by randomly selecting songs with

sounds that fell in each timbre category and asked users to listen to the sounds and

54

classify what they heard Implementing this system would be an additional way of

ensuring that the clusters formed for each song were nontrivial I could not only

eyeball the measurements on each graph for timbre like I did in this thesis but also

use them to confirm the sounds I observed for each cluster Finally while my feature

selection contained careful preprocessing based on other studies that normalized

measurements between all songs there are additional ways I could have improved the

feature set For example one study looks at more advanced ways to isolate specific

timbre segments in a song identify repeating patterns and comparing songs to each

other in terms of the similarity of their timbres [15] More advanced methods like

these would allow me to more quantitatively analyze how successful the Dirichlet

Process is on effectively clustering songs into distinct categories

42 Future Work

Future work in this area quantitatively analyzing EM metadata to determine what

constitutes different genres and novel artists would involve tighter definitions proce-

dures evaluations of whether clustering was effective and music scrutiny All of the

weaknesses mentioned in the previous section barring perhaps the songs available in

the Million Song Dataset can be addressed with extensions and modifications to the

code base I created Addressing the greater issue of building an effective corpus of

music data for the MSD and constantly updating it might be addressed by soliciting

such data from an organization like Spotify but such an endeavor is very ambitious

and beyond the scope of any individual or small group research project without ex-

tensive funding and influence Once these problems are resolved and the dataset

songs accessed from the dataset and methods for comparing songs to each other are

accomplished the next steps would be to further analyze the results How do the

most unique artists for their time compare to the most popular artists Is there con-

55

siderable overlap How long does it take for a style to grow in popularity if it even

does And lastly how can these findings be used to compose new genres of music and

envision who and what will become popular in the future All of these questions may

require supplementary information sources with respect to the popularity of songs

and artists for example and many of these additional pieces of information can be

found on the website of the MSD

43 Closing Remarks

While this thesis is an ambitious endeavor and can be improved in many respects it

does show that the methods implemented yield nontrivial results and could serve as

a foundation for future quantitative analysis of electronic music As data analytics

grows even more and groups such as Spotify amass greater amounts of information

and deeper insights on that information this relatively new field of study will hope-

fully grow EM is a dynamic energizing and incredibly expressive type of music

and understanding it from a quantitative perspective pays respect to what has up

until now been mostly analyzed from a curious outsiderrsquos perspective qualitatively

described but not examined as thoroughly from a mathematical angle

56

Appendix A

Code

A1 Pulling Data from the Million Song Dataset

1 from __future__ import division2 import os3 import sys4 import time5 import glob6 import hdf5_getters not on adroit7

8 prevents output from showing ellipses when printed9 npset_printoptions(threshold=npnan)

10

11 rsquorsquorsquoThis code computes the frequency of chord changes in each electronic songand runs the dirichlet process on it rsquorsquorsquo

12

13 basedir = rsquoscratchnetworkmssilvermssilvermsd_data_fulldatarsquo +str(sysargv[1])

14 ext = rsquoh5rsquo15

16 target_genres = [rsquohousersquorsquotechnorsquorsquodrum and bassrsquorsquodrum n bassrsquorsquodrumrsquonrsquobassrsquo

17 rsquodrumnbassrsquorsquodrum rsquonrsquo bassrsquorsquojunglersquorsquobreakbeatrsquorsquotrancersquorsquodubsteprsquorsquotraprsquorsquodowntemporsquo

18 rsquoindustrialrsquorsquosynthpoprsquorsquoidmrsquorsquoidm - intelligent dance musicrsquorsquo8-bitrsquorsquoambientrsquo

19 rsquodance and electronicarsquorsquoelectronicrsquo]20

21 relevant metadata for all EM songs found in the MSD22 all_song_data = 23 pitch_segs_data = []24 count = 025 start_time = timetime()26

27 for root dirs files in oswalk(basedir)28 files = globglob(ospathjoin(rootrsquorsquo+ext))29 for f in files

57

30 h5 = hdf5_gettersopen_h5_file_read(f)31 if year unknown throw out sample32 if hdf5_gettersget_year(h5) == 033 h5close()34 continue35 if any(tag in str(hdf5_gettersget_artist_mbtags(h5)) for tag in

target_genres)36 print rsquofound electronic music song at 0 secondsrsquoformat(time

time()-start_time)37 count += 138 print (rsquosong count 0rsquoformat(count+1))39 h5_subdict = dict()40 h5_subdict[rsquotitlersquo] = hdf5_gettersget_title(h5)item()41 h5_subdict[rsquoartist_namersquo] = hdf5_gettersget_artist_name(h5)

item()42 h5_subdict[rsquoyearrsquo] = hdf5_gettersget_year(h5)item()43 h5_subdict[rsquodurationrsquo] = hdf5_gettersget_duration(h5)item()44 h5_subdict[rsquotimbrersquo] = hdf5_gettersget_segments_timbre(h5)

tolist()45 h5_subdict[rsquopitchesrsquo] = hdf5_gettersget_segments_pitches(h5)

tolist()46 track_id = hdf5_gettersget_track_id(h5)item()47 all_song_data[track_id] = h5_subdict48 print(rsquoSong 0 finished processing Total time elapsed 1

secondsrsquoformat(countstr(timetime() - start_time)))49 h5close()50

51 all_song_data_sorted = dict(sorted(all_song_dataitems() key=lambda k k[1][rsquoyearrsquo]))

52 sortedpitchdata = rsquoscratchnetworkmssilvermssilvermsd_dataraw_rsquo +resub(rsquorsquorsquorsquosysargv[1]) + rsquotxtrsquo

53 with open(sortedpitchdata rsquowrsquo) as text_file54 text_filewrite(str(all_song_data_sorted))

A2 Calculating Most Likely Chords and Timbre

Categories

from __future__ import division
import os
import sys
import re
import time
import json
import glob
import hdf5_getters      # not on adroit
import sklearn.mixture
import msd_utils         # not on adroit
import math
import numpy as np
import collections
import ast

# prevents output from showing ellipses when printed
np.set_printoptions(threshold=np.nan)

# column-wise mean of a list of lists (each column is one pitch or timbre dimension)
def mean(a):
    return sum(a) / len(a)

'''This code computes the frequency of chord changes in each electronic song
and runs the dirichlet process on it.'''

basedir = '/scratch/network/mssilver/mssilver/'
input_file = basedir + 'msd_data/raw_' + str(sys.argv[1]) + '.txt'
output_file = basedir + 'msd_data/preprocessed_' + str(sys.argv[1]) + '.txt'

json_contents = open(input_file, 'r').read()

all_song_data = []
time_start = time.time()
count = 0
# each song was dumped as a Python dict literal beginning with its title
# (the pattern below is reconstructed from the garbled source)
for json_object_match in re.finditer(r"\{'title'.*?\}", json_contents, re.DOTALL):
    json_object_str = str(json_object_match.group(0))
    json_object = ast.literal_eval(json_object_str)
    json_object_new = {}

    json_object_new['title'] = json_object['title']
    json_object_new['artist_name'] = json_object['artist_name']
    json_object_new['year'] = json_object['year']
    json_object_new['duration'] = json_object['duration']

    segments_pitches_old = json_object['pitches']
    segments_timbre_old = json_object['timbre']
    segments_pitches_old_smoothed = []
    segments_timbre_old_smoothed = []
    chord_changes = [0 for i in range(0, 192)]
    smoothing_factor = 5
    for i in range(0, int(math.floor(len(segments_pitches_old) / smoothing_factor))):
        segments = segments_pitches_old[(smoothing_factor*i):(smoothing_factor*i + smoothing_factor)]
        # calculate mean frequency of each note over a block of 5 time segments
        segments_mean = map(mean, zip(*segments))
        segments_pitches_old_smoothed.append(segments_mean)
    most_likely_chords = [msd_utils.find_most_likely_chord(seg) for seg in segments_pitches_old_smoothed]
    print 'found most likely chords at {0} seconds'.format(time.time() - time_start)
    # calculate chord changes
    for i in range(0, len(most_likely_chords) - 1):
        c1 = most_likely_chords[i]
        c2 = most_likely_chords[i+1]
        if (c1[1] == c2[1]):
            note_shift = 0
        elif (c1[1] < c2[1]):
            note_shift = c2[1] - c1[1]
        else:
            note_shift = 12 - c1[1] + c2[1]
        key_shift = 4*(c1[0] - 1) + c2[0]
        # convert note_shift (0 through 11) and key_shift (1 to 16)
        # to one of 192 categories for a chord shift
        chord_shift = 12*(key_shift - 1) + note_shift
        chord_changes[chord_shift] += 1

    json_object_new['chord_changes'] = [c / json_object['duration'] for c in chord_changes]
    print 'calculated chord changes at {0} seconds'.format(time.time() - time_start)

    for i in range(0, int(math.floor(len(segments_timbre_old) / smoothing_factor))):
        segments = segments_timbre_old[(smoothing_factor*i):(smoothing_factor*i + smoothing_factor)]
        # calculate the mean timbre values over a block of 5 time segments
        segments_mean = map(mean, zip(*segments))
        segments_timbre_old_smoothed.append(segments_mean)
    print 'found most likely timbre categories at {0} seconds'.format(time.time() - time_start)
    timbre_cats = [msd_utils.find_most_likely_timbre_category(seg) for seg in segments_timbre_old_smoothed]
    # one count per timbre category (46 of them, chosen by BIC)
    timbre_cat_counts = [timbre_cats.count(i) for i in xrange(0, len(msd_utils.TIMBRE_CLUSTERS))]
    json_object_new['timbre_cat_counts'] = [t / json_object['duration'] for t in timbre_cat_counts]
    all_song_data.append(json_object_new)
    count += 1

print 'preprocessing finished, writing results to file at time {0}'.format(time.time() - time_start)
with open(output_file, 'w') as f:
    f.write(str(all_song_data))

print 'file merging complete at time {0}'.format(time.time() - time_start)

A.3 Code to Compute Timbre Categories

from __future__ import division
import os
import sys
import re
import time
import json
import glob
import hdf5_getters      # not on adroit
import sklearn.mixture
import msd_utils         # not on adroit
import math
import numpy as np
import collections
from string import ascii_uppercase
import ast
import matplotlib.pyplot as plt
import operator
from collections import defaultdict
import random

timbre_all = []
# number of EM songs in the dataset from each year, used to set sampling rates
year_counts = dict({1956: 2, 1965: 4, 1968: 3, 1969: 5, 1970: 23, 1971: 25,
                    1972: 26, 1973: 37, 1974: 35, 1975: 29, 1976: 28, 1977: 64,
                    1978: 77, 1979: 111, 1980: 131, 1981: 171, 1982: 199, 1983: 272,
                    1984: 190, 1985: 189, 1986: 200, 1987: 224, 1988: 205, 1989: 272,
                    1990: 358, 1991: 348, 1992: 538, 1993: 610, 1994: 658, 1995: 764,
                    1996: 809, 1997: 930, 1998: 872, 1999: 983, 2000: 1031, 2001: 1230,
                    2002: 1323, 2003: 1563, 2004: 1508, 2005: 1995, 2006: 1892,
                    2007: 2175, 2008: 1950, 2009: 1782, 2010: 742})

time_start = time.time()
year_count = defaultdict(int)
orig_dir = '/scratch/network/mssilver/mssilver/'
orig_dir = ''                       # overridden when running locally
json_pattern = re.compile(r"\{'title'.*?\}", re.DOTALL)  # pattern reconstructed
N = 20   # number of songs to sample from each year
k = 20   # number of frames to select from each song
for l1 in ascii_uppercase:
    for l2 in ascii_uppercase:
        edm_textfile = orig_dir + 'msd_data/raw_' + l1 + l2 + '.txt'
        json_contents = open(edm_textfile, 'r').read()
        for json_object_str in re.findall(json_pattern, json_contents):
            json_object = ast.literal_eval(json_object_str)
            year = int(json_object['year'])
            # sample each song with probability N / (songs in its year), capped at 1
            prob = 1.0 if 1.0*N/year_counts[year] > 1.0 else 1.0*N/year_counts[year]
            if random.random() < prob:
                print 'getting timbre frames for song in directory {0}, {1} seconds after start of program'.format(edm_textfile, time.time() - time_start)
                duration = float(json_object['duration'])
                timbre = [[t/duration for t in l] for l in json_object['timbre']]
                try:
                    indices = random.sample(xrange(0, len(timbre)), k)
                except ValueError:
                    # fewer than k frames in this song: keep them all
                    indices = xrange(0, len(timbre))
                timbre_frames = [timbre[i] for i in indices]
                appended_timbre = [timbre_all.append(l) for l in timbre_frames]
        print 'finished file {0}, {1} seconds after start of program'.format(edm_textfile, time.time() - time_start)

with open('timbre_frames_all.txt', 'w') as f:
    f.write(str(timbre_all))

A.4 Helper Methods for Calculations

import os
import re
import json
import glob
import hdf5_getters
import time
import numpy as np

''' some static data used in conjunction with the helper methods '''

# each 12-element vector corresponds to the 12 pitches, starting with C
# natural and going up to B natural; one template per root note
CHORD_TEMPLATE_MAJOR = [[1,0,0,0,1,0,0,1,0,0,0,0], [0,1,0,0,0,1,0,0,1,0,0,0],
                        [0,0,1,0,0,0,1,0,0,1,0,0], [0,0,0,1,0,0,0,1,0,0,1,0],
                        [0,0,0,0,1,0,0,0,1,0,0,1], [1,0,0,0,0,1,0,0,0,1,0,0],
                        [0,1,0,0,0,0,1,0,0,0,1,0], [0,0,1,0,0,0,0,1,0,0,0,1],
                        [1,0,0,1,0,0,0,0,1,0,0,0], [0,1,0,0,1,0,0,0,0,1,0,0],
                        [0,0,1,0,0,1,0,0,0,0,1,0], [0,0,0,1,0,0,1,0,0,0,0,1]]
CHORD_TEMPLATE_MINOR = [[1,0,0,1,0,0,0,1,0,0,0,0], [0,1,0,0,1,0,0,0,1,0,0,0],
                        [0,0,1,0,0,1,0,0,0,1,0,0], [0,0,0,1,0,0,1,0,0,0,1,0],
                        [0,0,0,0,1,0,0,1,0,0,0,1], [1,0,0,0,0,1,0,0,1,0,0,0],
                        [0,1,0,0,0,0,1,0,0,1,0,0], [0,0,1,0,0,0,0,1,0,0,1,0],
                        [0,0,0,1,0,0,0,0,1,0,0,1], [1,0,0,0,1,0,0,0,0,1,0,0],
                        [0,1,0,0,0,1,0,0,0,0,1,0], [0,0,1,0,0,0,1,0,0,0,0,1]]
CHORD_TEMPLATE_DOM7 = [[1,0,0,0,1,0,0,1,0,0,1,0], [0,1,0,0,0,1,0,0,1,0,0,1],
                       [1,0,1,0,0,0,1,0,0,1,0,0], [0,1,0,1,0,0,0,1,0,0,1,0],
                       [0,0,1,0,1,0,0,0,1,0,0,1], [1,0,0,1,0,1,0,0,0,1,0,0],
                       [0,1,0,0,1,0,1,0,0,0,1,0], [0,0,1,0,0,1,0,1,0,0,0,1],
                       [1,0,0,1,0,0,1,0,1,0,0,0], [0,1,0,0,1,0,0,1,0,1,0,0],
                       [0,0,1,0,0,1,0,0,1,0,1,0], [0,0,0,1,0,0,1,0,0,1,0,1]]
CHORD_TEMPLATE_MIN7 = [[1,0,0,1,0,0,0,1,0,0,1,0], [0,1,0,0,1,0,0,0,1,0,0,1],
                       [1,0,1,0,0,1,0,0,0,1,0,0], [0,1,0,1,0,0,1,0,0,0,1,0],
                       [0,0,1,0,1,0,0,1,0,0,0,1], [1,0,0,1,0,1,0,0,1,0,0,0],
                       [0,1,0,0,1,0,1,0,0,1,0,0], [0,0,1,0,0,1,0,1,0,0,1,0],
                       [0,0,0,1,0,0,1,0,1,0,0,1], [1,0,0,0,1,0,0,1,0,1,0,0],
                       [0,1,0,0,0,1,0,0,1,0,1,0], [0,0,1,0,0,0,1,0,0,1,0,1]]

CHORD_TEMPLATE_MAJOR_means = [np.mean(chord) for chord in CHORD_TEMPLATE_MAJOR]
CHORD_TEMPLATE_MINOR_means = [np.mean(chord) for chord in CHORD_TEMPLATE_MINOR]
CHORD_TEMPLATE_DOM7_means = [np.mean(chord) for chord in CHORD_TEMPLATE_DOM7]
CHORD_TEMPLATE_MIN7_means = [np.mean(chord) for chord in CHORD_TEMPLATE_MIN7]

CHORD_TEMPLATE_MAJOR_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_MAJOR]
CHORD_TEMPLATE_MINOR_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_MINOR]
CHORD_TEMPLATE_DOM7_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_DOM7]
CHORD_TEMPLATE_MIN7_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_MIN7]

# centroids of the 46 timbre categories fit by the Gaussian mixture model,
# one 12-dimensional vector per category
TIMBRE_CLUSTERS = [
    [1.38679881e-01, 3.95702571e-02, 2.65410235e-02, 7.38301998e-03, -1.75014636e-02, -5.51147732e-02, 8.71851698e-03, -1.17595855e-02, 1.07227900e-02, 8.75951680e-03, 5.40391877e-03, 6.17638908e-03],
    [3.14344510e+00, 1.17405599e-01, 4.08053561e+00, -1.77934450e+00, 2.93367968e+00, -1.35597928e+00, -1.55129489e+00, 7.75743158e-01, 6.42796685e-01, 1.40794256e-01, 3.37716831e-01, -3.27103815e-01],
    [3.56548165e-01, 2.73288705e+00, 1.94355982e+00, 1.06892477e+00, 9.89739475e-01, -8.97330631e-02, 8.73234495e-01, -2.00747009e-03, 3.44488367e-01, 9.93117800e-02, -2.43471766e-01, -1.90521726e-01],
    [4.22442037e-01, 4.14115783e-01, 1.43926557e-01, -1.16143322e-01, -5.95186216e-02, -2.36927188e-01, -6.83151409e-02, 9.86816882e-02, 2.43219098e-02, 6.93558977e-02, 6.80121418e-03, 3.97485360e-02],
    [1.94727799e-01, -1.39027782e+00, -2.39875671e-01, -2.84583677e-01, 1.92334219e-01, -2.83421048e-01, 2.15787541e-01, 1.14840341e-01, -2.15631833e-01, -4.09496877e-02, -6.90838017e-03, -7.24394810e-03],
    [1.96565167e-01, 4.98702717e-02, -3.43697282e-01, 2.54170701e-01, 1.12441266e-02, 1.54740401e-01, -4.70447408e-02, 8.10868802e-02, 3.03736697e-03, 1.43974944e-03, -2.75044913e-02, 1.48634678e-02],
    [2.21364497e-01, -2.96205105e-01, 1.57754028e-01, -5.57641279e-02, -9.25625566e-02, -6.15316168e-02, -1.38139882e-01, -5.54936599e-02, 1.66886836e-01, 6.46238260e-02, 1.24093863e-02, -2.09274345e-02],
    [2.12823455e-01, -9.32652720e-02, -4.39611467e-01, -2.02814479e-01, 4.98638770e-02, -1.26572488e-01, -1.11181799e-01, 3.25075635e-02, 2.01416694e-02, -5.69216463e-02, 2.61922912e-02, 8.30817468e-02],
    [1.62304042e-01, -7.34813956e-03, -2.02552550e-01, 1.80106705e-01, -5.72110826e-02, -9.17148244e-02, -6.20429191e-03, -6.08892354e-02, 1.02883628e-02, 3.84878478e-02, -8.72920419e-03, 2.37291230e-02],
    [1.69023095e-01, 6.81311168e-02, -3.71039856e-02, -2.13139780e-02, -4.18752028e-03, 1.36407740e-01, 2.58515825e-02, -4.10328777e-04, 2.93149920e-02, -1.97874734e-02, 2.01177066e-02, 4.29260690e-03],
    [4.16829358e-01, -1.28384095e+00, 8.86081556e-01, 9.13717416e-02, -3.19420208e-01, -1.82003637e-01, -3.19865507e-02, -1.71517045e-02, 3.47472066e-02, -3.53047665e-02, 5.58354602e-02, -5.06222122e-02],
    [3.83948137e-01, 1.06020034e-01, 4.01191058e-01, 1.49470482e-01, -9.58422411e-02, -4.94473336e-02, 2.27589858e-02, -5.67352733e-02, 3.84666644e-02, -2.15828055e-02, -1.67817151e-02, 1.15426241e-01],
    [9.07946444e-01, 3.26120397e+00, 2.98472002e+00, -1.42615404e-01, 1.29886103e+00, -4.53380431e-01, 1.54008478e-01, -3.55297093e-02, -2.95809181e-01, 1.57037690e-01, -7.29692046e-02, 1.15180285e-01],
    [1.60870896e+00, -2.32038235e+00, -7.96211044e-01, 1.55058968e+00, -2.19377663e+00, 5.01030526e-01, -1.71767279e+00, -1.36642470e+00, -2.42837527e-01, -4.14275615e-01, -7.33148530e-01, -4.56676578e-01],
    [6.42870687e-01, 1.34486839e+00, 2.16026845e-01, -2.13180345e-01, 3.10866747e-01, -3.97754955e-01, -3.54439151e-01, -5.95938041e-04, 4.95054274e-03, 4.67013422e-02, -1.80823854e-02, 1.25808320e-01],
    [1.16780496e+00, 2.28141229e+00, -3.29418720e+00, -1.54239912e+00, 2.12372153e-01, 2.51116768e+00, 1.84273560e+00, -4.06183916e-01, 1.19175125e+00, -9.24407446e-01, 6.85444429e-01, -6.38729005e-01],
    [2.39097414e-01, -1.13382447e-02, 3.06327342e-01, 4.68182987e-03, -1.03107607e-01, -3.17661969e-02, 3.46533705e-02, 1.46440386e-02, 6.88291154e-02, 1.72580481e-02, -6.23970238e-03, -6.52822380e-03],
    [1.74850329e-01, -1.86077411e-01, 2.69285838e-01, 5.22452803e-02, -3.71708289e-02, -6.42874319e-02, -5.01920042e-03, -1.14565540e-02, -2.61300268e-03, -6.94872458e-03, 1.20157063e-02, 2.01341977e-02],
    [1.93220674e-01, 1.62738332e-01, 1.72794061e-02, 7.89933755e-02, 1.58494767e-01, 9.04541006e-04, -3.33177052e-02, -1.42411500e-01, -1.90471155e-02, -2.41622739e-02, -2.57382438e-02, 2.84895062e-02],
    [3.31179197e+00, -1.56765268e-01, 4.42446188e+00, 2.05496297e+00, 5.07031622e+00, -3.52663849e-02, -5.68337901e+00, -1.17825301e+00, 5.41756637e-01, -3.15541339e-02, -1.58404846e+00, 7.37887234e-01],
    [2.36033237e-01, -5.01380019e-01, -7.01568834e-02, -2.14474169e-01, 5.58739133e-01, -3.45340886e-01, 2.36469930e-01, -2.51770230e-02, -4.41670143e-01, -1.73364633e-01, 9.92353986e-03, 1.01775476e-01],
    [3.13672832e+00, 1.55128891e+00, 4.60139512e+00, 9.82477544e-01, -3.87108002e-01, -1.34239667e+00, -3.00065797e+00, -4.41556909e-01, -7.77546208e-01, -6.59017029e-01, -1.42596356e-01, -9.78935498e-01],
    [8.50714148e-01, 2.28658856e-01, -3.65260753e+00, 2.70626948e+00, -1.90441544e-01, 5.66625676e+00, 1.77531510e+00, 2.39978921e+00, 1.10965660e+00, 1.58484130e+00, -1.51579214e-02, 8.64324026e-01],
    [1.14302559e+00, 1.18602811e+00, -3.88130412e+00, 8.69833825e-01, -8.23003310e-01, -4.23867795e-01, 8.56022598e-01, -1.08015106e+00, 1.74840192e-01, -1.35493558e-02, -1.17012561e+00, 1.68572940e-01],
    [3.54117814e+00, 6.12714769e-01, 7.67585243e+00, 2.50391333e+00, 1.81374399e+00, -1.46363231e+00, -1.74027236e+00, -5.72924078e-01, -1.20787368e+00, -4.13954661e-01, -4.62561948e-01, 6.78297871e-01],
    [8.31843044e-01, 4.41635485e-01, 7.00724425e-02, -4.72159900e-02, 3.08326493e-01, -4.47009822e-01, 3.27806057e-01, 6.52370380e-01, 3.28490360e-01, 1.28628172e-01, -7.78065861e-02, 6.91343399e-02],
    [4.90082031e-01, -9.53180204e-01, 1.76970476e-01, 1.57256960e-01, -5.26196238e-02, -3.19264458e-01, 3.91808304e-01, 2.19368239e-01, -2.06483291e-01, -6.25044005e-02, -1.05547224e-01, 3.18934196e-01],
    [1.49899454e+00, -4.30708817e-01, 2.43770498e+00, 7.03149621e-01, -2.28827845e+00, 2.70195855e+00, -4.71484280e+00, -1.18700075e+00, -1.77431396e+00, -2.23190236e+00, 8.20855264e-01, -2.35859902e-01],
    [1.20322544e-01, -3.66300816e-01, -1.25699953e-01, -1.21914056e-01, 6.93277338e-02, -1.31034684e-01, -1.54955924e-03, 2.48094288e-02, -3.09576314e-02, -1.66369415e-03, 1.48904987e-04, -1.42151992e-02],
    [6.52394765e-01, -6.81024464e-01, 6.36868117e-01, 3.04950208e-01, 2.62178992e-01, -3.20457080e-01, -1.98576098e-01, -3.02173163e-01, 2.04399765e-01, 4.44513847e-02, -9.50111498e-03, -1.14198739e-02],
    [2.06762180e-01, -2.08101829e-01, 2.61977630e-01, -1.71672300e-01, 5.61794250e-02, 2.13660185e-01, 3.90259585e-02, 4.78176392e-02, 1.72812607e-02, 3.44052067e-02, 6.26899067e-03, 2.48544728e-02],
    [7.39717363e-01, 4.37786285e+00, 2.54995502e+00, 1.13151212e+00, -3.58509503e-01, 2.20806129e-01, -2.20500355e-01, -7.22409824e-02, -2.70534083e-01, 1.07942098e-03, 2.70174668e-01, 1.87279353e-01],
    [1.25593809e+00, 6.71054880e-02, 8.70352571e-01, -4.32607959e+00, 2.30652217e+00, 5.47476105e+00, -6.11052479e-01, 1.07955720e+00, -2.16225471e+00, -7.95770149e-01, -7.31804973e-01, 9.68935954e-01],
    [1.17233757e-01, -1.23897829e-01, -4.88625265e-01, 1.42036530e-01, -7.23286756e-02, -6.99808763e-02, -1.17525019e-02, 5.70221674e-02, -7.67796123e-03, 4.17505873e-02, -2.33375716e-02, 1.94121001e-02],
    [1.67511025e+00, -2.75436700e+00, 1.45345593e+00, 1.32408871e+00, -1.66172505e+00, 1.00560074e+00, -8.82308160e-01, -5.95708043e-01, -7.27283590e-01, -1.03975499e+00, -1.86653334e-02, 1.39449745e+00],
    [3.20587677e+00, -2.84451104e+00, 8.54849957e+00, -4.44001235e-01, 1.04202144e+00, 7.35333682e-01, -2.48763292e+00, 7.38931361e-01, -1.74185596e+00, -1.07581842e+00, 2.05759299e-01, -8.20483513e-01],
    [3.31279737e+00, -5.08655734e-01, 6.61530870e+00, 1.16518280e+00, 4.74499155e+00, -2.31536191e+00, -1.34016130e+00, -7.15381712e-01, 2.78890594e+00, 2.04189275e+00, -3.80003033e-01, 1.16034914e+00],
    [1.79522019e+00, -8.13534697e-02, 4.37167420e-01, 2.26517020e+00, 8.85377295e-01, 1.07481514e+00, -7.25322296e-01, -2.19309506e+00, -7.59468916e-01, -1.37191387e+00, 2.60097913e-01, 9.34596450e-01],
    [3.50400906e-01, 8.17891485e-01, -8.63487084e-01, -7.31760701e-01, 9.70320805e-02, -3.60023996e-01, -2.91753495e-01, -8.03073817e-02, 6.65930095e-02, 1.60093340e-01, -1.29158086e-01, -5.18806100e-02],
    [2.25922929e-01, 2.78461593e-01, 5.39661393e-02, -2.37662670e-02, -2.70343295e-02, -1.23485570e-01, 2.31027499e-03, 5.87465112e-05, 1.86127188e-02, 2.83074747e-02, -1.87198676e-04, 1.24761782e-02],
    [4.53615634e-01, 3.18976020e+00, -8.35029351e-01, 7.84124578e+00, -4.43906795e-01, -1.78945492e+00, -1.14521031e+00, 1.00044304e+00, -4.04084981e-01, -4.86030348e-01, 1.05412721e-01, 5.63666445e-02],
    [3.93714086e-01, -3.07226477e-01, -4.87366619e-01, -4.57481697e-01, -2.91133171e-04, -2.39881719e-01, -2.15591352e-01, -1.21332941e-01, 1.42245002e-01, 5.02984582e-02, -8.05878851e-03, 1.95534173e-01],
    [1.86913010e-01, -1.61000977e-01, 5.95612425e-01, 1.87804293e-01, 2.22064227e-01, -1.09008289e-01, 7.83845058e-02, 5.15228647e-02, -8.18113578e-02, -2.37860551e-02, 3.41013800e-03, 3.64680417e-02],
    [3.32919314e+00, -2.14341251e+00, 7.20913997e+00, 1.76143734e+00, 1.64091808e+00, -2.66887649e+00, -9.26748006e-01, -2.78599285e-01, -7.39434005e-01, -3.87363085e-01, 8.00557250e-01, 1.15628886e+00],
    [4.76496444e-01, -1.19334793e-01, 3.09037235e-01, -3.45545294e-01, 1.30114716e-01, 5.06895559e-01, 2.12176840e-01, -4.14296750e-03, 4.52439064e-02, -1.62163990e-02, 6.93683152e-02, -5.77607592e-03],
    [3.00019324e-01, 5.43432074e-02, -7.72732930e-01, 1.47263806e+00, -2.79012581e-02, -2.47864869e-01, -2.10011388e-01, 2.78202425e-01, 6.16957205e-02, -1.66924986e-01, -1.80102286e-01, -3.78872162e-03]]

TIMBRE_MEANS = [np.mean(t) for t in TIMBRE_CLUSTERS]
TIMBRE_STDEVS = [np.std(t) for t in TIMBRE_CLUSTERS]

'''helper methods to process raw msd data'''

# rotate a song's pitch vectors so every song is expressed relative to its key
def normalize_pitches(h5):
    key = int(hdf5_getters.get_key(h5))
    segments_pitches = hdf5_getters.get_segments_pitches(h5)
    segments_pitches_new = [transpose_by_key(pitch_seg, key) for pitch_seg in segments_pitches]
    return segments_pitches_new

def transpose_by_key(pitch_seg, key):
    pitch_seg_new = []
    for i in range(0, 12):
        idx = (i + key) % 12
        pitch_seg_new.append(pitch_seg[idx])
    return pitch_seg_new

''' given a time segment with distributions of the 12 pitches, find the most
likely chord played '''
def find_most_likely_chord(pitch_vector):
    rho_max = 0.0
    # chords are indexed as (quality, root) with quality 1 = major, 2 = minor,
    # 3 = dominant 7th, 4 = minor 7th
    most_likely_chord = (1, 1)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_MAJOR, CHORD_TEMPLATE_MAJOR_means, CHORD_TEMPLATE_MAJOR_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean)*(pitch_vector[i] - np.mean(pitch_vector))/((stdev + 0.01)*(np.std(pitch_vector) + 0.01))
        if (abs(rho) > abs(rho_max)):
            rho_max = rho
            most_likely_chord = (1, idx)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_MINOR, CHORD_TEMPLATE_MINOR_means, CHORD_TEMPLATE_MINOR_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean)*(pitch_vector[i] - np.mean(pitch_vector))/((stdev + 0.01)*(np.std(pitch_vector) + 0.01))
        if (abs(rho) > abs(rho_max)):
            rho_max = rho
            most_likely_chord = (2, idx)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_DOM7, CHORD_TEMPLATE_DOM7_means, CHORD_TEMPLATE_DOM7_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean)*(pitch_vector[i] - np.mean(pitch_vector))/((stdev + 0.01)*(np.std(pitch_vector) + 0.01))
        if (abs(rho) > abs(rho_max)):
            rho_max = rho
            most_likely_chord = (3, idx)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_MIN7, CHORD_TEMPLATE_MIN7_means, CHORD_TEMPLATE_MIN7_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean)*(pitch_vector[i] - np.mean(pitch_vector))/((stdev + 0.01)*(np.std(pitch_vector) + 0.01))
        if (abs(rho) > abs(rho_max)):
            rho_max = rho
            most_likely_chord = (4, idx)
    return most_likely_chord

# match a smoothed timbre segment to the closest of the 46 timbre categories
def find_most_likely_timbre_category(timbre_vector):
    most_likely_timbre_cat = 0
    rho_max = 0.0
    for idx, (seg, mean, stdev) in enumerate(zip(TIMBRE_CLUSTERS, TIMBRE_MEANS, TIMBRE_STDEVS)):
        rho = 0.0
        for i in range(0, 12):
            rho += (seg[i] - mean)*(timbre_vector[i] - np.mean(seg))/((stdev + 0.01)*(np.std(timbre_vector) + 0.01))
        if (abs(rho) > abs(rho_max)):
            rho_max = rho
            most_likely_timbre_cat = idx
    return most_likely_timbre_cat

Bibliography

[1] Marck Bailey. Mark J. Butler publishes scholarly work on dance music. http://www.music.northwestern.edu/about/news/2012/mark-j-butler-publishes-scholarly-work-on-dance-music.html, March 2012.

[2] Kenneth Taylor. Ishkur's guide to EDM. http://techno.org/electronic-music-guide/.

[3] Deal further strengthens Spotify's music discovery expertise. http://the.echonest.com/pressreleases/spotify-acquires-echo-nest/, March 2014.

[4] Josh Constine. Inside the Spotify-Echo Nest skunkworks. http://techcrunch.com/2014/10/19/the-sonic-mad-scientists/, October 2014.

[5] The future of music genres is here. http://blog.echonest.com/post/73516217273/the-future-of-music-genres-is-here, January 2014.

[6] About the Music Genome Project. http://www.pandora.com/about/mgp.

[7] Joan Serrà, Álvaro Corral, Marián Boguñá, Martín Haro, and Josep Ll. Arcos. Measuring the evolution of contemporary western popular music. Scientific Reports, 2, July 2012.

[8] Matthias Mauch, Robert M. MacCallum, Mark Levy, and Armand M. Leroi. The evolution of popular music: USA 1960–2010. Royal Society Open Science, 2(5), 2015.

[9] Thierry Bertin-Mahieux, Daniel P. W. Ellis, Brian Whitman, and Paul Lamere. The Million Song Dataset. In Proceedings of the 12th International Conference on Music Information Retrieval (ISMIR 2011), 2011.

[10] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay. Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12:2825–2830, 2011.

[11] Lars Buitinck, Gilles Louppe, Mathieu Blondel, Fabian Pedregosa, Andreas Mueller, Olivier Grisel, Vlad Niculae, Peter Prettenhofer, Alexandre Gramfort, Jaques Grobler, Robert Layton, Jake VanderPlas, Arnaud Joly, Brian Holt, and Gaël Varoquaux. API design for machine learning software: experiences from the scikit-learn project. In ECML PKDD Workshop: Languages for Data Mining and Machine Learning, pages 108–122, 2013.

[12] Edwin Chen. Infinite mixture models with nonparametric Bayes and the Dirichlet process. http://blog.echen.me/2012/03/20/infinite-mixture-models-with-nonparametric-bayes-and-the-dirichlet-process/, March 2012.

[13] Graham Massey. Roland TR-808: The drum machine that changed music. http://www.bbc.com/news/entertainment-arts-26682781, March 2014.

[14] Jean-Michel Jarre and Agencja Artystyczna MTJ. Jean Michel Jarre. Disques Dreyfus, 1999.

[15] François Pachet, Jean-Julien Aucouturier, and Mark Sandler. "The way it sounds": Timbre models for analysis and retrieval of music signals. IEEE Transactions on Multimedia, 7(6):1028–1035, December 2005.


Figure 3.4: Timbre and pitch distributions for α = 0.1

Cluster   Song Count   Characteristic Sounds
0         1339         Instrumental and disco with 80s synth
1         2109         Simultaneous quarter-note and sixteenth-note rhythms
2         4048         Upbeat, chill; simultaneous quarter-note and eighth-note rhythms
3         1353         Strong repetitive beats; ambient
4         2446         Strong simultaneous beat and synths; synths defined but with echo
5         2672         Calm, New Age
6         542          Hi-hat cymbals, dissonant chord progressions
7         2725         Aggressive punk and alternative rock
9         1647         Latin; rhythmic emphasis on first and third beats
11        835          Standard medium-fast rock instruments/chords
16        1152         Orchestral, especially violins
18        40           "Martian alien" sounds, no vocals
20        1590         Alternating strong kick and strong high-pitched clap
28        528          Roland TR-like beats; kick and clap stand out but fuzzy

Table 3.2: Song cluster descriptions for α = 0.1

3.2.3 α = 0.2

With α set to 0.2, a total of 22 clusters formed. Three of the clusters consisted of one song each, none of which was particularly unique-sounding, so I discarded them for a total of 19 significant clusters. Again, the song distributions, the timbre and pitch distributions, and the cluster descriptions are shown below.

Figure 3.5: Song year distributions for α = 0.2

Figure 3.6: Timbre and pitch distributions for α = 0.2

Cluster   Song Count   Characteristic Sounds
0         4075         Nostalgic and sad-sounding synths and string instruments
1         2068         Intense, sad, cavernous (mix of industrial metal and ambient)
2         1546         Jazz/funk tones
3         1691         Orchestral with heavy 80s synths; atmospheric
4         343          Arpeggios
5         304          Electro, ambient
6         2405         Alien synths, eerie
7         1264         Punchy kicks and claps, 80s/90s tilt
8         1561         Medium tempo, 4/4 time signature, synths with intense guitar
9         1796         Disco rhythms and instruments
10        2158         Standard rock with few (if any) synths added on
12        791          Cavernous, minimalist ambient (non-electronic instruments)
14        765          Downtempo, classic guitar riffs, fewer synths
16        865          Classic acid house sounds and beats
17        682          Heavy Roland TR sounds
22        14           Fast ambient, classic orchestral
23        578          Very repetitive rhythms... 
30        31           Very repetitive rhythms, one or two tones
34        88           Very dense sound (strong vocals and synths)

Table 3.3: Song cluster descriptions for α = 0.2

3.3 Analysis

Each of the different values of α revealed different insights about the Dirichlet Process applied to the Million Song Dataset: mainly, which artists and songs were unique, and how traditional groupings of EM genres could be thought of differently. Not surprisingly, the distributions of the years of songs in most of the clusters were skewed to the left, because the distribution of all of the EM songs in the Million Song Dataset is left-skewed (see Figure 2.2). However, some of the distributions vary significantly for individual clusters, and these differences provide important insights into which types of music styles were popular at certain points in time and how unique the earliest artists and songs in the clusters were. For example, for α = 0.1, Cluster 28's musical style, with sounds characteristic of the Roland TR-808 and TR-909, two programmable drum machines and synthesizers that became extremely popular in 1980s and 90s dance tracks [13], coincides with when the instruments were first manufactured in 1980. Not surprisingly, this cluster contained mostly songs from the 80s and 90s and declined slightly in the 2000s. However, a few songs in that cluster came out before 1980. While these songs did not clearly use the Roland TR machines, they may have contained similar sounds that predated the machines and were truly novel.
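For reference, the clustering that this chapter varies α over can be sketched in a few lines with the modern scikit-learn API (the thesis code used the sklearn.mixture module as it existed in 2016). The feature assembly below, one row per song from the preprocessed chord-change and timbre-category frequencies, the shard name, and the diagonal covariance are my assumptions for illustration, not a record of the exact configuration used:

import ast
import numpy as np
from sklearn.mixture import BayesianGaussianMixture  # successor to sklearn.mixture.DPGMM

# hypothetical shard; the preprocessed files are str()-dumped lists of dicts
songs = ast.literal_eval(open('msd_data/preprocessed_AA.txt').read())
X = np.array([s['chord_changes'] + s['timbre_cat_counts'] for s in songs])

dpgmm = BayesianGaussianMixture(
    weight_concentration_prior_type='dirichlet_process',
    weight_concentration_prior=0.1,  # the concentration parameter alpha studied here
    n_components=50,                 # truncation level; surplus components get ~0 weight
    covariance_type='diag',          # assumption
    max_iter=500, random_state=0).fit(X)
cluster_labels = dpgmm.predict(X)

Raising weight_concentration_prior is what drives the larger cluster counts observed as α moves from 0.05 to 0.2.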

First, looking at α = 0.05, we see that all of the clusters contain a significant number of songs, although clusters 3 and 8 are notably smaller. Cluster 3 contains a heavier left tail, indicating a larger number of songs from the 70s, 80s, and 90s. Inside the cluster, the genres of music varied significantly from a traditional music lens. That is, the cluster contained some songs with nearly all traditional rock instruments, others with purely synths, and others somewhere in between, all of which would normally be classified as different EM genres. However, under the Dirichlet Process these songs were lumped together around a common theme of dense, melodic material (as opposed to minimalistic, repetitive, or dissonant sounds). The most prominent artists from the earlier songs are Ashra and Jon Hassell, who composed several melodic songs combining traditional instruments with synthesizers for a modern feel. The other small cluster, number 8, has a more typical year distribution relative to the entire MSD and also consists of denser beats. Another artist, Cabaret Voltaire, leads this cluster. Cluster 9 sticks out significantly because it contains virtually no songs before 1990 but then increases rapidly in popularity. This cluster contains songs with hypnotically repetitive rhythm, strong and ethereal synths, and an equally strong drum-like beat. Given the emergence of trance in the 1990s, and the fact that house music in the 1980s contained more minimalistic synths than house music in the 1990s, this distribution of years makes sense. Looking at the earliest artists in this cluster, one that accurately predates the later music in the cluster is Jean-Michel Jarre. A French composer pioneering in ambient and electronic music [14], one of his songs, "Les Chants Magnétiques IV," contains very sharp and modulated synths along with a repetitive hi-hat cymbal rhythm and more drawn-out, ethereal synths. While the song sounds ambient at its normal speed, playing it at 1.5 times the normal speed produced a thumping, fast-paced 16th-note rhythm that, combined with the ethereal synths and their chord progressions, sounded very similar to trance music. In fact, I found that stylistically, trance music was comparable to house and ambient music increased in speed. "Trance" was a term not used extensively until the early 1990s, but ambient and house music were already mainstream by the 1980s, so it would make sense that trance evolved in this manner. However, this insight could serve as an argument that trance is not an innovative genre in and of itself, but is rather a clever combination of two older genres.

older genres Lastly we look at the timbre category and chord change distributions

for each cluster In theory these clusters should have significantly different peaks

of chord changes and timbre categories reflecting different pitch arrangements and

47

instruments in each cluster The type 0 chord change corresponds to major rarr

major with no note change type 60 minor rarr minor with no note change type

120 dominant 7th major rarr dominant 7th major with no note change and type 180

dominant 7th minor rarr dominant 7th minor with no note change It makes sense that

type 0 60 120 and 180 chord changes are frequently observed because it implies

that chords in the song occurring next to each other are remaining in the same key

for the majority of the song The timbre categories on the other hand are more

difficult to intuitively interpret Mauchrsquos study addresses this issue by sampling

songs and sounds that are the closest to each timbre category and then playing the

sounds and attaching user-based interpretations based on several listeners [8] While

this strategy worked in Mauchrsquos study given the time and resources at my disposal

this strategy was not practical in my study I ended up comparing my subjective

summaries of each cluster and comparing the charts to see whether certain peaks

in the timbre categories corresponded to specific tones Strangely for α = 005 the

timbre and chord change data is very similar for each cluster This problem does not

occur for when α = 01 or 02 where the graphs vary significantly and correspond

to some of the observed differences in the music In summary below are the most

influential artists I found in the clusters formed and the types of music they created

that were novel for their time

• Jean-Michel Jarre: ambient and house music, complicated synthesizer arrangements

• Cabaret Voltaire: orchestral electronic music

• Paul Horn: new age

• Brian Eno: ambient music

• Manuel Göttsching (Ashra): synth-heavy ambient music

• Killing Joke: industrial metal

• John Foxx: minimalist and dark electronic music

• Fad Gadget: house and industrial music

While these conclusions were formed mainly from the data in the MSD and the resulting clusters, I checked outside sources and biographies of these artists to see whether they were groundbreaking contributors to electronic music. Some research revealed that these artists were indeed groundbreaking for their time, so my findings are consistent with existing literature. The difference, however, between existing accounts and mine is that, from a quantitatively computed perspective, I found some new connections (like observing that one of Jarre's works, when sped up, sounded very similar to trance music).

For larger values of α, it is worth not only looking at interesting phenomena in the clusters formed for that specific value, but also comparing those clusters to the ones formed for other values of α. Since we are increasing the value of α, more clusters will be formed, and the distinctions between each cluster will be more nuanced. With α = 0.1, the Dirichlet Process formed 16 clusters. Two of these clusters consisted of only one song each, and upon listening, neither of these songs sounded particularly unique, so I threw those two clusters out and analyzed the remaining 14. Comparing these clusters to the ones formed with α = 0.05, I found that some of the clusters mapped over nicely while others were more difficult to interpret. For example, cluster 3_{0.1} (cluster 3 when α = 0.1) contained a similar number of songs, and a similar distribution of release years, to cluster 9_{0.05}. Both contain virtually no songs before the 1990s and then steadily rise in popularity through the 2000s. Both clusters also contain similar types of music: house beats and ethereal synths reminiscent of ambient or trance music. However, when I looked at the earliest artists in cluster 3_{0.1}, they were different from the earliest artists in cluster 9_{0.05}. One particular artist, Bill Nelson, stood out for having a particularly novel song, "Birds of Tin," for the year it was released (1980). This song features a sharp and twangy synth beat that, when sped up, sounded like minimalist acid house music. While the α = 0.05 grouping differentiated mostly on general moods and classes of instruments (like rock vs. non-electronic vs. electronic), the α = 0.1 grouping picked up more nuanced instrumentation and mood differences. For example, cluster 16_{0.1} contained songs that featured orchestral string instruments, especially violin. The songs themselves varied significantly according to traditional genres, from Brian Eno arrangements with classical orchestra to a remix of a song by Linkin Park, a nu-metal band, which contained violin interludes. This clustering raises an interesting point: music that sounds very different based on traditional genres could be grouped together on certain instruments or sounds. Another cluster, 28_{0.1}, features 90s sounds characteristic of the Roland TR-909 drum machine, which would explain why the cluster's songs increase drastically starting in the 1990s and steadily decline through the 2000s. Yet another cluster, 6_{0.1}, contains a particularly heavy left tail, indicating a style more popular in the 1980s, and its characteristic sound, hi-hat cymbals, is also a specialized instrument. This specialization does not match up particularly strongly with the clusters when α = 0.05. That is, a single cluster with α = 0.05 does not easily map to one or more clusters in the α = 0.1 run, although many of the clusters appear to share characteristics based on the qualitative descriptions in the tables. The timbre and chord change charts for each cluster appeared to at least somewhat corroborate the general characteristics I attached to each cluster. For example, the last timbre category is significantly pronounced for clusters 5 and 18, and especially so for 18. Cluster 18 was vocal-free, ethereal space-synth sounds, so it would make sense that cluster 5, which was mainly calm New Age, also contained vocal-free, ethereal, space-y sounds. It was also interesting to note that certain clusters, like 28, contained one timbre category that completely dominated all the others. Many songs in this cluster were marked by strong and repetitive beats reminiscent of the Roland TR drum machines, which matches the graph. Likewise, clusters 3, 7, 9, and 20, which appear to contain the same peak timbre category, were noted for containing strong and repetitive beats. From this run, I added the following artists and their contributions to the general list of novel artists:

• Bill Nelson: minimalist house music

• Vangelis: orchestral compositions with electronic notes

• Rick Wakeman: rock compositions with spacy-sounding synths

• Kraftwerk: synth-based pop music

Finally, we look at α = 0.2. With this parameter value, the Dirichlet Process resulted in 22 clusters. Three of these clusters contained only one song each, and upon listening to each of these songs I determined they were not particularly unique and discarded them, for a total of 19 remaining clusters. Unlike with the previous two values of α, where the clusters were relatively easy to differentiate subjectively, here the task was quite difficult. Slightly more than half of the clusters, 10 out of 19, contained under 1,000 songs, and 3 contained under 100. Some of the clusters were easily mapped to clusters for the other two α values, like cluster 17_{0.2}, which contains Roland TR drum machine sounds and is comparable to cluster 28_{0.1}. However, many of the other classifications seemed more dubious. Not only did the songs within each cluster often vary significantly, but the differences between many clusters appeared nearly indistinguishable. The chord change and timbre charts also reflect this difficulty: the y-axes are quite small, implying that many of the timbre values averaged out because the songs in each cluster were quite different. Essentially, the observations are quite noisy and do not have features that stand out as saliently as those of cluster 28_{0.1}, for example. The only exceptions were clusters 30 and 34, but there were so few songs in each of these clusters that they represent only a small amount of the dataset. Therefore, I concluded that the Dirichlet Process with α = 0.2 did an insufficient job of clustering the songs. Overall, the clusters formed when α = 0.1 were the most meaningful in terms of picking up nuanced moods and instruments without splitting hairs and producing clusters that a minimally trained ear could not differentiate. From this analysis, the most appropriate genre classifications of the electronic music in the MSD are the clusters described in the table for α = 0.1, and the most novel artists, along with their contributions, are summarized in the findings for α = 0.05 and α = 0.1.

Chapter 4

Conclusion

In this chapter, I first address weaknesses in my experiment and strategies for remedying them; I then offer potential paths for researchers to build upon this work, and I close with final remarks on the thesis.

4.1 Design Flaws in Experiment

While I made every effort to ensure the integrity of this experiment, various factors worked against it, some beyond my control and others within my control but unrealistic to address given the time and resources I had. The largest issue was the dataset I was working with. While the MSD contained roughly 23,000 electronic music songs according to my classifications, these songs did not come close to covering all of the electronic music available. From looking through the tracks, I did see many important artists, meaning that there was some credibility to the dataset. However, there were several other artists I was surprised to see missing, and the artists included contained only a limited number of popular songs. Some traditionally defined genres, like dubstep, were missing entirely from the dataset, and the most recent songs came from the year 2010, which meant that the past five years of rapid expansion in EM were not accounted for. Building a sufficient corpus of EM data is very difficult, arguably more so than for other genres, because songs may be remixed by multiple artists, further blurring the line between original content and modifications. For this reason, I considered my thesis to be a proof of concept. Although the data I used may not be ideal, I was able to show that the Dirichlet Process could be used with some amount of success to cluster songs based on their metadata.

With respect to how I implemented the Dirichlet Process and constructed the features, my methodology could have been more extensive with additional time and resources. Interpreting the sounds in each song and establishing common threads is a difficult task, and unlike Pandora, which used trained music theory experts to analyze each song, I relied on my own ears and anecdotal knowledge of EM. Given the lack of formal literature quantitatively analyzing EM and the resources I had, this was my best realistic option, but it was also not ideal. The second notable weakness, which was more controllable, was determining what exactly constitutes an EM song. My criteria involved iterating through every song and selecting those whose artist carried a tag that fell inside a list of predetermined EM genres. However, this strategy is not always effective, since some artists have produced only a small selection of EM songs alongside much more rock or other non-EM music. To prevent these songs from appearing in the dataset, I would need to load another dataset, from a group called Last.fm, which contains user-generated tags at the song level.
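A sketch of what such a song-level filter could look like, assuming the publicly distributed Last.fm companion dataset to the MSD, in which each track is a JSON object whose 'tags' field holds [tag, weight] pairs; the field layout is my assumption here, and target_genres is the artist-level genre list from Appendix A.1:

import json

EM_TAGS = set(target_genres)

def is_em_song(lastfm_json_path):
    with open(lastfm_json_path) as f:
        track = json.load(f)
    # keep the song only if one of its own tags is an EM genre,
    # rather than trusting artist-level tags
    return any(tag.lower() in EM_TAGS for tag, weight in track.get('tags', []))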

Another, more addressable, weakness in my experiment was the graphical analysis of the timbre categories. While the average chord changes were easy to interpret on the graphs for each cluster and had clear semantic interpretations, the timbre categories were never formally defined. That is, while I knew the Bayesian Information Criterion was lowest when there were 46 categories, I did not associate each timbre category with a sound.
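For reference, that selection can be reproduced by minimizing the BIC over candidate mixture sizes on the sampled timbre frames from Appendix A.3. This is a minimal sketch using the current scikit-learn API; the search range and covariance form are my assumptions:

import ast
import numpy as np
from sklearn.mixture import GaussianMixture

# timbre_frames_all.txt was written with str(list), so it parses as a literal
X = np.array(ast.literal_eval(open('timbre_frames_all.txt').read()))

bics = {}
for k in range(10, 80, 2):
    gmm = GaussianMixture(n_components=k, covariance_type='full',
                          random_state=0).fit(X)
    bics[k] = gmm.bic(X)
best_k = min(bics, key=bics.get)  # this thesis found the minimum at k = 46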

Mauch's study addressed this issue by randomly selecting songs with sounds that fell in each timbre category and asking users to listen to the sounds and classify what they heard. Implementing this system would be an additional way of ensuring that the clusters formed for the songs were nontrivial: I could not only eyeball the timbre measurements on each graph, as I did in this thesis, but also use them to confirm the sounds I observed for each cluster. Finally, while my feature selection involved careful preprocessing, based on other studies, that normalized measurements across all songs, there are additional ways I could have improved the feature set. For example, one study looks at more advanced ways to isolate specific timbre segments in a song, identify repeating patterns, and compare songs to each other in terms of the similarity of their timbres [15]. More advanced methods like these would allow me to analyze more quantitatively how successfully the Dirichlet Process clusters songs into distinct categories.
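One lightweight way to approximate Mauch's audition step with the data already computed here is to rank songs by how strongly a single timbre category dominates them and listen to the top few. The helper below is hypothetical, written against the fields produced by the preprocessing script in Appendix A.2:

# rank songs by the (duration-normalized) weight of one timbre category
def representative_songs(songs, category, n=5):
    scored = [(s['timbre_cat_counts'][category], s['artist_name'], s['title'])
              for s in songs]
    return sorted(scored, reverse=True)[:n]

Listening to the returned titles for each of the 46 categories would attach an informal sound description to every category.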

4.2 Future Work

Future work in this area, quantitatively analyzing EM metadata to determine what constitutes different genres and novel artists, would involve tighter definitions and procedures, evaluations of whether clustering was effective, and closer musical scrutiny. All of the weaknesses mentioned in the previous section, barring perhaps the songs available in the Million Song Dataset, can be addressed with extensions and modifications to the code base I created. Addressing the greater issue of building an effective corpus of music data for the MSD, and constantly updating it, might be accomplished by soliciting such data from an organization like Spotify, but such an endeavor is very ambitious and beyond the scope of any individual or small research group without extensive funding and influence. Once these problems are resolved, and the dataset, access to its songs, and methods for comparing songs to each other are in place, the next steps would be to analyze the results further. How do the most unique artists for their time compare to the most popular artists? Is there considerable overlap? How long does it take for a style to grow in popularity, if it does at all? And lastly, how can these findings be used to compose new genres of music and envision who and what will become popular in the future? All of these questions may require supplementary information sources, with respect to the popularity of songs and artists for example, and many of these additional pieces of information can be found on the website of the MSD.

4.3 Closing Remarks

While this thesis is an ambitious endeavor and can be improved in many respects, it does show that the methods implemented yield nontrivial results and could serve as a foundation for future quantitative analysis of electronic music. As data analytics grows, and groups such as Spotify amass greater amounts of information and deeper insights into that information, this relatively new field of study will hopefully grow with it. EM is a dynamic, energizing, and incredibly expressive type of music, and understanding it from a quantitative perspective pays respect to what has up until now been analyzed mostly from a curious outsider's perspective: qualitatively described, but not examined as thoroughly from a mathematical angle.


233 TIMBRE_MEANS = [npmean(t) for t in TIMBRE_CLUSTERS]234 TIMBRE_STDEVS = [npstd(t) for t in TIMBRE_CLUSTERS]235

236 rsquorsquorsquohelper methods to process raw msd datarsquorsquorsquo237

238 def normalize_pitches(h5)239 key = int(hdf5_gettersget_key(h5))240 segments_pitches = hdf5_gettersget_segments_pitches(h5)241 segments_pitches_new = [transpose_by_key(pitch_segkey) for pitch_seg in

segments_pitches]242 return segments_pitches_new243

65

244 def transpose_by_key(pitch_segkey)245 pitch_seg_new = []246 for i in range(012)247 idx = (i + key) 12248 pitch_seg_newappend(pitch_seg[idx])249 return pitch_seg_new250

251 rsquorsquorsquo given a time segment with distributions of the 12 pitches find the mostlikely chord playedrsquorsquorsquo

252 def find_most_likely_chord(pitch_vector)253 rho_max = 00254 index each chord255 most_likely_chord = (11)256 for idx (chordmeanstdev) in enumerate(zip(CHORD_TEMPLATE_MAJOR

CHORD_TEMPLATE_MAJOR_meansCHORD_TEMPLATE_MAJOR_stdevs))257 rho = 00258 for i in range(012)259 rho += (chord[i] - mean)(pitch_vector[i] - npmean(pitch_vector))((

stdev+001)(npstd(pitch_vector)+001))260 if (abs(rho) gt abs(rho_max))261 rho_max = rho262 most_likely_chord = (1idx)263 for idx (chordmeanstdev) in enumerate(zip(CHORD_TEMPLATE_MINOR

CHORD_TEMPLATE_MINOR_meansCHORD_TEMPLATE_MINOR_stdevs))264 rho = 00265 for i in range(012)266 rho += (chord[i] - mean)(pitch_vector[i] - npmean(pitch_vector))((

stdev+001)(npstd(pitch_vector)+001))267 if (abs(rho) gt abs(rho_max))268 rho_max = rho269 most_likely_chord = (2idx)270 for idx (chordmeanstdev) in enumerate(zip(CHORD_TEMPLATE_DOM7

CHORD_TEMPLATE_DOM7_meansCHORD_TEMPLATE_DOM7_stdevs))271 rho = 00272 for i in range(012)273 rho += (chord[i] - mean)(pitch_vector[i] - npmean(pitch_vector))((

stdev+001)(npstd(pitch_vector)+001))274 if (abs(rho) gt abs(rho_max))275 rho_max = rho276 most_likely_chord = (3idx)277 for idx (chordmeanstdev) in enumerate(zip(CHORD_TEMPLATE_MIN7

CHORD_TEMPLATE_MIN7_meansCHORD_TEMPLATE_MIN7_stdevs))278 rho = 00279 for i in range(012)280 rho += (chord[i] - mean)(pitch_vector[i] - npmean(pitch_vector))((

stdev+001)(npstd(pitch_vector)+001))281 if (abs(rho) gt abs(rho_max))282 rho_max = rho283 most_likely_chord = (4idx)284 return most_likely_chord285

286 def find_most_likely_timbre_category(timbre_vector)287 most_likely_timbre_cat = 0288 rho_max = 00289 for idx (segmeanstdev) in enumerate(zip(TIMBRE_CLUSTERSTIMBRE_MEANS

TIMBRE_STDEVS))290 rho = 00291 for i in range(012)292 rho += (seg[i] - mean)(timbre_vector[i] - npmean(seg))((stdev+001)

(npstd(timbre_vector)+001))

66

293 if (abs(rho) gt abs(rho_max))294 rho_max = rho295 most_likely_timbre_cat = idx296 return most_likely_timbre_cat

67



Cluster   Song Count   Characteristic Sounds
0         1339         Instrumental and disco with 80s synth
1         2109         Simultaneous quarter-note and sixteenth-note rhythms
2         4048         Upbeat, chill; simultaneous quarter-note and eighth-note rhythms
3         1353         Strong repetitive beats; ambient
4         2446         Strong simultaneous beat and synths; synths defined but with echo
5         2672         Calm New Age
6         542          Hi-hat cymbals; dissonant chord progressions
7         2725         Aggressive punk and alternative rock
9         1647         Latin; rhythmic emphasis on first and third beats
11        835          Standard medium-fast rock instruments/chords
16        1152         Orchestral, especially violins
18        40           "Martian alien" sounds; no vocals
20        1590         Alternating strong kick and strong high-pitched clap
28        528          Roland TR-like beats; kick and clap stand out but fuzzy

Table 3.2: Song cluster descriptions for α = 0.1

3.2.3 α = 0.2

With α set to 0.2, a total of 22 clusters were formed. Three of the clusters consisted of one song each, none of which were particularly unique-sounding, so I discarded them, for a total of 19 significant clusters. Again, the song distributions, timbre and pitch distributions, and cluster descriptions are shown below.

Figure 3.5: Song year distributions for α = 0.2 [figure not reproduced in this transcript]

Figure 3.6: Timbre and pitch distributions for α = 0.2 [figure not reproduced in this transcript]

Cluster   Song Count   Characteristic Sounds
0         4075         Nostalgic and sad-sounding synths and string instruments
1         2068         Intense, sad, cavernous (mix of industrial metal and ambient)
2         1546         Jazz/funk tones
3         1691         Orchestral with heavy 80s synths; atmospheric
4         343          Arpeggios
5         304          Electro ambient
6         2405         Alien synths; eerie
7         1264         Punchy kicks and claps; 80s/90s tilt
8         1561         Medium tempo, 4/4 time signature; synths with intense guitar
9         1796         Disco rhythms and instruments
10        2158         Standard rock with few (if any) synths added on
12        791          Cavernous, minimalist ambient (non-electronic instruments)
14        765          Downtempo, classic guitar riffs, fewer synths
16        865          Classic acid house sounds and beats
17        682          Heavy Roland TR sounds
22        14           Fast ambient, classic orchestral
23        578          Acid house with funk tones
30        31           Very repetitive rhythms, one or two tones
34        88           Very dense sound (strong vocals and synths)

Table 3.3: Song cluster descriptions for α = 0.2

3.3 Analysis

Each of the different values of α revealed different insights about the Dirichlet Process applied to the Million Song Dataset, mainly which artists and songs were unique and how traditional groupings of EM genres could be thought of differently. Not surprisingly, the distributions of the years of songs in most of the clusters were skewed to the left, because the distribution of all of the EM songs in the Million Song Dataset is left-skewed (see Figure 2.2). However, some of the distributions vary significantly for individual clusters, and these differences provide important insights into which types of music styles were popular at certain points in time and how unique the earliest artists and songs in the clusters were. For example, for α = 0.1, Cluster 28's musical style (with sounds characteristic of the Roland TR-808 and TR-909, two programmable drum machines and synthesizers that became extremely popular in 1980s and 90s dance tracks) [13] coincides with when the instruments were first manufactured, in 1980. Not surprisingly, this cluster contained mostly songs from the 80s and 90s and declined slightly in the 2000s. However, there were a few songs in that cluster that came out before 1980. While these songs did not clearly use the Roland TR machines, they may have contained similar sounds that predated the machines and were truly novel.

First, looking at α = 0.05, we see that all of the clusters contain a significant number of songs, although clusters 3 and 8 are notably smaller. Cluster 3 has a heavier left tail, indicating a larger number of songs from the 70s, 80s, and 90s. Inside the cluster, the genres of music varied significantly from a traditional music lens. That is, the cluster contained some songs with nearly all traditional rock instruments, others with purely synths, and others somewhere in between, all of which would normally be classified as different EM genres. However, under the Dirichlet Process these songs were lumped together by the common theme of dense, melodic sounds (as opposed to minimalistic, repetitive, or dissonant ones). The most prominent artists from the earlier songs are Ashra and John Hassell, who composed several melodic songs combining traditional instruments with synthesizers for a modern feel. The other small cluster, number 8, has a year distribution closer to that of the entire MSD and also consists of denser beats; another artist, Cabaret Voltaire, leads this cluster. Cluster 9 sticks out significantly because it contains virtually no songs before 1990 and then increases rapidly in popularity. This cluster contains songs with a hypnotically repetitive rhythm, strong and ethereal synths, and an equally strong drum-like beat. Given the emergence of trance in the 1990s, and the fact that house music in the 1980s contained more minimalistic synths than house music in the 1990s, this distribution of years makes sense. Looking at the earliest artists in this cluster, one that accurately predates the later music in the cluster is Jean-Michel Jarre. A French composer pioneering in ambient and electronic music [14], one of his songs, Les Chants Magnétiques IV, contains very sharp and modulated synths along with a repetitive hi-hat cymbal rhythm and more drawn-out, ethereal synths. While the song sounds ambient at its normal speed, playing it at 1.5 times the normal speed resulted in a thumping, fast-paced sixteenth-note rhythm that, combined with the ethereal synths and their chord progressions, sounded very similar to trance music. In fact, I found that, stylistically, trance music was comparable to house and ambient music increased in speed. Trance was a term not used extensively until the early 1990s, but ambient and house music were already mainstream by the 1980s, so it would make sense that trance evolved in this manner. However, this insight could serve as an argument that trance is not an innovative genre in and of itself, but is rather a clever combination of two older genres.

Lastly, we look at the timbre category and chord change distributions for each cluster. In theory, these clusters should have significantly different peaks of chord changes and timbre categories, reflecting different pitch arrangements and instruments in each cluster. The type 0 chord change corresponds to major → major with no note change; type 60, to minor → minor with no note change; type 120, to dominant 7th major → dominant 7th major with no note change; and type 180, to dominant 7th minor → dominant 7th minor with no note change. It makes sense that type 0, 60, 120, and 180 chord changes are frequently observed, because it implies that chords occurring next to each other are remaining in the same key for the majority of the song. (The arithmetic behind these category numbers is restated in the sketch below.)
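For concreteness, the mapping from a pair of detected chords to one of the 192 chord-change categories, as implemented in Appendix A.2, can be restated compactly. The following helper is only a condensed restatement of that mapping, with a hypothetical function name:

    def chord_change_category(c1, c2):
        # c1 and c2 are (quality, root) pairs as returned by
        # find_most_likely_chord in Appendix A.4: quality 1 = major,
        # 2 = minor, 3 = dominant 7th, 4 = minor 7th; root 0-11 indexes
        # the pitch classes C through B
        note_shift = (c2[1] - c1[1]) % 12         # semitones from old root to new root
        key_shift = 4 * (c1[0] - 1) + c2[0]       # which of the 16 quality pairs
        return 12 * (key_shift - 1) + note_shift  # one of the 192 categories

    # major -> major on the same root gives 0; minor -> minor gives 60;
    # dominant 7th -> dominant 7th gives 120; minor 7th -> minor 7th gives 180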

The timbre categories, on the other hand, are more difficult to intuitively interpret. Mauch's study addresses this issue by sampling the songs and sounds that are closest to each timbre category, then playing the sounds and attaching user-based interpretations drawn from several listeners [8]. While this strategy worked in Mauch's study, it was not practical in mine given the time and resources at my disposal. I ended up comparing my subjective summaries of each cluster against the charts to see whether certain peaks in the timbre categories corresponded to specific tones. Strangely, for α = 0.05 the timbre and chord change data is very similar for each cluster. This problem does not occur when α = 0.1 or 0.2, where the graphs vary significantly and correspond to some of the observed differences in the music. In summary, below are the most influential artists I found in the clusters formed, and the types of music they created that were novel for their time:

• Jean-Michel Jarre: ambient and house music; complicated synthesizer arrangements

• Cabaret Voltaire: orchestral electronic music

• Paul Horn: new age

• Brian Eno: ambient music

• Manuel Göttsching (Ashra): synth-heavy ambient music

• Killing Joke: industrial metal

• John Foxx: minimalist and dark electronic music

• Fad Gadget: house and industrial music

While these conclusions were formed mainly from the data used in the MSD and the resulting clusters, I checked outside sources and biographies of these artists to see whether they were groundbreaking contributors to electronic music. Some research revealed that these artists were indeed groundbreaking for their time, so my findings are consistent with existing literature. The difference, however, between existing accounts and mine is that, from a quantitatively computed perspective, I found some new connections (like observing that one of Jarre's works, when sped up, sounded very similar to trance music).

For larger values of α, it is worth not only looking at interesting phenomena in the clusters formed for that specific value, but also comparing them to the clusters formed at other values of α. Since we are increasing the value of α, more clusters will be formed, and the distinctions between each cluster will be more nuanced (a short simulation below illustrates the effect).
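The reason is easiest to see in the stick-breaking view of the Dirichlet Process [12]: cluster weights come from repeatedly breaking off Beta(1, α)-distributed fractions of a unit-length stick, so a larger α leaves more of the stick for later clusters. A minimal simulation, illustrative only and not part of the thesis pipeline:

    import numpy as np

    def stick_breaking_weights(alpha, n_components=60, seed=0):
        # break off Beta(1, alpha) fractions of whatever stick remains
        rng = np.random.RandomState(seed)
        betas = rng.beta(1.0, alpha, size=n_components)
        remaining = np.concatenate(([1.0], np.cumprod(1.0 - betas[:-1])))
        return betas * remaining

    for alpha in (0.05, 0.1, 0.2):
        w = stick_breaking_weights(alpha)
        print('alpha={0}: {1} non-negligible clusters'.format(alpha, int(np.sum(w > 1e-3))))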

With α = 0.1, the Dirichlet Process formed 16 clusters. Two of these clusters consisted of only one song each, and upon listening, neither of these songs sounded particularly unique, so I threw those two clusters out and analyzed the remaining 14. Comparing these clusters to the ones formed with α = 0.05, I found that some of the clusters mapped over nicely, while others were more difficult to interpret. For example, cluster 3_0.1 (cluster 3 when α = 0.1) contained a similar number of songs, and a similar distribution of release years, to cluster 9_0.05. Both contain virtually no songs before the 1990s and then steadily rise in popularity through the 2000s. Both clusters also contain similar types of music: house beats and ethereal synths reminiscent of ambient or trance music. However, when I looked at the earliest artists in cluster 3_0.1, they were different from the earliest artists in cluster 9_0.05. One particular artist, Bill Nelson, stood out for having a particularly novel song, "Birds of Tin," for the year it was released (1980). This song features a sharp and twangy synth beat that, when sped up, sounded like minimalist acid house music. While the α = 0.05 group differentiated mostly on general moods and classes of instruments (like rock vs. non-electronic vs. electronic), the α = 0.1 group picked up more nuanced instrumentation and mood differences. For example, cluster 16_0.1 contained songs that featured orchestral string instruments, especially violin. The songs themselves varied significantly according to traditional genres, from Brian Eno arrangements with classical orchestra to a remix of a song by Linkin Park, a nu-metal band, which contained violin interludes. This clustering raises an interesting point: music that sounds very different under traditional genres could be grouped together on certain instruments or sounds. Another cluster, 28_0.1, features 90s sounds characteristic of the Roland TR-909 drum machine (which would explain why the cluster's songs increase drastically starting in the 1990s and steadily decline through the 2000s). Yet another cluster, 6_0.1, has a particularly heavy left tail, indicating a style more popular in the 1980s, and its characteristic sound, hi-hat cymbals, is also a specialized instrument. This specialization does not match up particularly strongly with the clusters when α = 0.05. That is, a single cluster with α = 0.05 does not easily map to one or more clusters in the α = 0.1 run, although many of the clusters appear to share characteristics based on the qualitative descriptions in the tables.

The timbre/chord change charts for each cluster appeared to at least somewhat corroborate the general characteristics I attached to each cluster. For example, the last timbre category is significantly pronounced for clusters 5 and 18, and especially so for 18. Cluster 18 was vocal-free, ethereal, space-synth sounds, so it would make sense that cluster 5, which was mainly calm New Age, also contained vocal-free, ethereal, and space-y sounds. It was also interesting to note that certain clusters, like 28, contained one timbre category that completely dominated all the others. Many songs in this cluster were marked by strong and repetitive beats reminiscent of the Roland TR drum machines, which matches the graph. Likewise, clusters 3, 7, 9, and 20, which appear to share the same peak timbre category, were noted for containing strong and repetitive beats. For this run, I added the following artists and their contributions to the general list of novel artists:

• Bill Nelson: minimalist house music

• Vangelis: orchestral compositions with electronic notes

• Rick Wakeman: rock compositions with spacy-sounding synths

• Kraftwerk: synth-based pop music

Finally, we look at α = 0.2. With this parameter value, the Dirichlet Process resulted in 22 clusters. Three of these clusters contained only one song each, and upon listening to each of these songs, I determined they were not particularly unique and discarded them, for a total of 19 remaining clusters. Unlike the previous two values of α, where the clusters were relatively easy to subjectively differentiate, this one was quite difficult. Slightly more than half of the clusters, 10 out of 19, contained under 1000 songs, and 3 contained under 100. Some of the clusters were easily mapped to clusters in the other two α values, like cluster 17_0.2, which contains Roland TR drum machine sounds and is comparable to cluster 28_0.1. However, many of the other classifications seemed more dubious. Not only did the songs within each cluster often vary significantly, but the differences between many clusters appeared nearly indistinguishable. The chord change and timbre charts also support the difficulty in distinguishing different clusters: the y-axes for all of the charts are quite small, implying that many of the timbre values averaged out because the songs in each cluster were quite different. Essentially, the observations are quite noisy and do not have features that stand out as saliently as those of cluster 28_0.1, for example. The only exceptions were clusters 30 and 34, but there were so few songs in each of these clusters that they represent only a small amount of the dataset. Therefore, I concluded that the Dirichlet Process with α = 0.2 did an inadequate job of clustering the songs. Overall, the clusters formed when α = 0.1 were the most meaningful in terms of picking up nuanced moods and instruments without splitting hairs and producing clusters a minimally trained ear could not differentiate. From this analysis, the most appropriate genre classifications of the electronic music from the MSD are the clusters described in the table where α = 0.1, and the most novel artists, along with their contributions, are summarized in the findings where α = 0.05 and α = 0.1.

Chapter 4

Conclusion

In this chapter, I first address weaknesses in my experiment and strategies to address those weaknesses; I then offer potential paths for researchers to build upon my experiment and close with final remarks regarding this thesis.

4.1 Design Flaws in Experiment

While I made every effort possible to ensure the integrity of this experiment, there were various complicating factors, some beyond my control and others within my control but unrealistic to address given the time and resources I had. The largest issue was the dataset I was working with. While the MSD contained roughly 23,000 electronic music songs according to my classifications, these songs did not come close to covering all of the electronic music that was available. From looking through the tracks, I did see many important artists, meaning that there was some credibility to the dataset. However, there were several other artists I was surprised to see missing, and the artists included were represented by only a limited number of popular songs. Some traditionally defined genres, like dubstep, were missing entirely from the dataset, and the most recent songs came from the year 2010, which meant that the past five years of rapid expansion in EM were not accounted for. Building a sufficient corpus of EM data is very difficult, arguably more so than for other genres, because songs may be remixed by multiple artists, further blurring the line between original content and modifications. For this reason, I considered my thesis to be a proof of concept. Although the data I used may not be ideal, I was able to show that the Dirichlet Process could be used with some amount of success to cluster songs based on their metadata.

With respect to how I implemented the Dirichlet Process and constructed the features, my methodology could have been more extensive with additional time and resources. Interpreting the sounds in each song and establishing common threads is a difficult task, and unlike Pandora, which used trained music theory experts to analyze each song, I relied on my own ears and anecdotal knowledge of EM. Given the lack of formal literature quantitatively analyzing EM and the resources I had, this was my best realistic option, but it was also not ideal. The second notable weakness, which was more controllable, was determining what exactly constitutes an EM song. My criteria involved iterating through every song and selecting those whose artist carried a tag that fell inside a list of predetermined EM genres. However, this strategy is not always effective, since some artists have only a small selection of EM songs and have produced much more music involving rock or other non-EM genres. To prevent these songs from appearing in the dataset, I would need to load another dataset, from a group called Last.fm, which contains user-generated tags at the song level (a sketch of this filtering appears below).

timbre categories While the average chord changes were easy to interpret on the

graphs for each cluster and had easy semantic interpretations the timbre categories

were never formally defined That is while I knew the Bayes Information Criterion

was lowest when there were 46 categories I did not associate each timbre category

with a sound Mauchrsquos study addressed this issue by randomly selecting songs with

sounds that fell in each timbre category and asked users to listen to the sounds and

54

classify what they heard Implementing this system would be an additional way of

ensuring that the clusters formed for each song were nontrivial I could not only

eyeball the measurements on each graph for timbre like I did in this thesis but also

use them to confirm the sounds I observed for each cluster Finally while my feature

selection contained careful preprocessing based on other studies that normalized

measurements between all songs there are additional ways I could have improved the

feature set For example one study looks at more advanced ways to isolate specific

timbre segments in a song identify repeating patterns and comparing songs to each

other in terms of the similarity of their timbres [15] More advanced methods like

these would allow me to more quantitatively analyze how successful the Dirichlet

Process is on effectively clustering songs into distinct categories
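A sketch of how such listening samples could be drawn, assuming the per-frame timbre vectors collected in Appendix A.3 and the 46 centroids from Appendix A.4; frame_labels is a hypothetical bookkeeping list that maps each frame back to its song:

    import numpy as np
    import msd_utils  # the helper module from Appendix A.4

    def frames_for_listening(frames, frame_labels, centroid_idx, n=5):
        # return the n frames closest (in Euclidean distance) to one timbre
        # centroid, so the corresponding song excerpts can be auditioned
        centroid = np.asarray(msd_utils.TIMBRE_CLUSTERS[centroid_idx])
        dists = np.linalg.norm(np.asarray(frames) - centroid, axis=1)
        return [frame_labels[i] for i in np.argsort(dists)[:n]]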

4.2 Future Work

Future work in this area, quantitatively analyzing EM metadata to determine what constitutes different genres and novel artists, would involve tighter definitions and procedures, evaluations of whether the clustering was effective, and closer musical scrutiny. All of the weaknesses mentioned in the previous section, barring perhaps the songs available in the Million Song Dataset, can be addressed with extensions and modifications to the code base I created. Addressing the greater issue of building an effective corpus of music data for the MSD, and constantly updating it, might be accomplished by soliciting such data from an organization like Spotify, but such an endeavor is very ambitious and beyond the scope of any individual or small-group research project without extensive funding and influence. Once these problems are resolved, and the dataset, the songs accessed from it, and the methods for comparing songs to each other are in place, the next steps would be to further analyze the results. How do the most unique artists for their time compare to the most popular artists? Is there considerable overlap? How long does it take for a style to grow in popularity, if it even does? And lastly, how can these findings be used to compose new genres of music and envision who and what will become popular in the future? All of these questions may require supplementary information sources, with respect to the popularity of songs and artists for example, and many of these additional pieces of information can be found on the website of the MSD.
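As a concrete starting point for such extensions, the clustering step itself is available in the current scikit-learn API [10]. A minimal sketch, assuming a feature matrix X whose rows are the per-song chord-change and timbre-category frequencies built in Appendix A.2; the filename is hypothetical:

    import numpy as np
    from sklearn.mixture import BayesianGaussianMixture

    X = np.load('em_song_features.npy')  # hypothetical: songs x features

    dpgmm = BayesianGaussianMixture(
        n_components=50,                                   # truncation level
        weight_concentration_prior_type='dirichlet_process',
        weight_concentration_prior=0.1,                    # the alpha studied in Chapter 3
        max_iter=500, random_state=0)
    dpgmm.fit(X)
    labels = dpgmm.predict(X)
    print('{0} occupied clusters'.format(np.unique(labels).size))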

4.3 Closing Remarks

While this thesis is an ambitious endeavor and can be improved in many respects, it does show that the methods implemented yield nontrivial results and could serve as a foundation for future quantitative analysis of electronic music. As data analytics grows even more, and groups such as Spotify amass greater amounts of information and deeper insights from that information, this relatively new field of study will hopefully grow. EM is a dynamic, energizing, and incredibly expressive type of music, and understanding it from a quantitative perspective pays respect to what has, up until now, been mostly analyzed from a curious outsider's perspective: qualitatively described, but not examined as thoroughly from a mathematical angle.

Appendix A

Code

A.1 Pulling Data from the Million Song Dataset

from __future__ import division
import os
import sys
import re            # needed by re.sub below; missing from the original listing
import time
import glob
import collections   # needed for OrderedDict below; missing from the original listing
import numpy as np   # needed by np.set_printoptions; missing from the original listing
import hdf5_getters  # not on adroit

# prevents output from showing ellipses when printed
np.set_printoptions(threshold=np.nan)

'''This code pulls the relevant metadata for every electronic music song
in the MSD and writes it out ordered by year.'''

basedir = '/scratch/network/mssilver/mssilver/msd_data_full/data/' + str(sys.argv[1])
ext = '.h5'

target_genres = ['house', 'techno', 'drum and bass', 'drum n bass', "drum'n'bass",
    'drumnbass', "drum 'n' bass", 'jungle', 'breakbeat', 'trance', 'dubstep', 'trap',
    'downtempo', 'industrial', 'synthpop', 'idm', 'idm - intelligent dance music',
    '8-bit', 'ambient', 'dance and electronica', 'electronic']

# relevant metadata for all EM songs found in the MSD
all_song_data = {}
pitch_segs_data = []
count = 0
start_time = time.time()

for root, dirs, files in os.walk(basedir):
    files = glob.glob(os.path.join(root, '*' + ext))
    for f in files:
        h5 = hdf5_getters.open_h5_file_read(f)
        # if year unknown, throw out sample
        if hdf5_getters.get_year(h5) == 0:
            h5.close()
            continue
        if any(tag in str(hdf5_getters.get_artist_mbtags(h5)) for tag in target_genres):
            print 'found electronic music song at {0} seconds'.format(time.time() - start_time)
            count += 1
            print('song count: {0}'.format(count + 1))
            h5_subdict = dict()
            h5_subdict['title'] = hdf5_getters.get_title(h5).item()
            h5_subdict['artist_name'] = hdf5_getters.get_artist_name(h5).item()
            h5_subdict['year'] = hdf5_getters.get_year(h5).item()
            h5_subdict['duration'] = hdf5_getters.get_duration(h5).item()
            h5_subdict['timbre'] = hdf5_getters.get_segments_timbre(h5).tolist()
            h5_subdict['pitches'] = hdf5_getters.get_segments_pitches(h5).tolist()
            track_id = hdf5_getters.get_track_id(h5).item()
            all_song_data[track_id] = h5_subdict
            print('Song {0} finished processing. Total time elapsed: {1} seconds'.format(count, str(time.time() - start_time)))
        h5.close()

# an OrderedDict is needed to actually keep the chronological ordering
# (a plain dict, as in the original listing, would discard it)
all_song_data_sorted = collections.OrderedDict(sorted(all_song_data.items(), key=lambda k: k[1]['year']))
sortedpitchdata = '/scratch/network/mssilver/mssilver/msd_data/raw_' + re.sub('/', '', sys.argv[1]) + '.txt'
with open(sortedpitchdata, 'w') as text_file:
    text_file.write(str(all_song_data_sorted))

A.2 Calculating Most Likely Chords and Timbre Categories

from __future__ import division
import os
import sys
import re
import time
import json
import glob
import hdf5_getters  # not on adroit
import sklearn.mixture
import msd_utils  # not on adroit
import math
import numpy as np
import collections
import ast

# prevents output from showing ellipses when printed
np.set_printoptions(threshold=np.nan)

# column-wise mean of a list of lists (used with zip(*...) below)
def mean(a):
    return sum(a) / len(a)

'''This code computes the frequency of chord changes in each electronic song
and runs the dirichlet process on it'''

basedir = '/scratch/network/mssilver/mssilver/'
input_file = basedir + 'msd_data/raw_' + str(sys.argv[1]) + '.txt'
output_file = basedir + 'msd_data/preprocessed_' + str(sys.argv[1]) + '.txt'

json_contents = open(input_file, 'r').read()

all_song_data = []
time_start = time.time()
count = 0
# the exact regular expression was garbled in the PDF extraction; any pattern
# capturing each "{'title': ...}" sub-dictionary of the raw file works here
for json_object_str in re.finditer(r"\{'title'[^}]*\}", json_contents):
    json_object_str = str(json_object_str.group(0))
    json_object = ast.literal_eval(json_object_str)
    json_object_new = {}

    json_object_new['title'] = json_object['title']
    json_object_new['artist_name'] = json_object['artist_name']
    json_object_new['year'] = json_object['year']
    json_object_new['duration'] = json_object['duration']

    segments_pitches_old = json_object['pitches']
    segments_timbre_old = json_object['timbre']
    segments_pitches_old_smoothed = []
    segments_timbre_old_smoothed = []
    chord_changes = [0 for i in range(0, 192)]
    smoothing_factor = 5
    for i in range(0, int(math.floor(len(segments_pitches_old) / smoothing_factor))):
        segments = segments_pitches_old[(smoothing_factor*i):(smoothing_factor*i + smoothing_factor)]
        # calculate mean frequency of each note over a block of 5 time segments
        segments_mean = map(mean, zip(*segments))
        segments_pitches_old_smoothed.append(segments_mean)
    most_likely_chords = [msd_utils.find_most_likely_chord(seg) for seg in segments_pitches_old_smoothed]
    print 'found most likely chords at {0} seconds'.format(time.time() - time_start)
    # calculate chord changes
    for i in range(0, len(most_likely_chords) - 1):
        c1 = most_likely_chords[i]
        c2 = most_likely_chords[i+1]
        if (c1[1] == c2[1]):
            note_shift = 0
        elif (c1[1] < c2[1]):
            note_shift = c2[1] - c1[1]
        else:
            note_shift = 12 - c1[1] + c2[1]
        key_shift = 4*(c1[0] - 1) + c2[0]
        # convert note_shift (0 through 11) and key_shift (1 to 16)
        # to one of 192 categories for a chord shift
        chord_shift = 12*(key_shift - 1) + note_shift
        chord_changes[chord_shift] += 1
    json_object_new['chord_changes'] = [c / json_object['duration'] for c in chord_changes]
    print 'calculated chord changes at {0} seconds'.format(time.time() - time_start)

    for i in range(0, int(math.floor(len(segments_timbre_old) / smoothing_factor))):
        segments = segments_timbre_old[(smoothing_factor*i):(smoothing_factor*i + smoothing_factor)]
        # calculate mean of each timbre coefficient over a block of 5 time segments
        segments_mean = map(mean, zip(*segments))
        segments_timbre_old_smoothed.append(segments_mean)
    print 'found most likely timbre categories at {0} seconds'.format(time.time() - time_start)
    timbre_cats = [msd_utils.find_most_likely_timbre_category(seg) for seg in segments_timbre_old_smoothed]
    # note: only categories 0-29 are tallied here, although msd_utils defines
    # 46 timbre clusters, so categories 30-45 are silently dropped
    timbre_cat_counts = [timbre_cats.count(i) for i in xrange(0, 30)]
    json_object_new['timbre_cat_counts'] = [t / json_object['duration'] for t in timbre_cat_counts]
    all_song_data.append(json_object_new)
    count += 1

print 'preprocessing finished, writing results to file at time {0}'.format(time.time() - time_start)
with open(output_file, 'w') as f:
    f.write(str(all_song_data))

print 'file merging complete at time {0}'.format(time.time() - time_start)

A.3 Code to Compute Timbre Categories

from __future__ import division
import os
import sys
import re
import time
import json
import glob
import hdf5_getters  # not on adroit
import sklearn.mixture
import msd_utils  # not on adroit
import math
import numpy as np
import collections
from string import ascii_uppercase
import ast
import matplotlib.pyplot as plt
import operator
from collections import defaultdict
import random

timbre_all = []
N = 20  # number of samples to get from each year
year_counts = {1956: 2, 1965: 4, 1968: 3, 1969: 5, 1970: 23, 1971: 25,
    1972: 26, 1973: 37, 1974: 35, 1975: 29, 1976: 28, 1977: 64, 1978: 77,
    1979: 111, 1980: 131, 1981: 171, 1982: 199, 1983: 272, 1984: 190,
    1985: 189, 1986: 200, 1987: 224, 1988: 205, 1989: 272, 1990: 358,
    1991: 348, 1992: 538, 1993: 610, 1994: 658, 1995: 764, 1996: 809,
    1997: 930, 1998: 872, 1999: 983, 2000: 1031, 2001: 1230, 2002: 1323,
    2003: 1563, 2004: 1508, 2005: 1995, 2006: 1892, 2007: 2175, 2008: 1950,
    2009: 1782, 2010: 742}

time_start = time.time()
year_count = defaultdict(int)
orig_dir = '/scratch/network/mssilver/mssilver/'
orig_dir = ''
# the exact regular expression was garbled in the PDF extraction; it matches
# each "{'title': ...}" sub-dictionary in the raw text files
json_pattern = re.compile(r"\{'title'[^}]*\}", re.DOTALL)
N = 20  # number of songs to sample from each year
k = 20  # number of frames to select from each song
for l1 in ascii_uppercase:
    for l2 in ascii_uppercase:
        edm_textfile = orig_dir + 'msd_data/raw_' + l1 + l2 + '.txt'
        json_contents = open(edm_textfile, 'r').read()
        for json_object_str in re.findall(json_pattern, json_contents):
            json_object = ast.literal_eval(json_object_str)
            year = int(json_object['year'])
            # inclusion probability that draws roughly N songs per year
            prob = 1.0 if 1.0*N/year_counts[year] > 1.0 else 1.0*N/year_counts[year]
            if random.random() < prob:
                print 'getting timbre frames for song in directory {0}, {1} seconds after start of program'.format(edm_textfile, time.time() - time_start)
                duration = float(json_object['duration'])
                timbre = [[t/duration for t in l] for l in json_object['timbre']]
                try:
                    indices = random.sample(xrange(0, len(timbre)), k)
                except:
                    indices = xrange(0, len(timbre))
                timbre_frames = [timbre[i] for i in indices]
                appended_timbre = [timbre_all.append(l) for l in timbre_frames]
        print 'finished file {0}, {1} seconds after start of program'.format(edm_textfile, time.time() - time_start)

with(open('timbre_frames_all.txt', 'w')) as f:
    f.write(str(timbre_all))
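The listing that actually fits the timbre categories to these sampled frames is not reproduced in this appendix. A minimal sketch of that step, selecting the number of categories by the Bayesian Information Criterion with scikit-learn's current GaussianMixture API [10]; the thesis reports the BIC was lowest at 46 categories:

    import ast
    import numpy as np
    from sklearn.mixture import GaussianMixture

    frames = np.array(ast.literal_eval(open('timbre_frames_all.txt').read()))

    best_k, best_bic = None, np.inf
    for k in range(10, 61, 2):  # candidate numbers of timbre categories
        gmm = GaussianMixture(n_components=k, random_state=0).fit(frames)
        bic = gmm.bic(frames)
        if bic < best_bic:
            best_k, best_bic = k, bic
    print('BIC is lowest at k = {0}'.format(best_k))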

A.4 Helper Methods for Calculations

import os
import re
import json
import glob
import hdf5_getters
import time
import numpy as np

''' some static data used in conjunction with the helper methods '''

# each 12-element vector corresponds to the 12 pitches, starting with C
# natural and going up to B natural
CHORD_TEMPLATE_MAJOR = [[1,0,0,0,1,0,0,1,0,0,0,0],[0,1,0,0,0,1,0,0,1,0,0,0],
                        [0,0,1,0,0,0,1,0,0,1,0,0],[0,0,0,1,0,0,0,1,0,0,1,0],
                        [0,0,0,0,1,0,0,0,1,0,0,1],[1,0,0,0,0,1,0,0,0,1,0,0],
                        [0,1,0,0,0,0,1,0,0,0,1,0],[0,0,1,0,0,0,0,1,0,0,0,1],
                        [1,0,0,1,0,0,0,0,1,0,0,0],[0,1,0,0,1,0,0,0,0,1,0,0],
                        [0,0,1,0,0,1,0,0,0,0,1,0],[0,0,0,1,0,0,1,0,0,0,0,1]]
CHORD_TEMPLATE_MINOR = [[1,0,0,1,0,0,0,1,0,0,0,0],[0,1,0,0,1,0,0,0,1,0,0,0],
                        [0,0,1,0,0,1,0,0,0,1,0,0],[0,0,0,1,0,0,1,0,0,0,1,0],
                        [0,0,0,0,1,0,0,1,0,0,0,1],[1,0,0,0,0,1,0,0,1,0,0,0],
                        [0,1,0,0,0,0,1,0,0,1,0,0],[0,0,1,0,0,0,0,1,0,0,1,0],
                        [0,0,0,1,0,0,0,0,1,0,0,1],[1,0,0,0,1,0,0,0,0,1,0,0],
                        [0,1,0,0,0,1,0,0,0,0,1,0],[0,0,1,0,0,0,1,0,0,0,0,1]]
CHORD_TEMPLATE_DOM7 = [[1,0,0,0,1,0,0,1,0,0,1,0],[0,1,0,0,0,1,0,0,1,0,0,1],
                       [1,0,1,0,0,0,1,0,0,1,0,0],[0,1,0,1,0,0,0,1,0,0,1,0],
                       [0,0,1,0,1,0,0,0,1,0,0,1],[1,0,0,1,0,1,0,0,0,1,0,0],
                       [0,1,0,0,1,0,1,0,0,0,1,0],[0,0,1,0,0,1,0,1,0,0,0,1],
                       [1,0,0,1,0,0,1,0,1,0,0,0],[0,1,0,0,1,0,0,1,0,1,0,0],
                       [0,0,1,0,0,1,0,0,1,0,1,0],[0,0,0,1,0,0,1,0,0,1,0,1]]
CHORD_TEMPLATE_MIN7 = [[1,0,0,1,0,0,0,1,0,0,1,0],[0,1,0,0,1,0,0,0,1,0,0,1],
                       [1,0,1,0,0,1,0,0,0,1,0,0],[0,1,0,1,0,0,1,0,0,0,1,0],
                       [0,0,1,0,1,0,0,1,0,0,0,1],[1,0,0,1,0,1,0,0,1,0,0,0],
                       [0,1,0,0,1,0,1,0,0,1,0,0],[0,0,1,0,0,1,0,1,0,0,1,0],
                       [0,0,0,1,0,0,1,0,1,0,0,1],[1,0,0,0,1,0,0,1,0,1,0,0],
                       [0,1,0,0,0,1,0,0,1,0,1,0],[0,0,1,0,0,0,1,0,0,1,0,1]]

CHORD_TEMPLATE_MAJOR_means = [np.mean(chord) for chord in CHORD_TEMPLATE_MAJOR]
CHORD_TEMPLATE_MINOR_means = [np.mean(chord) for chord in CHORD_TEMPLATE_MINOR]
CHORD_TEMPLATE_DOM7_means = [np.mean(chord) for chord in CHORD_TEMPLATE_DOM7]
CHORD_TEMPLATE_MIN7_means = [np.mean(chord) for chord in CHORD_TEMPLATE_MIN7]

CHORD_TEMPLATE_MAJOR_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_MAJOR]
CHORD_TEMPLATE_MINOR_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_MINOR]
CHORD_TEMPLATE_DOM7_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_DOM7]
CHORD_TEMPLATE_MIN7_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_MIN7]

# the 46 timbre-category centroids fit to the sampled timbre frames
TIMBRE_CLUSTERS = [
    [1.38679881e-01, 3.95702571e-02, 2.65410235e-02, 7.38301998e-03, -1.75014636e-02, -5.51147732e-02,
     8.71851698e-03, -1.17595855e-02, 1.07227900e-02, 8.75951680e-03, 5.40391877e-03, 6.17638908e-03],
    [3.14344510e+00, 1.17405599e-01, 4.08053561e+00, -1.77934450e+00, 2.93367968e+00, -1.35597928e+00,
     -1.55129489e+00, 7.75743158e-01, 6.42796685e-01, 1.40794256e-01, 3.37716831e-01, -3.27103815e-01],
    [3.56548165e-01, 2.73288705e+00, 1.94355982e+00, 1.06892477e+00, 9.89739475e-01, -8.97330631e-02,
     8.73234495e-01, -2.00747009e-03, 3.44488367e-01, 9.93117800e-02, -2.43471766e-01, -1.90521726e-01],
    [4.22442037e-01, 4.14115783e-01, 1.43926557e-01, -1.16143322e-01, -5.95186216e-02, -2.36927188e-01,
     -6.83151409e-02, 9.86816882e-02, 2.43219098e-02, 6.93558977e-02, 6.80121418e-03, 3.97485360e-02],
    [1.94727799e-01, -1.39027782e+00, -2.39875671e-01, -2.84583677e-01, 1.92334219e-01, -2.83421048e-01,
     2.15787541e-01, 1.14840341e-01, -2.15631833e-01, -4.09496877e-02, -6.90838017e-03, -7.24394810e-03],
    [1.96565167e-01, 4.98702717e-02, -3.43697282e-01, 2.54170701e-01, 1.12441266e-02, 1.54740401e-01,
     -4.70447408e-02, 8.10868802e-02, 3.03736697e-03, 1.43974944e-03, -2.75044913e-02, 1.48634678e-02],
    [2.21364497e-01, -2.96205105e-01, 1.57754028e-01, -5.57641279e-02, -9.25625566e-02, -6.15316168e-02,
     -1.38139882e-01, -5.54936599e-02, 1.66886836e-01, 6.46238260e-02, 1.24093863e-02, -2.09274345e-02],
    [2.12823455e-01, -9.32652720e-02, -4.39611467e-01, -2.02814479e-01, 4.98638770e-02, -1.26572488e-01,
     -1.11181799e-01, 3.25075635e-02, 2.01416694e-02, -5.69216463e-02, 2.61922912e-02, 8.30817468e-02],
    [1.62304042e-01, -7.34813956e-03, -2.02552550e-01, 1.80106705e-01, -5.72110826e-02, -9.17148244e-02,
     -6.20429191e-03, -6.08892354e-02, 1.02883628e-02, 3.84878478e-02, -8.72920419e-03, 2.37291230e-02],
    [1.69023095e-01, 6.81311168e-02, -3.71039856e-02, -2.13139780e-02, -4.18752028e-03, 1.36407740e-01,
     2.58515825e-02, -4.10328777e-04, 2.93149920e-02, -1.97874734e-02, 2.01177066e-02, 4.29260690e-03],
    [4.16829358e-01, -1.28384095e+00, 8.86081556e-01, 9.13717416e-02, -3.19420208e-01, -1.82003637e-01,
     -3.19865507e-02, -1.71517045e-02, 3.47472066e-02, -3.53047665e-02, 5.58354602e-02, -5.06222122e-02],
    [3.83948137e-01, 1.06020034e-01, 4.01191058e-01, 1.49470482e-01, -9.58422411e-02, -4.94473336e-02,
     2.27589858e-02, -5.67352733e-02, 3.84666644e-02, -2.15828055e-02, -1.67817151e-02, 1.15426241e-01],
    [9.07946444e-01, 3.26120397e+00, 2.98472002e+00, -1.42615404e-01, 1.29886103e+00, -4.53380431e-01,
     1.54008478e-01, -3.55297093e-02, -2.95809181e-01, 1.57037690e-01, -7.29692046e-02, 1.15180285e-01],
    [1.60870896e+00, -2.32038235e+00, -7.96211044e-01, 1.55058968e+00, -2.19377663e+00, 5.01030526e-01,
     -1.71767279e+00, -1.36642470e+00, -2.42837527e-01, -4.14275615e-01, -7.33148530e-01, -4.56676578e-01],
    [6.42870687e-01, 1.34486839e+00, 2.16026845e-01, -2.13180345e-01, 3.10866747e-01, -3.97754955e-01,
     -3.54439151e-01, -5.95938041e-04, 4.95054274e-03, 4.67013422e-02, -1.80823854e-02, 1.25808320e-01],
    [1.16780496e+00, 2.28141229e+00, -3.29418720e+00, -1.54239912e+00, 2.12372153e-01, 2.51116768e+00,
     1.84273560e+00, -4.06183916e-01, 1.19175125e+00, -9.24407446e-01, 6.85444429e-01, -6.38729005e-01],
    [2.39097414e-01, -1.13382447e-02, 3.06327342e-01, 4.68182987e-03, -1.03107607e-01, -3.17661969e-02,
     3.46533705e-02, 1.46440386e-02, 6.88291154e-02, 1.72580481e-02, -6.23970238e-03, -6.52822380e-03],
    [1.74850329e-01, -1.86077411e-01, 2.69285838e-01, 5.22452803e-02, -3.71708289e-02, -6.42874319e-02,
     -5.01920042e-03, -1.14565540e-02, -2.61300268e-03, -6.94872458e-03, 1.20157063e-02, 2.01341977e-02],
    [1.93220674e-01, 1.62738332e-01, 1.72794061e-02, 7.89933755e-02, 1.58494767e-01, 9.04541006e-04,
     -3.33177052e-02, -1.42411500e-01, -1.90471155e-02, -2.41622739e-02, -2.57382438e-02, 2.84895062e-02],
    [3.31179197e+00, -1.56765268e-01, 4.42446188e+00, 2.05496297e+00, 5.07031622e+00, -3.52663849e-02,
     -5.68337901e+00, -1.17825301e+00, 5.41756637e-01, -3.15541339e-02, -1.58404846e+00, 7.37887234e-01],
    [2.36033237e-01, -5.01380019e-01, -7.01568834e-02, -2.14474169e-01, 5.58739133e-01, -3.45340886e-01,
     2.36469930e-01, -2.51770230e-02, -4.41670143e-01, -1.73364633e-01, 9.92353986e-03, 1.01775476e-01],
    [3.13672832e+00, 1.55128891e+00, 4.60139512e+00, 9.82477544e-01, -3.87108002e-01, -1.34239667e+00,
     -3.00065797e+00, -4.41556909e-01, -7.77546208e-01, -6.59017029e-01, -1.42596356e-01, -9.78935498e-01],
    [8.50714148e-01, 2.28658856e-01, -3.65260753e+00, 2.70626948e+00, -1.90441544e-01, 5.66625676e+00,
     1.77531510e+00, 2.39978921e+00, 1.10965660e+00, 1.58484130e+00, -1.51579214e-02, 8.64324026e-01],
    [1.14302559e+00, 1.18602811e+00, -3.88130412e+00, 8.69833825e-01, -8.23003310e-01, -4.23867795e-01,
     8.56022598e-01, -1.08015106e+00, 1.74840192e-01, -1.35493558e-02, -1.17012561e+00, 1.68572940e-01],
    [3.54117814e+00, 6.12714769e-01, 7.67585243e+00, 2.50391333e+00, 1.81374399e+00, -1.46363231e+00,
     -1.74027236e+00, -5.72924078e-01, -1.20787368e+00, -4.13954661e-01, -4.62561948e-01, 6.78297871e-01],
    [8.31843044e-01, 4.41635485e-01, 7.00724425e-02, -4.72159900e-02, 3.08326493e-01, -4.47009822e-01,
     3.27806057e-01, 6.52370380e-01, 3.28490360e-01, 1.28628172e-01, -7.78065861e-03, 6.91343399e-02],
    [4.90082031e-01, -9.53180204e-01, 1.76970476e-01, 1.57256960e-01, -5.26196238e-02, -3.19264458e-01,
     3.91808304e-01, 2.19368239e-01, -2.06483291e-01, -6.25044005e-02, -1.05547224e-01, 3.18934196e-01],
    [1.49899454e+00, -4.30708817e-01, 2.43770498e+00, 7.03149621e-01, -2.28827845e+00, 2.70195855e+00,
     -4.71484280e+00, -1.18700075e+00, -1.77431396e+00, -2.23190236e+00, 8.20855264e-01, -2.35859902e-01],
    [1.20322544e-01, -3.66300816e-01, -1.25699953e-01, -1.21914056e-01, 6.93277338e-02, -1.31034684e-01,
     -1.54955924e-03, 2.48094288e-02, -3.09576314e-02, -1.66369415e-03, 1.48904987e-04, -1.42151992e-02],
    [6.52394765e-01, -6.81024464e-01, 6.36868117e-01, 3.04950208e-01, 2.62178992e-01, -3.20457080e-01,
     -1.98576098e-01, -3.02173163e-01, 2.04399765e-01, 4.44513847e-02, -9.50111498e-03, -1.14198739e-02],
    [2.06762180e-01, -2.08101829e-01, 2.61977630e-01, -1.71672300e-01, 5.61794250e-02, 2.13660185e-01,
     3.90259585e-02, 4.78176392e-02, 1.72812607e-02, 3.44052067e-02, 6.26899067e-03, 2.48544728e-02],
    [7.39717363e-01, 4.37786285e+00, 2.54995502e+00, 1.13151212e+00, -3.58509503e-01, 2.20806129e-01,
     -2.20500355e-01, -7.22409824e-03, -2.70534083e-01, 1.07942098e-03, 2.70174668e-01, 1.87279353e-01],
    [1.25593809e+00, 6.71054880e-02, 8.70352571e-01, -4.32607959e+00, 2.30652217e+00, 5.47476105e+00,
     -6.11052479e-01, 1.07955720e+00, -2.16225471e+00, -7.95770149e-01, -7.31804973e-01, 9.68935954e-01],
    [1.17233757e-01, -1.23897829e-01, -4.88625265e-01, 1.42036530e-01, -7.23286756e-03, -6.99808763e-03,
     -1.17525019e-02, 5.70221674e-03, -7.67796123e-04, 4.17505873e-02, -2.33375716e-02, 1.94121001e-02],
    [1.67511025e+00, -2.75436700e+00, 1.45345593e+00, 1.32408871e+00, -1.66172505e+00, 1.00560074e+00,
     -8.82308160e-01, -5.95708043e-01, -7.27283590e-01, -1.03975499e+00, -1.86653334e-02, 1.39449745e+00],
    [3.20587677e+00, -2.84451104e+00, 8.54849957e+00, -4.44001235e-01, 1.04202144e+00, 7.35333682e-01,
     -2.48763292e+00, 7.38931361e-01, -1.74185596e+00, -1.07581842e+00, 2.05759299e-01, -8.20483513e-01],
    [3.31279737e+00, -5.08655734e-01, 6.61530870e+00, 1.16518280e+00, 4.74499155e+00, -2.31536191e+00,
     -1.34016130e+00, -7.15381712e-01, 2.78890594e+00, 2.04189275e+00, -3.80003033e-01, 1.16034914e+00],
    [1.79522019e+00, -8.13534697e-02, 4.37167420e-01, 2.26517020e+00, 8.85377295e-01, 1.07481514e+00,
     -7.25322296e-01, -2.19309506e+00, -7.59468916e-01, -1.37191387e+00, 2.60097913e-01, 9.34596450e-01],
    [3.50400906e-01, 8.17891485e-01, -8.63487084e-01, -7.31760701e-01, 9.70320805e-02, -3.60023996e-01,
     -2.91753495e-01, -8.03073817e-02, 6.65930095e-02, 1.60093340e-01, -1.29158086e-01, -5.18806100e-02],
    [2.25922929e-01, 2.78461593e-01, 5.39661393e-02, -2.37662670e-02, -2.70343295e-02, -1.23485570e-01,
     2.31027499e-03, 5.87465112e-05, 1.86127188e-02, 2.83074747e-02, -1.87198676e-04, 1.24761782e-02],
    [4.53615634e-01, 3.18976020e+00, -8.35029351e-01, 7.84124578e+00, -4.43906795e-01, -1.78945492e+00,
     -1.14521031e+00, 1.00044304e+00, -4.04084981e-01, -4.86030348e-01, 1.05412721e-01, 5.63666445e-02],
    [3.93714086e-01, -3.07226477e-01, -4.87366619e-01, -4.57481697e-01, -2.91133171e-04, -2.39881719e-01,
     -2.15591352e-01, -1.21332941e-01, 1.42245002e-01, 5.02984582e-02, -8.05878851e-03, 1.95534173e-01],
    [1.86913010e-01, -1.61000977e-01, 5.95612425e-01, 1.87804293e-01, 2.22064227e-01, -1.09008289e-01,
     7.83845058e-02, 5.15228647e-02, -8.18113578e-02, -2.37860551e-02, 3.41013800e-03, 3.64680417e-02],
    [3.32919314e+00, -2.14341251e+00, 7.20913997e+00, 1.76143734e+00, 1.64091808e+00, -2.66887649e+00,
     -9.26748006e-01, -2.78599285e-01, -7.39434005e-01, -3.87363085e-01, 8.00557250e-01, 1.15628886e+00],
    [4.76496444e-01, -1.19334793e-01, 3.09037235e-01, -3.45545294e-01, 1.30114716e-01, 5.06895559e-01,
     2.12176840e-01, -4.14296750e-03, 4.52439064e-03, -1.62163990e-02, 6.93683152e-03, -5.77607592e-03],
    [3.00019324e-01, 5.43432074e-02, -7.72732930e-01, 1.47263806e+00, -2.79012581e-02, -2.47864869e-01,
     -2.10011388e-01, 2.78202425e-01, 6.16957205e-02, -1.66924986e-01, -1.80102286e-01, -3.78872162e-03]]

TIMBRE_MEANS = [np.mean(t) for t in TIMBRE_CLUSTERS]
TIMBRE_STDEVS = [np.std(t) for t in TIMBRE_CLUSTERS]

'''helper methods to process raw msd data'''

def normalize_pitches(h5):
    key = int(hdf5_getters.get_key(h5))
    segments_pitches = hdf5_getters.get_segments_pitches(h5)
    segments_pitches_new = [transpose_by_key(pitch_seg, key) for pitch_seg in segments_pitches]
    return segments_pitches_new

def transpose_by_key(pitch_seg, key):
    # rotate the 12-element pitch vector so every song is expressed in a common key
    pitch_seg_new = []
    for i in range(0, 12):
        idx = (i + key) % 12
        pitch_seg_new.append(pitch_seg[idx])
    return pitch_seg_new

''' given a time segment with distributions of the 12 pitches, find the most
likely chord played '''
def find_most_likely_chord(pitch_vector):
    rho_max = 0.0
    # index each chord as a (quality, root) pair; pick the template whose
    # correlation with the observed pitch vector is largest in magnitude
    most_likely_chord = (1, 1)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_MAJOR,
            CHORD_TEMPLATE_MAJOR_means, CHORD_TEMPLATE_MAJOR_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean)*(pitch_vector[i] - np.mean(pitch_vector))/((stdev+0.01)*(np.std(pitch_vector)+0.01))
        if (abs(rho) > abs(rho_max)):
            rho_max = rho
            most_likely_chord = (1, idx)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_MINOR,
            CHORD_TEMPLATE_MINOR_means, CHORD_TEMPLATE_MINOR_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean)*(pitch_vector[i] - np.mean(pitch_vector))/((stdev+0.01)*(np.std(pitch_vector)+0.01))
        if (abs(rho) > abs(rho_max)):
            rho_max = rho
            most_likely_chord = (2, idx)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_DOM7,
            CHORD_TEMPLATE_DOM7_means, CHORD_TEMPLATE_DOM7_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean)*(pitch_vector[i] - np.mean(pitch_vector))/((stdev+0.01)*(np.std(pitch_vector)+0.01))
        if (abs(rho) > abs(rho_max)):
            rho_max = rho
            most_likely_chord = (3, idx)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_MIN7,
            CHORD_TEMPLATE_MIN7_means, CHORD_TEMPLATE_MIN7_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean)*(pitch_vector[i] - np.mean(pitch_vector))/((stdev+0.01)*(np.std(pitch_vector)+0.01))
        if (abs(rho) > abs(rho_max)):
            rho_max = rho
            most_likely_chord = (4, idx)
    return most_likely_chord

def find_most_likely_timbre_category(timbre_vector):
    most_likely_timbre_cat = 0
    rho_max = 0.0
    for idx, (seg, mean, stdev) in enumerate(zip(TIMBRE_CLUSTERS, TIMBRE_MEANS,
            TIMBRE_STDEVS)):
        rho = 0.0
        for i in range(0, 12):
            rho += (seg[i] - mean)*(timbre_vector[i] - np.mean(seg))/((stdev+0.01)*(np.std(timbre_vector)+0.01))
        if (abs(rho) > abs(rho_max)):
            rho_max = rho
            most_likely_timbre_cat = idx
    return most_likely_timbre_cat
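As a quick sanity check of these helpers, an illustrative usage example that is not part of the thesis code; the input vector and the expected result are assumptions for demonstration:

    # a pitch vector dominated by C, E, and G should be matched to the
    # C major template, i.e. quality 1 (major) with root index 0
    c_major_like = [0.9, 0.1, 0.1, 0.1, 0.8, 0.1, 0.1, 0.85, 0.1, 0.1, 0.1, 0.1]
    print(find_most_likely_chord(c_major_like))  # expected: (1, 0)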

Bibliography

[1] Marck Bailey. Mark J. Butler publishes scholarly work on dance music. http://www.music.northwestern.edu/about/news/2012/mark-j-butler-publishes-scholarly-work-on-dance-music.html, Mar. 2012.

[2] Kenneth Taylor. Ishkur's guide to EDM. http://techno.org/electronic-music-guide/.

[3] Deal further strengthens Spotify's music discovery expertise. http://the.echonest.com/pressreleases/spotify-acquires-echo-nest/, Mar. 2014.

[4] Josh Constine. Inside the Spotify - Echo Nest skunkworks. http://techcrunch.com/2014/10/19/the-sonic-mad-scientists/, Oct. 2014.

[5] The future of music genres is here. http://blog.echonest.com/post/73516217273/the-future-of-music-genres-is-here, Jan. 2014.

[6] About the Music Genome Project. http://www.pandora.com/about/mgp.

[7] Joan Serrà, Álvaro Corral, Marián Boguñá, Martín Haro, and Josep Ll. Arcos. Measuring the evolution of contemporary western popular music. Sci. Rep., 2, Jul. 2012.

[8] Matthias Mauch, Robert M. MacCallum, Mark Levy, and Armand M. Leroi. The evolution of popular music: USA 1960-2010. Royal Society Open Science, 2(5), 2015.

[9] Thierry Bertin-Mahieux, Daniel P.W. Ellis, Brian Whitman, and Paul Lamere. The Million Song Dataset. In Proceedings of the 12th International Conference on Music Information Retrieval (ISMIR 2011), 2011.

[10] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay. Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12:2825-2830, 2011.

[11] Lars Buitinck, Gilles Louppe, Mathieu Blondel, Fabian Pedregosa, Andreas Mueller, Olivier Grisel, Vlad Niculae, Peter Prettenhofer, Alexandre Gramfort, Jaques Grobler, Robert Layton, Jake VanderPlas, Arnaud Joly, Brian Holt, and Gaël Varoquaux. API design for machine learning software: experiences from the scikit-learn project. In ECML PKDD Workshop: Languages for Data Mining and Machine Learning, pages 108-122, 2013.

[12] Edwin Chen. Infinite mixture models with nonparametric Bayes and the Dirichlet process. http://blog.echen.me/2012/03/20/infinite-mixture-models-with-nonparametric-bayes-and-the-dirichlet-process/, Mar. 2012.

[13] Graham Massey. Roland TR-808: The drum machine that changed music. http://www.bbc.com/news/entertainment-arts-26682781, Mar. 2014.

[14] Jean-Michel Jarre and Agencja Artystyczna MTJ. Jean Michel Jarre. Disques Dreyfus, 1999.

[15] François Pachet, Jean-Julien Aucouturier, and Mark Sandler. The way it sounds: Timbre models for analysis and retrieval of music signals. IEEE Transactions on Multimedia, 7(6):1028-35, Dec. 2005.

  • Abstract
  • Acknowledgements
  • Contents
  • List of Tables
  • List of Figures
  • 1 Introduction
    • 11 Background Information
    • 12 Literature Review
    • 13 The Dataset
      • 2 Mathematical Modeling
        • 21 Determining Novelty of Songs
        • 22 Feature Selection
        • 23 Collecting Data and Preprocessing Selected Features
          • 231 Collecting the Data
          • 232 Pitch Preprocessing
          • 233 Timbre Preprocessing
              • 3 Results
                • 31 Methodology
                • 32 Findings
                  • 321 =005
                  • 322 =01
                  • 323 =02
                    • 33 Analysis
                      • 4 Conclusion
                        • 41 Design Flaws in Experiment
                        • 42 Future Work
                        • 43 Closing Remarks
                          • A Code
                            • A1 Pulling Data from the Million Song Dataset
                            • A2 Calculating Most Likely Chords and Timbre Categories
                            • A3 Code to Compute Timbre Categories
                            • A4 Helper Methods for Calculations
                              • Bibliography
Page 48: Silver,Matthew final thesis

Figure 3.5: Song year distributions for α = 0.2

Figure 3.6: Timbre and pitch distributions for α = 0.2

Cluster   Song Count   Characteristic Sounds
0         4075         Nostalgic and sad-sounding synths and string instruments
1         2068         Intense, sad, cavernous (mix of industrial metal and ambient)
2         1546         Jazz/funk tones
3         1691         Orchestral with heavy 80s synths, atmospheric
4         343          Arpeggios
5         304          Electro, ambient
6         2405         Alien synths, eerie
7         1264         Punchy kicks and claps, 80s/90s tilt
8         1561         Medium tempo, 4/4 time signature, synths with intense guitar
9         1796         Disco rhythms and instruments
10        2158         Standard rock with few (if any) synths added on
12        791          Cavernous, minimalist ambient (non-electronic instruments)
14        765          Downtempo, classic guitar riffs, fewer synths
16        865          Classic acid house sounds and beats
17        682          Heavy Roland TR sounds
22        14           Fast ambient, classic orchestral
23        578          Acid house with funk tones
30        31           Very repetitive rhythms, one or two tones
34        88           Very dense sound (strong vocals and synths)

Table 3.3: Song cluster descriptions for α = 0.2

3.3 Analysis

Each of the different values of α revealed different insights about the Dirichlet Process applied to the Million Song Dataset: mainly, which artists and songs were unique, and how traditional groupings of EM genres could be thought of differently. Not surprisingly, the distributions of the years of songs in most of the clusters were skewed to the left, because the distribution of all of the EM songs in the Million Song Dataset is left-skewed (see Figure 2.2). However, some of the distributions vary significantly for individual clusters, and these differences provide important insights into which types of music styles were popular at certain points in time and how unique the earliest artists and songs in the clusters were. For example, for α = 0.1, Cluster 28's musical style (with sounds characteristic of the Roland TR-808 and TR-909, two programmable drum machines and synthesizers that became extremely popular in 1980s and 90s dance tracks [13]) coincides with when the instruments were first manufactured in 1980. Not surprisingly, this cluster contained mostly songs from the 80s and 90s and declined slightly in the 2000s. However, there were a few songs in that cluster that came out before 1980. While these songs did not clearly use the Roland TR machines, they may have contained similar sounds that predated the machines and were truly novel.
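The year charts referenced throughout this section can be regenerated with a few lines of matplotlib. The sketch below assumes two placeholder inputs that are not variables from the thesis code: songs, the preprocessed records from Appendix A.2, and labels, the cluster assignments returned by the fitted model.

import collections
import matplotlib.pyplot as plt

def plot_cluster_year_histograms(songs, labels):
    # songs: preprocessed records with a 'year' field (Appendix A.2);
    # labels: one cluster label per song from the fitted model.
    years_by_cluster = collections.defaultdict(list)
    for song, label in zip(songs, labels):
        years_by_cluster[label].append(song['year'])
    for label, years in sorted(years_by_cluster.items()):
        plt.figure()
        plt.hist(years, bins=range(1955, 2012))
        plt.title('Cluster {0} ({1} songs)'.format(label, len(years)))
        plt.xlabel('release year')
        plt.ylabel('song count')
    plt.show()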

First, looking at α = 0.05, we see that all of the clusters contain a significant number of songs, although clusters 3 and 8 are notably smaller. Cluster 3 contains a heavier left tail, indicating a larger number of songs from the 70s, 80s, and 90s. Inside the cluster, the genres of music varied significantly from a traditional music lens. That is, the cluster contained some songs with nearly all traditional rock instruments, others with purely synths, and others somewhere in between, all of which would normally be classified as different EM genres. Under the Dirichlet Process, however, these songs were lumped together around a common theme of dense, melodic material (as opposed to minimalistic, repetitive, or dissonant sounds). The most prominent artists from the earlier songs are Ashra and Jon Hassell, who composed several melodic songs combining traditional instruments with synthesizers for a modern feel. The other small cluster, number 8, has a more normal year distribution relative to the entire MSD distribution and also consists of denser beats; another artist, Cabaret Voltaire, leads this cluster. Cluster 9 sticks out significantly because it contains virtually no songs before 1990 but then increases rapidly in popularity. This cluster contains songs with hypnotically repetitive rhythm, strong and ethereal synths, and an equally strong drum-like beat. Given the emergence of trance in the 1990s, and the fact that house music in the 1980s contained more minimalistic synths than house music in the 1990s, this distribution of years makes sense. Looking at the earliest artists in this cluster, one that accurately predates the later music in the cluster is Jean-Michel Jarre, a French composer pioneering in ambient and electronic music [14]. One of his songs, "Les Chants Magnétiques IV," contains very sharp and modulated synths along with a repetitive hi-hat cymbal rhythm and more drawn-out, ethereal synths. While the song sounds ambient at its normal speed, playing it at 1.5 times its normal speed resulted in a thumping, fast-paced 16th-note rhythm that, combined with the ethereal synths and their chord progressions, sounded very similar to trance music. In fact, I found that, stylistically, trance music was comparable to house and ambient music increased in speed. "Trance" was a term not used extensively until the early 1990s, but ambient and house music were already mainstream by the 1980s, so it would make sense that trance evolved in this manner. However, this insight could serve as an argument that trance is not an innovative genre in and of itself, but is rather a clever combination of two older genres.

Lastly, we look at the timbre category and chord change distributions for each cluster. In theory, these clusters should have significantly different peaks of chord changes and timbre categories, reflecting different pitch arrangements and instruments in each cluster. The type 0 chord change corresponds to major → major with no note change; type 60 to minor → minor with no note change; type 120 to dominant 7th major → dominant 7th major with no note change; and type 180 to dominant 7th minor → dominant 7th minor with no note change. It makes sense that type 0, 60, 120, and 180 chord changes are frequently observed, because it implies that adjacent chords in a song remain in the same key for the majority of the song.
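These four indices follow directly from the encoding used in Appendix A.2, where each of the 192 transition categories is stored as chord_shift = 12·(key_shift − 1) + note_shift, with key_shift = 4·(q1 − 1) + q2 combining the two chord qualities. A small illustrative decoder (the quality labels simply mirror the four template families in Appendix A.4) makes the correspondence explicit:

QUALITIES = ['major', 'minor', 'dom7', 'min7']  # the template families in A.4

def decode_chord_change(chord_shift):
    # inverts chord_shift = 12*(key_shift - 1) + note_shift, where
    # key_shift = 4*(q1 - 1) + q2 encodes the two chord qualities (1..4)
    key_shift_minus_1, note_shift = divmod(chord_shift, 12)
    q1_minus_1, q2_minus_1 = divmod(key_shift_minus_1, 4)
    return QUALITIES[q1_minus_1], QUALITIES[q2_minus_1], note_shift

for t in (0, 60, 120, 180):
    print('{0} -> {1}'.format(t, decode_chord_change(t)))
# 0 -> ('major', 'major', 0);   60 -> ('minor', 'minor', 0)
# 120 -> ('dom7', 'dom7', 0);   180 -> ('min7', 'min7', 0)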

The timbre categories, on the other hand, are more difficult to intuitively interpret. Mauch's study addresses this issue by sampling the songs and sounds that are closest to each timbre category, playing the sounds, and attaching user-based interpretations based on several listeners [8]. While this strategy worked in Mauch's study, given the time and resources at my disposal it was not practical in mine. I ended up comparing my subjective summaries of each cluster against the charts to see whether certain peaks in the timbre categories corresponded to specific tones. Strangely, for α = 0.05 the timbre and chord change data are very similar for each cluster. This problem does not occur when α = 0.1 or 0.2, where the graphs vary significantly and correspond to some of the observed differences in the music.
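The chart comparison described here boils down to averaging the per-song timbre-category histograms within each cluster and reading off the peaks. A minimal sketch, assuming a clusters mapping (cluster label → list of preprocessed songs, a placeholder name rather than a variable from the thesis code):

import numpy as np

def peak_timbre_categories(clusters, top_k=3):
    # clusters: assumed mapping, label -> list of preprocessed songs,
    # each carrying the 'timbre_cat_counts' histogram from Appendix A.2
    for label, songs in sorted(clusters.items()):
        profile = np.mean([s['timbre_cat_counts'] for s in songs], axis=0)
        top = np.argsort(profile)[::-1][:top_k]
        print('cluster {0}: peak timbre categories {1}'.format(label, list(top)))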

In summary, below are the most influential artists I found in the clusters formed, along with the types of music they created that were novel for their time:

• Jean-Michel Jarre: ambient and house music, complicated synthesizer arrangements
• Cabaret Voltaire: orchestral electronic music
• Paul Horn: new age
• Brian Eno: ambient music
• Manuel Göttsching (Ashra): synth-heavy ambient music
• Killing Joke: industrial metal
• John Foxx: minimalist and dark electronic music
• Fad Gadget: house and industrial music

While these conclusions were formed mainly from the data used in the MSD and the resulting clusters, I checked outside sources and biographies of these artists to see whether they were groundbreaking contributors to electronic music. Some research revealed that these artists were indeed groundbreaking for their time, so my findings are consistent with existing literature. The difference between existing accounts and mine, however, is that from a quantitatively computed perspective I found some new connections (like observing that one of Jarre's works, when sped up, sounded very similar to trance music).

For larger values of α, it is worth not only looking at interesting phenomena in the clusters formed for that specific value, but also comparing those clusters to the ones formed for other values of α. Since we are increasing the value of α, more clusters will be formed, and the distinctions between each cluster will be more nuanced. With α = 0.1, the Dirichlet Process formed 16 clusters. Two of these clusters consisted of only one song each and, upon listening, neither of these songs sounded particularly unique, so I threw those two clusters out and analyzed the remaining 14.

mapped over nicely while others were more difficult to interpret For example cluster

301 (cluster 3 when α = 01) contained a similar number of songs and a similar

distribution of the years the songs were released to cluster 9005 Both contain vitually

no songs before the 1990s and then steadily rise in popularity through the 2000s

Both clusters also contain similar types of music house beats and ethreal synths

reminiscent of ambient or trance music However when I looked at the earliest

49

artists in cluster 301 they were different from the earliest artists in cluster 9005

One particular artist Bill Nelson stood out for having a particularly novel song

ldquoBirds of Tinrdquo for the year it was released (1980) This song features a sharp and

twangy synth beat that when sped up sounded like minimalist acid house music

While the α = 005 group differentiated mostly on general moods and classes of

instruments (like rock vs non-electronic vs electronic) the α = 01 group picked

up more nuanced instrumentation and mood differences For example cluster 1601

contained songs that featured orchestral string instruments especially violin The

songs themselves varied significantly according to traditional genres from Brian Eno

arrangements with classical orchestra to a remix of a song by Linkin Park a nu-metal

band which contained violin interludes This clustering raises an interesting point

that music that sounds very different based on traditional genres could be grouped

together on certain instruments or sounds Another cluster 2801 features 90s sounds

characteristic of the Roland TR-909 drum machine (which would explain why the

clusterrsquos songs increase dratistically starting in the 1990s and steadily decline through

the 2000s) Yet another cluster 601 contains a particularly heavy left tail indicating

a style more popular in the 1980s and the characteristic sound high-hat cymbals

is also a specialized instrument This specialization does not match up particularly

strongly with the clusters when α = 005 That is a single cluster with α = 005

does not easily map to one or more clusters in the α = 01 run although many of

the clusters appear to share characteristics based on the qualitative descriptions in

the tables The timbrechord change charts for each cluster appeared to at least

somewhat corroborate the general characteristics I attached to each cluster For

example the last timbre category is significantly pronounced for clusters 5 and 18

and especially so for 18 Cluster 18 was vocal-free ethereal space-synth sounds

so it would make sense that cluster 5 which was mainly calm New World also

contained vocal-free ethereal and space-y sounds It was also interesting to note

50

that certain clusters like 28 contained one timbre category that completely dominated

all the others Many songs in this cluster were marked by strong and repetitive

beats reminiscent of the Roland TR synth drum machine which matches the graph

Likewise clusters 3 7 9 and 20 which appear to contain the same peak timbre

category were noted for containing strong and repetitive beats For this cluster I

added the following artists and their contributions to the general list of novel artists

bull Bill Nelson Minimalist house music

bull Vangelis Orchestral compositions with electronic notes

bull Rick Wakeman Rock compositions with spacy-sounding synths

bull Kraftwerk synth-based pop music

Finally, we look at α = 0.2. With this parameter value, the Dirichlet Process resulted in 22 clusters. Three of these clusters contained only one song each and, upon listening to each of these songs, I determined they were not particularly unique and discarded them, for a total of 19 remaining clusters. Unlike the previous two values of α, where the clusters were relatively easy to subjectively differentiate, this one was quite difficult. Slightly more than half of the clusters, 10 out of 19, contained under 1,000 songs, and 3 contained under 100. Some of the clusters were easily mapped to clusters for the other two α values, like cluster 17_{0.2}, which contains Roland TR drum machine sounds and is comparable to cluster 28_{0.1}. However, many of the other classifications seemed more dubious. Not only did the songs within each cluster often vary significantly, but the differences between many clusters appeared nearly indistinguishable. The chord change and timbre charts also reflect the difficulty in distinguishing different clusters. The y-axes for all of the charts are quite small, implying that many of the timbre values averaged out because the songs were quite different in each cluster. Essentially, the observations are quite noisy and do not have features that stand out as saliently as, for example, cluster 28_{0.1}'s. The only exceptions to these numbers were clusters 30 and 34, but there were so few songs in each of these clusters that they represent only a small amount of the dataset. Therefore, I concluded that the Dirichlet Process with α = 0.2 did an insufficient job of clustering the songs. Overall, the clusters formed when α = 0.1 were the most meaningful in terms of picking up nuanced moods and instruments without splitting hairs and producing clusters a minimally trained ear could not differentiate. From this analysis, the most appropriate genre classifications of the electronic music from the MSD are the clusters described in the table for α = 0.1, and the most novel artists, along with their contributions, are summarized in the findings for α = 0.05 and α = 0.1.

Chapter 4

Conclusion

In this chapter, I first address weaknesses in my experiment and strategies for remedying them; I then offer potential paths for researchers to build upon my experiment, and close with final remarks on this thesis.

4.1 Design Flaws in Experiment

While I made every effort possible to ensure the integrity of this experiment, there were various factors working against it, some beyond my control and others within my control but unrealistic to address given the time and resources I had. The largest issue was the dataset I was working with. While the MSD contained roughly 23,000 electronic music songs according to my classifications, these songs did not come close to all of the electronic music that was available. From looking through the tracks, I did see many important artists, meaning that there was some credibility to the dataset. However, there were several other artists I was surprised to see missing, and the artists included contained only a limited number of popular songs. Some traditionally defined genres, like dubstep, were missing entirely from the dataset, and the most recent songs came from the year 2010, which meant that the past 5 years of rapid expansion in EM were not accounted for. Building a sufficient corpus of EM data is very difficult, arguably more so than for other genres, because songs may be remixed by multiple artists, further blurring the line between original content and modifications. For this reason, I considered my thesis to be a proof of concept. Although the data I used may not be ideal, I was able to show that the Dirichlet Process could be used with some amount of success to cluster songs based on their metadata.

With respect to how I implemented the Dirichlet Process and constructed the features, my methodology could have been more extensive with additional time and resources. Interpreting the sounds in each song and establishing common threads is a difficult task, and unlike Pandora, which used trained music theory experts to analyze each song, I relied on my own ears and anecdotal knowledge of EM. Given the lack of formal literature quantitatively analyzing EM and the resources I had, this was my best realistic option, but it was also not ideal. The second notable weakness, which was more controllable, was determining what exactly constitutes an EM song. My criteria involved iterating through every song and selecting those whose artist carried a tag that fell inside a list of predetermined EM genres. However, this strategy is not always effective, since some artists have produced only a small selection of EM songs alongside much more rock or other non-EM music. To prevent these songs from appearing in the dataset, I would need to load another dataset, from a group called Last.fm, which contains user-generated tags at the song level.
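As an illustration of that fix, a song-level filter might look like the sketch below. It assumes the Last.fm dataset's per-track JSON layout, with a 'tags' field holding (name, weight) pairs; the field names and the weight threshold are assumptions that should be checked against the dataset's documentation before use.

import json

EM_TAGS = set(['house', 'techno', 'trance', 'dubstep', 'ambient',
               'electronic', 'idm', 'breakbeat', 'downtempo'])

def is_em_track(lastfm_json_path, min_weight=50):
    # Assumed layout: one JSON file per track id, carrying a 'tags'
    # field of [name, weight] pairs supplied by Last.fm users.
    with open(lastfm_json_path) as f:
        track = json.load(f)
    for name, weight in track.get('tags', []):
        if name.lower() in EM_TAGS and int(weight) >= min_weight:
            return True
    return False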

Another, more addressable, weakness in my experiment was graphically analyzing the timbre categories. While the average chord changes were easy to interpret on the graphs for each cluster and had easy semantic interpretations, the timbre categories were never formally defined. That is, while I knew the Bayes Information Criterion was lowest when there were 46 categories, I did not associate each timbre category with a sound. Mauch's study addressed this issue by randomly selecting songs with sounds that fell in each timbre category and asking users to listen to the sounds and classify what they heard. Implementing this system would be an additional way of ensuring that the clusters formed for each song were nontrivial: I could not only eyeball the measurements on each graph for timbre, like I did in this thesis, but also use them to confirm the sounds I observed for each cluster.
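For reference, that model-selection step can be expressed as a short loop. This is a sketch using scikit-learn's GaussianMixture rather than the exact code used for the thesis (which is not reproduced in the appendix); timbre_frames would be the sampled 12-dimensional frames from Appendix A.3, and the candidate grid is illustrative.

from sklearn.mixture import GaussianMixture

def pick_n_timbre_categories(timbre_frames, candidates=range(10, 61, 2)):
    # fit a GMM for each candidate component count and keep the one
    # with the lowest Bayesian Information Criterion
    best_n, best_bic = None, float('inf')
    for n in candidates:
        gmm = GaussianMixture(n_components=n, covariance_type='full',
                              random_state=0).fit(timbre_frames)
        bic = gmm.bic(timbre_frames)
        if bic < best_bic:
            best_n, best_bic = n, bic
    return best_n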

Finally, while my feature selection involved careful preprocessing, based on other studies, that normalized measurements between all songs, there are additional ways I could have improved the feature set. For example, one study looks at more advanced ways to isolate specific timbre segments in a song, identify repeating patterns, and compare songs to each other in terms of the similarity of their timbres [15]. More advanced methods like these would allow me to analyze more quantitatively how successfully the Dirichlet Process clusters songs into distinct categories.

4.2 Future Work

Future work in this area, quantitatively analyzing EM metadata to determine what constitutes different genres and novel artists, would involve tighter definitions, procedures, evaluations of whether clustering was effective, and closer musical scrutiny. All of the weaknesses mentioned in the previous section, barring perhaps the songs available in the Million Song Dataset, can be addressed with extensions and modifications to the code base I created. Addressing the greater issue of building an effective corpus of music data for the MSD, and constantly updating it, might be accomplished by soliciting such data from an organization like Spotify, but such an endeavor is very ambitious and beyond the scope of any individual or small-group research project without extensive funding and influence. Once these problems are resolved, and the dataset, access to its songs, and methods for comparing songs to each other are in place, the next steps would be to further analyze the results. How do the most unique artists for their time compare to the most popular artists? Is there considerable overlap? How long does it take for a style to grow in popularity, if it even does? And lastly, how can these findings be used to compose new genres of music and envision who and what will become popular in the future? All of these questions may require supplementary information sources, with respect to the popularity of songs and artists for example, and many of these additional pieces of information can be found on the website of the MSD.

4.3 Closing Remarks

While this thesis is an ambitious endeavor and can be improved in many respects, it does show that the methods implemented yield nontrivial results and could serve as a foundation for future quantitative analysis of electronic music. As data analytics grows even more, and groups such as Spotify amass greater amounts of information and deeper insights from that information, this relatively new field of study will hopefully grow. EM is a dynamic, energizing, and incredibly expressive type of music, and understanding it from a quantitative perspective pays respect to what has up until now been mostly analyzed from a curious outsider's perspective: qualitatively described, but not examined as thoroughly from a mathematical angle.

Appendix A

Code

A.1 Pulling Data from the Million Song Dataset

from __future__ import division
import os
import re
import sys
import time
import glob
import numpy as np
import hdf5_getters  # not on adroit

# prevents output from showing ellipses when printed
np.set_printoptions(threshold=np.nan)

'''This code pulls the pitch, timbre, and metadata of every electronic song
in one shard of the Million Song Dataset and writes it to disk sorted by year.'''

basedir = '/scratch/network/mssilver/mssilver/msd_data_full/data/' + str(sys.argv[1])
ext = '.h5'

target_genres = ['house', 'techno', 'drum and bass', 'drum n bass', "drum'n'bass",
                 'drumnbass', "drum 'n' bass", 'jungle', 'breakbeat', 'trance',
                 'dubstep', 'trap', 'downtempo', 'industrial', 'synthpop', 'idm',
                 'idm - intelligent dance music', '8-bit', 'ambient',
                 'dance and electronica', 'electronic']

# relevant metadata for all EM songs found in the MSD
all_song_data = {}
pitch_segs_data = []
count = 0
start_time = time.time()

for root, dirs, files in os.walk(basedir):
    files = glob.glob(os.path.join(root, '*' + ext))
    for f in files:
        h5 = hdf5_getters.open_h5_file_read(f)
        # if year unknown, throw out sample
        if hdf5_getters.get_year(h5) == 0:
            h5.close()
            continue
        if any(tag in str(hdf5_getters.get_artist_mbtags(h5)) for tag in target_genres):
            print 'found electronic music song at {0} seconds'.format(time.time() - start_time)
            count += 1
            print 'song count: {0}'.format(count + 1)
            h5_subdict = dict()
            h5_subdict['title'] = hdf5_getters.get_title(h5).item()
            h5_subdict['artist_name'] = hdf5_getters.get_artist_name(h5).item()
            h5_subdict['year'] = hdf5_getters.get_year(h5).item()
            h5_subdict['duration'] = hdf5_getters.get_duration(h5).item()
            h5_subdict['timbre'] = hdf5_getters.get_segments_timbre(h5).tolist()
            h5_subdict['pitches'] = hdf5_getters.get_segments_pitches(h5).tolist()
            track_id = hdf5_getters.get_track_id(h5).item()
            all_song_data[track_id] = h5_subdict
            print 'Song {0} finished processing. Total time elapsed: {1} seconds'.format(count, str(time.time() - start_time))
        h5.close()

all_song_data_sorted = dict(sorted(all_song_data.items(), key=lambda k: k[1]['year']))
# strip path separators from the shard name to build the output file name
sortedpitchdata = '/scratch/network/mssilver/mssilver/msd_data/raw_' + re.sub('/', '', sys.argv[1]) + '.txt'
with open(sortedpitchdata, 'w') as text_file:
    text_file.write(str(all_song_data_sorted))

A.2 Calculating Most Likely Chords and Timbre Categories

from __future__ import division
import os
import sys
import re
import time
import json
import glob
import hdf5_getters  # not on adroit
import sklearn.mixture
import msd_utils  # not on adroit
import math
import numpy as np
import collections
import ast

# prevents output from showing ellipses when printed
np.set_printoptions(threshold=np.nan)

# column-wise mean of a list of lists
def mean(a):
    return sum(a) / len(a)

'''This code computes the frequency of chord changes in each electronic song
and runs the dirichlet process on it.'''

basedir = '/scratch/network/mssilver/mssilver/'
input_file = basedir + 'msd_data/raw_' + str(sys.argv[1]) + '.txt'
output_file = basedir + 'msd_data/preprocessed_' + str(sys.argv[1]) + '.txt'

json_contents = open(input_file, 'r').read()

all_song_data = []
time_start = time.time()
count = 0
# each song was written out as a dict literal beginning with its 'title' key
for json_object_str in re.finditer(r"\{'title'.*?\}", json_contents):
    json_object_str = str(json_object_str.group(0))
    json_object = ast.literal_eval(json_object_str)
    json_object_new = {}

    json_object_new['title'] = json_object['title']
    json_object_new['artist_name'] = json_object['artist_name']
    json_object_new['year'] = json_object['year']
    json_object_new['duration'] = json_object['duration']

    segments_pitches_old = json_object['pitches']
    segments_timbre_old = json_object['timbre']
    segments_pitches_old_smoothed = []
    segments_timbre_old_smoothed = []
    chord_changes = [0 for i in range(0, 192)]
    smoothing_factor = 5
    for i in range(0, int(math.floor(len(segments_pitches_old) / smoothing_factor))):
        segments = segments_pitches_old[(smoothing_factor * i):(smoothing_factor * i + smoothing_factor)]
        # calculate mean frequency of each note over a block of 5 time segments
        segments_mean = map(mean, zip(*segments))
        segments_pitches_old_smoothed.append(segments_mean)
    most_likely_chords = [msd_utils.find_most_likely_chord(seg) for seg in segments_pitches_old_smoothed]
    print 'found most likely chords at {0} seconds'.format(time.time() - time_start)
    # calculate chord changes
    for i in range(0, len(most_likely_chords) - 1):
        c1 = most_likely_chords[i]
        c2 = most_likely_chords[i + 1]
        if (c1[1] == c2[1]):
            note_shift = 0
        elif (c1[1] < c2[1]):
            note_shift = c2[1] - c1[1]
        else:
            note_shift = 12 - c1[1] + c2[1]
        key_shift = 4 * (c1[0] - 1) + c2[0]
        # convert note_shift (0 through 11) and key_shift (1 to 16)
        # to one of 192 categories for a chord shift
        chord_shift = 12 * (key_shift - 1) + note_shift
        chord_changes[chord_shift] += 1
    json_object_new['chord_changes'] = [c / json_object['duration'] for c in chord_changes]
    print 'calculated chord changes at {0} seconds'.format(time.time() - time_start)

    for i in range(0, int(math.floor(len(segments_timbre_old) / smoothing_factor))):
        segments = segments_timbre_old[(smoothing_factor * i):(smoothing_factor * i + smoothing_factor)]
        # calculate the mean of each timbre coefficient over a block of 5 time segments
        segments_mean = map(mean, zip(*segments))
        segments_timbre_old_smoothed.append(segments_mean)
    print 'found most likely timbre categories at {0} seconds'.format(time.time() - time_start)
    timbre_cats = [msd_utils.find_most_likely_timbre_category(seg) for seg in segments_timbre_old_smoothed]
    # note: only the first 30 timbre categories are counted here
    timbre_cat_counts = [timbre_cats.count(i) for i in xrange(0, 30)]
    json_object_new['timbre_cat_counts'] = [t / json_object['duration'] for t in timbre_cat_counts]
    all_song_data.append(json_object_new)
    count += 1

print 'preprocessing finished, writing results to file at time {0}'.format(time.time() - time_start)
with open(output_file, 'w') as f:
    f.write(str(all_song_data))

print 'file merging complete at time {0}'.format(time.time() - time_start)

A.3 Code to Compute Timbre Categories

from __future__ import division
import os
import sys
import re
import time
import json
import glob
import hdf5_getters  # not on adroit
import sklearn.mixture
import msd_utils  # not on adroit
import math
import numpy as np
import collections
from string import ascii_uppercase
import ast
import matplotlib.pyplot as plt
import operator
from collections import defaultdict
import random

# samples up to N songs per year, and k timbre frames per song, to build
# the corpus on which the timbre-category mixture model is fit
timbre_all = []
N = 20  # number of samples to get from each year
year_counts = {1956: 2, 1965: 4, 1968: 3, 1969: 5, 1970: 23, 1971: 25,
               1972: 26, 1973: 37, 1974: 35, 1975: 29, 1976: 28, 1977: 64, 1978: 77,
               1979: 111, 1980: 131, 1981: 171, 1982: 199, 1983: 272, 1984: 190, 1985:
               189, 1986: 200, 1987: 224, 1988: 205, 1989: 272, 1990: 358, 1991: 348,
               1992: 538, 1993: 610, 1994: 658, 1995: 764, 1996: 809, 1997: 930, 1998:
               872, 1999: 983, 2000: 1031, 2001: 1230, 2002: 1323, 2003: 1563, 2004:
               1508, 2005: 1995, 2006: 1892, 2007: 2175, 2008: 1950, 2009: 1782, 2010:
               742}

time_start = time.time()
year_count = defaultdict(int)
orig_dir = '/scratch/network/mssilver/mssilver/'
orig_dir = ''  # (the scratch path above is overridden here)
json_pattern = re.compile(r"\{'title'.*?\}", re.DOTALL)
N = 20  # number of songs to sample from each year
k = 20  # number of frames to select from each song
for l1 in ascii_uppercase:
    for l2 in ascii_uppercase:
        edm_textfile = orig_dir + 'msd_data/raw_' + l1 + l2 + '.txt'
        json_contents = open(edm_textfile, 'r').read()
        for json_object_str in re.findall(json_pattern, json_contents):
            json_object = ast.literal_eval(json_object_str)
            year = int(json_object['year'])
            prob = 1.0 if 1.0 * N / year_counts[year] > 1.0 else 1.0 * N / year_counts[year]
            if random.random() < prob:
                print 'getting timbre frames for song in directory {0}, {1} seconds after start of program'.format(edm_textfile, time.time() - time_start)
                duration = float(json_object['duration'])
                timbre = [[t / duration for t in l] for l in json_object['timbre']]
                try:
                    indices = random.sample(xrange(0, len(timbre)), k)
                except:
                    indices = xrange(0, len(timbre))
                timbre_frames = [timbre[i] for i in indices]
                appended_timbre = [timbre_all.append(l) for l in timbre_frames]
        print 'finished file {0}, {1} seconds after start of program'.format(edm_textfile, time.time() - time_start)

with(open('timbre_frames_all.txt', 'w')) as f:
    f.write(str(timbre_all))

A.4 Helper Methods for Calculations

import os
import re
import json
import glob
import hdf5_getters
import time
import numpy as np

'''some static data used in conjunction with the helper methods'''

# each 12-element vector corresponds to the 12 pitches, starting with C
# natural and going up to B natural
CHORD_TEMPLATE_MAJOR = [[1,0,0,0,1,0,0,1,0,0,0,0], [0,1,0,0,0,1,0,0,1,0,0,0],
                        [0,0,1,0,0,0,1,0,0,1,0,0], [0,0,0,1,0,0,0,1,0,0,1,0],
                        [0,0,0,0,1,0,0,0,1,0,0,1], [1,0,0,0,0,1,0,0,0,1,0,0],
                        [0,1,0,0,0,0,1,0,0,0,1,0], [0,0,1,0,0,0,0,1,0,0,0,1],
                        [1,0,0,1,0,0,0,0,1,0,0,0], [0,1,0,0,1,0,0,0,0,1,0,0],
                        [0,0,1,0,0,1,0,0,0,0,1,0], [0,0,0,1,0,0,1,0,0,0,0,1]]
CHORD_TEMPLATE_MINOR = [[1,0,0,1,0,0,0,1,0,0,0,0], [0,1,0,0,1,0,0,0,1,0,0,0],
                        [0,0,1,0,0,1,0,0,0,1,0,0], [0,0,0,1,0,0,1,0,0,0,1,0],
                        [0,0,0,0,1,0,0,1,0,0,0,1], [1,0,0,0,0,1,0,0,1,0,0,0],
                        [0,1,0,0,0,0,1,0,0,1,0,0], [0,0,1,0,0,0,0,1,0,0,1,0],
                        [0,0,0,1,0,0,0,0,1,0,0,1], [1,0,0,0,1,0,0,0,0,1,0,0],
                        [0,1,0,0,0,1,0,0,0,0,1,0], [0,0,1,0,0,0,1,0,0,0,0,1]]
CHORD_TEMPLATE_DOM7 = [[1,0,0,0,1,0,0,1,0,0,1,0], [0,1,0,0,0,1,0,0,1,0,0,1],
                       [1,0,1,0,0,0,1,0,0,1,0,0], [0,1,0,1,0,0,0,1,0,0,1,0],
                       [0,0,1,0,1,0,0,0,1,0,0,1], [1,0,0,1,0,1,0,0,0,1,0,0],
                       [0,1,0,0,1,0,1,0,0,0,1,0], [0,0,1,0,0,1,0,1,0,0,0,1],
                       [1,0,0,1,0,0,1,0,1,0,0,0], [0,1,0,0,1,0,0,1,0,1,0,0],
                       [0,0,1,0,0,1,0,0,1,0,1,0], [0,0,0,1,0,0,1,0,0,1,0,1]]
CHORD_TEMPLATE_MIN7 = [[1,0,0,1,0,0,0,1,0,0,1,0], [0,1,0,0,1,0,0,0,1,0,0,1],
                       [1,0,1,0,0,1,0,0,0,1,0,0], [0,1,0,1,0,0,1,0,0,0,1,0],
                       [0,0,1,0,1,0,0,1,0,0,0,1], [1,0,0,1,0,1,0,0,1,0,0,0],
                       [0,1,0,0,1,0,1,0,0,1,0,0], [0,0,1,0,0,1,0,1,0,0,1,0],
                       [0,0,0,1,0,0,1,0,1,0,0,1], [1,0,0,0,1,0,0,1,0,1,0,0],
                       [0,1,0,0,0,1,0,0,1,0,1,0], [0,0,1,0,0,0,1,0,0,1,0,1]]

CHORD_TEMPLATE_MAJOR_means = [np.mean(chord) for chord in CHORD_TEMPLATE_MAJOR]
CHORD_TEMPLATE_MINOR_means = [np.mean(chord) for chord in CHORD_TEMPLATE_MINOR]
CHORD_TEMPLATE_DOM7_means = [np.mean(chord) for chord in CHORD_TEMPLATE_DOM7]
CHORD_TEMPLATE_MIN7_means = [np.mean(chord) for chord in CHORD_TEMPLATE_MIN7]

CHORD_TEMPLATE_MAJOR_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_MAJOR]
CHORD_TEMPLATE_MINOR_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_MINOR]
CHORD_TEMPLATE_DOM7_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_DOM7]
CHORD_TEMPLATE_MIN7_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_MIN7]

# centroids of the 46 timbre categories
TIMBRE_CLUSTERS = [[ 1.38679881e-01,  3.95702571e-02,  2.65410235e-02,
                     7.38301998e-03, -1.75014636e-02, -5.51147732e-02,
                     8.71851698e-03, -1.17595855e-02,  1.07227900e-02,
                     8.75951680e-03,  5.40391877e-03,  6.17638908e-03],
                   [ 3.14344510e+00,  1.17405599e-01,  4.08053561e+00,
                    -1.77934450e+00,  2.93367968e+00, -1.35597928e+00,
                    -1.55129489e+00,  7.75743158e-01,  6.42796685e-01,
                     1.40794256e-01,  3.37716831e-01, -3.27103815e-01],
                   [ 3.56548165e-01,  2.73288705e+00,  1.94355982e+00,
                     1.06892477e+00,  9.89739475e-01, -8.97330631e-02,
                     8.73234495e-01, -2.00747009e-03,  3.44488367e-01,
                     9.93117800e-02, -2.43471766e-01, -1.90521726e-01],
                   [ 4.22442037e-01,  4.14115783e-01,  1.43926557e-01,
                    -1.16143322e-01, -5.95186216e-02, -2.36927188e-01,
                    -6.83151409e-02,  9.86816882e-02,  2.43219098e-02,
                     6.93558977e-02,  6.80121418e-03,  3.97485360e-02],
                   [ 1.94727799e-01, -1.39027782e+00, -2.39875671e-01,
                    -2.84583677e-01,  1.92334219e-01, -2.83421048e-01,
                     2.15787541e-01,  1.14840341e-01, -2.15631833e-01,
                    -4.09496877e-02, -6.90838017e-03, -7.24394810e-03],
                   [ 1.96565167e-01,  4.98702717e-02, -3.43697282e-01,
                     2.54170701e-01,  1.12441266e-02,  1.54740401e-01,
                    -4.70447408e-02,  8.10868802e-02,  3.03736697e-03,
                     1.43974944e-03, -2.75044913e-02,  1.48634678e-02],
                   [ 2.21364497e-01, -2.96205105e-01,  1.57754028e-01,
                    -5.57641279e-02, -9.25625566e-02, -6.15316168e-02,
                    -1.38139882e-01, -5.54936599e-02,  1.66886836e-01,
                     6.46238260e-02,  1.24093863e-02, -2.09274345e-02],
                   [ 2.12823455e-01, -9.32652720e-02, -4.39611467e-01,
                    -2.02814479e-01,  4.98638770e-02, -1.26572488e-01,
                    -1.11181799e-01,  3.25075635e-02,  2.01416694e-02,
                    -5.69216463e-02,  2.61922912e-02,  8.30817468e-02],
                   [ 1.62304042e-01, -7.34813956e-03, -2.02552550e-01,
                     1.80106705e-01, -5.72110826e-02, -9.17148244e-02,
                    -6.20429191e-03, -6.08892354e-02,  1.02883628e-02,
                     3.84878478e-02, -8.72920419e-03,  2.37291230e-02],
                   [ 1.69023095e-01,  6.81311168e-02, -3.71039856e-02,
                    -2.13139780e-02, -4.18752028e-03,  1.36407740e-01,
                     2.58515825e-02, -4.10328777e-04,  2.93149920e-02,
                    -1.97874734e-02,  2.01177066e-02,  4.29260690e-03],
                   [ 4.16829358e-01, -1.28384095e+00,  8.86081556e-01,
                     9.13717416e-02, -3.19420208e-01, -1.82003637e-01,
                    -3.19865507e-02, -1.71517045e-02,  3.47472066e-02,
                    -3.53047665e-02,  5.58354602e-02, -5.06222122e-02],
                   [ 3.83948137e-01,  1.06020034e-01,  4.01191058e-01,
                     1.49470482e-01, -9.58422411e-02, -4.94473336e-02,
                     2.27589858e-02, -5.67352733e-02,  3.84666644e-02,
                    -2.15828055e-02, -1.67817151e-02,  1.15426241e-01],
                   [ 9.07946444e-01,  3.26120397e+00,  2.98472002e+00,
                    -1.42615404e-01,  1.29886103e+00, -4.53380431e-01,
                     1.54008478e-01, -3.55297093e-02, -2.95809181e-01,
                     1.57037690e-01, -7.29692046e-02,  1.15180285e-01],
                   [ 1.60870896e+00, -2.32038235e+00, -7.96211044e-01,
                     1.55058968e+00, -2.19377663e+00,  5.01030526e-01,
                    -1.71767279e+00, -1.36642470e+00, -2.42837527e-01,
                    -4.14275615e-01, -7.33148530e-01, -4.56676578e-01],
                   [ 6.42870687e-01,  1.34486839e+00,  2.16026845e-01,
                    -2.13180345e-01,  3.10866747e-01, -3.97754955e-01,
                    -3.54439151e-01, -5.95938041e-04,  4.95054274e-03,
                     4.67013422e-02, -1.80823854e-02,  1.25808320e-01],
                   [ 1.16780496e+00,  2.28141229e+00, -3.29418720e+00,
                    -1.54239912e+00,  2.12372153e-01,  2.51116768e+00,
                     1.84273560e+00, -4.06183916e-01,  1.19175125e+00,
                    -9.24407446e-01,  6.85444429e-01, -6.38729005e-01],
                   [ 2.39097414e-01, -1.13382447e-02,  3.06327342e-01,
                     4.68182987e-03, -1.03107607e-01, -3.17661969e-02,
                     3.46533705e-02,  1.46440386e-02,  6.88291154e-02,
                     1.72580481e-02, -6.23970238e-03, -6.52822380e-03],
                   [ 1.74850329e-01, -1.86077411e-01,  2.69285838e-01,
                     5.22452803e-02, -3.71708289e-02, -6.42874319e-02,
                    -5.01920042e-03, -1.14565540e-02, -2.61300268e-03,
                    -6.94872458e-03,  1.20157063e-02,  2.01341977e-02],
                   [ 1.93220674e-01,  1.62738332e-01,  1.72794061e-02,
                     7.89933755e-02,  1.58494767e-01,  9.04541006e-04,
                    -3.33177052e-02, -1.42411500e-01, -1.90471155e-02,
                    -2.41622739e-02, -2.57382438e-02,  2.84895062e-02],
                   [ 3.31179197e+00, -1.56765268e-01,  4.42446188e+00,
                     2.05496297e+00,  5.07031622e+00, -3.52663849e-02,
                    -5.68337901e+00, -1.17825301e+00,  5.41756637e-01,
                    -3.15541339e-02, -1.58404846e+00,  7.37887234e-01],
                   [ 2.36033237e-01, -5.01380019e-01, -7.01568834e-02,
                    -2.14474169e-01,  5.58739133e-01, -3.45340886e-01,
                     2.36469930e-01, -2.51770230e-02, -4.41670143e-01,
                    -1.73364633e-01,  9.92353986e-03,  1.01775476e-01],
                   [ 3.13672832e+00,  1.55128891e+00,  4.60139512e+00,
                     9.82477544e-01, -3.87108002e-01, -1.34239667e+00,
                    -3.00065797e+00, -4.41556909e-01, -7.77546208e-01,
                    -6.59017029e-01, -1.42596356e-01, -9.78935498e-01],
                   [ 8.50714148e-01,  2.28658856e-01, -3.65260753e+00,
                     2.70626948e+00, -1.90441544e-01,  5.66625676e+00,
                     1.77531510e+00,  2.39978921e+00,  1.10965660e+00,
                     1.58484130e+00, -1.51579214e-02,  8.64324026e-01],
                   [ 1.14302559e+00,  1.18602811e+00, -3.88130412e+00,
                     8.69833825e-01, -8.23003310e-01, -4.23867795e-01,
                     8.56022598e-01, -1.08015106e+00,  1.74840192e-01,
                    -1.35493558e-02, -1.17012561e+00,  1.68572940e-01],
                   [ 3.54117814e+00,  6.12714769e-01,  7.67585243e+00,
                     2.50391333e+00,  1.81374399e+00, -1.46363231e+00,
                    -1.74027236e+00, -5.72924078e-01, -1.20787368e+00,
                    -4.13954661e-01, -4.62561948e-01,  6.78297871e-01],
                   [ 8.31843044e-01,  4.41635485e-01,  7.00724425e-02,
                    -4.72159900e-02,  3.08326493e-01, -4.47009822e-01,
                     3.27806057e-01,  6.52370380e-01,  3.28490360e-01,
                     1.28628172e-01, -7.78065861e-02,  6.91343399e-02],
                   [ 4.90082031e-01, -9.53180204e-01,  1.76970476e-01,
                     1.57256960e-01, -5.26196238e-02, -3.19264458e-01,
                     3.91808304e-01,  2.19368239e-01, -2.06483291e-01,
                    -6.25044005e-02, -1.05547224e-01,  3.18934196e-01],
                   [ 1.49899454e+00, -4.30708817e-01,  2.43770498e+00,
                     7.03149621e-01, -2.28827845e+00,  2.70195855e+00,
                    -4.71484280e+00, -1.18700075e+00, -1.77431396e+00,
                    -2.23190236e+00,  8.20855264e-01, -2.35859902e-01],
                   [ 1.20322544e-01, -3.66300816e-01, -1.25699953e-01,
                    -1.21914056e-01,  6.93277338e-02, -1.31034684e-01,
                    -1.54955924e-03,  2.48094288e-02, -3.09576314e-02,
                    -1.66369415e-03,  1.48904987e-04, -1.42151992e-02],
                   [ 6.52394765e-01, -6.81024464e-01,  6.36868117e-01,
                     3.04950208e-01,  2.62178992e-01, -3.20457080e-01,
                    -1.98576098e-01, -3.02173163e-01,  2.04399765e-01,
                     4.44513847e-02, -9.50111498e-02, -1.14198739e-02],
                   [ 2.06762180e-01, -2.08101829e-01,  2.61977630e-01,
                    -1.71672300e-01,  5.61794250e-02,  2.13660185e-01,
                     3.90259585e-02,  4.78176392e-02,  1.72812607e-02,
                     3.44052067e-02,  6.26899067e-03,  2.48544728e-02],
                   [ 7.39717363e-01,  4.37786285e+00,  2.54995502e+00,
                     1.13151212e+00, -3.58509503e-01,  2.20806129e-01,
                    -2.20500355e-01, -7.22409824e-03, -2.70534083e-01,
                     1.07942098e-03,  2.70174668e-01,  1.87279353e-01],
                   [ 1.25593809e+00,  6.71054880e-02,  8.70352571e-01,
                    -4.32607959e+00,  2.30652217e+00,  5.47476105e+00,
                    -6.11052479e-01,  1.07955720e+00, -2.16225471e+00,
                    -7.95770149e-01, -7.31804973e-01,  9.68935954e-01],
                   [ 1.17233757e-01, -1.23897829e-01, -4.88625265e-01,
                     1.42036530e-01, -7.23286756e-02, -6.99808763e-02,
                    -1.17525019e-02,  5.70221674e-02, -7.67796123e-03,
                     4.17505873e-02, -2.33375716e-02,  1.94121001e-02],
                   [ 1.67511025e+00, -2.75436700e+00,  1.45345593e+00,
                     1.32408871e+00, -1.66172505e+00,  1.00560074e+00,
                    -8.82308160e-01, -5.95708043e-01, -7.27283590e-01,
                    -1.03975499e+00, -1.86653334e-02,  1.39449745e+00],
                   [ 3.20587677e+00, -2.84451104e+00,  8.54849957e+00,
                    -4.44001235e-01,  1.04202144e+00,  7.35333682e-01,
                    -2.48763292e+00,  7.38931361e-01, -1.74185596e+00,
                    -1.07581842e+00,  2.05759299e-01, -8.20483513e-01],
                   [ 3.31279737e+00, -5.08655734e-01,  6.61530870e+00,
                     1.16518280e+00,  4.74499155e+00, -2.31536191e+00,
                    -1.34016130e+00, -7.15381712e-01,  2.78890594e+00,
                     2.04189275e+00, -3.80003033e-01,  1.16034914e+00],
                   [ 1.79522019e+00, -8.13534697e-03,  4.37167420e-01,
                     2.26517020e+00,  8.85377295e-01,  1.07481514e+00,
                    -7.25322296e-01, -2.19309506e+00, -7.59468916e-01,
                    -1.37191387e+00,  2.60097913e-01,  9.34596450e-01],
                   [ 3.50400906e-01,  8.17891485e-01, -8.63487084e-01,
                    -7.31760701e-01,  9.70320805e-03, -3.60023996e-01,
                    -2.91753495e-01, -8.03073817e-03,  6.65930095e-03,
                     1.60093340e-01, -1.29158086e-01, -5.18806100e-03],
                   [ 2.25922929e-01,  2.78461593e-01,  5.39661393e-03,
                    -2.37662670e-02, -2.70343295e-02, -1.23485570e-01,
                     2.31027499e-03,  5.87465112e-05,  1.86127188e-02,
                     2.83074747e-02, -1.87198676e-04,  1.24761782e-02],
                   [ 4.53615634e-01,  3.18976020e+00, -8.35029351e-01,
                     7.84124578e+00, -4.43906795e-01, -1.78945492e+00,
                    -1.14521031e+00,  1.00044304e+00, -4.04084981e-01,
                    -4.86030348e-01,  1.05412721e-01,  5.63666445e-03],
                   [ 3.93714086e-01, -3.07226477e-01, -4.87366619e-01,
                    -4.57481697e-01, -2.91133171e-04, -2.39881719e-01,
                    -2.15591352e-01, -1.21332941e-01,  1.42245002e-01,
                     5.02984582e-02, -8.05878851e-03,  1.95534173e-01],
                   [ 1.86913010e-01, -1.61000977e-01,  5.95612425e-01,
                     1.87804293e-01,  2.22064227e-01, -1.09008289e-01,
                     7.83845058e-02,  5.15228647e-02, -8.18113578e-02,
                    -2.37860551e-02,  3.41013800e-03,  3.64680417e-02],
                   [ 3.32919314e+00, -2.14341251e+00,  7.20913997e+00,
                     1.76143734e+00,  1.64091808e+00, -2.66887649e+00,
                    -9.26748006e-01, -2.78599285e-01, -7.39434005e-01,
                    -3.87363085e-01,  8.00557250e-01,  1.15628886e+00],
                   [ 4.76496444e-01, -1.19334793e-01,  3.09037235e-01,
                    -3.45545294e-01,  1.30114716e-01,  5.06895559e-01,
                     2.12176840e-01, -4.14296750e-03,  4.52439064e-02,
                    -1.62163990e-02,  6.93683152e-02, -5.77607592e-03],
                   [ 3.00019324e-01,  5.43432074e-02, -7.72732930e-01,
                     1.47263806e+00, -2.79012581e-02, -2.47864869e-01,
                    -2.10011388e-01,  2.78202425e-01,  6.16957205e-02,
                    -1.66924986e-01, -1.80102286e-01, -3.78872162e-03]]

TIMBRE_MEANS = [np.mean(t) for t in TIMBRE_CLUSTERS]
TIMBRE_STDEVS = [np.std(t) for t in TIMBRE_CLUSTERS]

'''helper methods to process raw msd data'''

def normalize_pitches(h5):
    # transpose each pitch segment so that every song is keyed to C
    key = int(hdf5_getters.get_key(h5))
    segments_pitches = hdf5_getters.get_segments_pitches(h5)
    segments_pitches_new = [transpose_by_key(pitch_seg, key) for pitch_seg in segments_pitches]
    return segments_pitches_new

def transpose_by_key(pitch_seg, key):
    pitch_seg_new = []
    for i in range(0, 12):
        idx = (i + key) % 12
        pitch_seg_new.append(pitch_seg[idx])
    return pitch_seg_new

'''given a time segment with distributions of the 12 pitches, find the most
likely chord played'''

def find_most_likely_chord(pitch_vector):
    rho_max = 0.0
    # each chord is indexed as (family, root): 1 = major, 2 = minor,
    # 3 = dominant 7th, 4 = minor 7th
    most_likely_chord = (1, 1)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_MAJOR,
            CHORD_TEMPLATE_MAJOR_means, CHORD_TEMPLATE_MAJOR_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean) * (pitch_vector[i] - np.mean(pitch_vector)) / ((stdev + 0.01) * (np.std(pitch_vector) + 0.01))
        if (abs(rho) > abs(rho_max)):
            rho_max = rho
            most_likely_chord = (1, idx)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_MINOR,
            CHORD_TEMPLATE_MINOR_means, CHORD_TEMPLATE_MINOR_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean) * (pitch_vector[i] - np.mean(pitch_vector)) / ((stdev + 0.01) * (np.std(pitch_vector) + 0.01))
        if (abs(rho) > abs(rho_max)):
            rho_max = rho
            most_likely_chord = (2, idx)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_DOM7,
            CHORD_TEMPLATE_DOM7_means, CHORD_TEMPLATE_DOM7_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean) * (pitch_vector[i] - np.mean(pitch_vector)) / ((stdev + 0.01) * (np.std(pitch_vector) + 0.01))
        if (abs(rho) > abs(rho_max)):
            rho_max = rho
            most_likely_chord = (3, idx)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_MIN7,
            CHORD_TEMPLATE_MIN7_means, CHORD_TEMPLATE_MIN7_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean) * (pitch_vector[i] - np.mean(pitch_vector)) / ((stdev + 0.01) * (np.std(pitch_vector) + 0.01))
        if (abs(rho) > abs(rho_max)):
            rho_max = rho
            most_likely_chord = (4, idx)
    return most_likely_chord

def find_most_likely_timbre_category(timbre_vector):
    # correlation-style score between the timbre vector and each centroid
    most_likely_timbre_cat = 0
    rho_max = 0.0
    for idx, (seg, mean, stdev) in enumerate(zip(TIMBRE_CLUSTERS, TIMBRE_MEANS, TIMBRE_STDEVS)):
        rho = 0.0
        for i in range(0, 12):
            rho += (seg[i] - mean) * (timbre_vector[i] - np.mean(seg)) / ((stdev + 0.01) * (np.std(timbre_vector) + 0.01))
        if (abs(rho) > abs(rho_max)):
            rho_max = rho
            most_likely_timbre_cat = idx
    return most_likely_timbre_cat
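As a quick sanity check of the template matching above, a pitch vector with all of its weight on C, E, and G should be matched to the major-triad family rooted at C, i.e. the tuple (1, 0):

import msd_utils

# all of the energy on C, E and G (pitch indices 0, 4 and 7)
c_major = [1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0]
print(msd_utils.find_most_likely_chord(c_major))   # expected: (1, 0)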

Bibliography

[1] Marck Bailey. Mark J. Butler publishes scholarly work on dance music. http://www.music.northwestern.edu/about/news/2012/mark-j-butler-publishes-scholarly-work-on-dance-music.html, Mar. 2012.

[2] Kenneth Taylor. Ishkur's guide to EDM. http://techno.org/electronic-music-guide.

[3] Deal further strengthens Spotify's music discovery expertise. http://the.echonest.com/pressreleases/spotify-acquires-echo-nest, Mar. 2014.

[4] Josh Constine. Inside the Spotify - Echo Nest skunkworks. http://techcrunch.com/2014/10/19/the-sonic-mad-scientists, Oct. 2014.

[5] The future of music genres is here. http://blog.echonest.com/post/73516217273/the-future-of-music-genres-is-here, Jan. 2014.

[6] About the Music Genome Project. http://www.pandora.com/about/mgp.

[7] Joan Serrà, Álvaro Corral, Marián Boguñá, Martín Haro, and Josep Ll. Arcos. Measuring the evolution of contemporary western popular music. Sci. Rep., 2, Jul. 2012.

[8] Matthias Mauch, Robert M. MacCallum, Mark Levy, and Armand M. Leroi. The evolution of popular music: USA 1960-2010. Royal Society Open Science, 2(5), 2015.

[9] Thierry Bertin-Mahieux, Daniel P. W. Ellis, Brian Whitman, and Paul Lamere. The Million Song Dataset. In Proceedings of the 12th International Conference on Music Information Retrieval (ISMIR 2011), 2011.

[10] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay. Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12:2825-2830, 2011.

[11] Lars Buitinck, Gilles Louppe, Mathieu Blondel, Fabian Pedregosa, Andreas Mueller, Olivier Grisel, Vlad Niculae, Peter Prettenhofer, Alexandre Gramfort, Jaques Grobler, Robert Layton, Jake VanderPlas, Arnaud Joly, Brian Holt, and Gaël Varoquaux. API design for machine learning software: experiences from the scikit-learn project. In ECML PKDD Workshop: Languages for Data Mining and Machine Learning, pages 108-122, 2013.

[12] Edwin Chen. Infinite mixture models with nonparametric Bayes and the Dirichlet Process. http://blog.echen.me/2012/03/20/infinite-mixture-models-with-nonparametric-bayes-and-the-dirichlet-process, Mar. 2012.

[13] Graham Massey. Roland TR-808: The drum machine that changed music. http://www.bbc.com/news/entertainment-arts-26682781, Mar. 2014.

[14] Jean-Michel Jarre and Agencja Artystyczna MTJ. Jean Michel Jarre. Disques Dreyfus, 1999.

[15] François Pachet, Jean-Julien Aucouturier, and Mark Sandler. The way it sounds: timbre models for analysis and retrieval of music signals. IEEE Transactions on Multimedia, 7(6):1028-35, Dec. 2005.

Page 49: Silver,Matthew final thesis

40

Figure 35 Song year distributions for α = 02

41

42

43

Figure 36 Timbre and pitch distributions for α = 02

44

Cluster Song Count Characteristic Sounds

0 4075 Nostalgic and sad-sounding synths and string instru-

ments

1 2068 Intense sad cavernous (mix of industrial metal and am-

bient)

2 1546 Jazzfunk tones

3 1691 Orchestral with heavy 80s synths atmospheric

4 343 Arpeggios

5 304 Electro ambient

6 2405 Alien synths eery

7 1264 Punchy kicks and claps 80s90s tilt

8 1561 Medium tempo 44 time signature synths with intense

guitar

9 1796 Disco rhythms and instruments

10 2158 Standard rock with few (if any) synths added on

12 791 Cavernous minimalist ambient (non-electronic instru-

ments)

14 765 Downtempo classic guitar riffs fewer synths

16 865 Classic acid house sounds and beats

17 682 Heavy Roland TR sounds

22 14 Fast ambient classic orchestral

23 578 Acid house with funk tones

30 31 Very repetitive rhythms one or two tones

34 88 Very dense sound (strong vocals and synths)

Table 33 Song cluster descriptions for α = 02

45

33 Analysis

Each of the different values of α revealed different insights about the Dirichlet Process

applied to the Million Song Dataset mainly which artists and songs were unique

and how traditional groupings of EM genres could be thought of differently Not

surprisingly the distributions of the years of songs in most of the clusters were skewed

to the left because the distribution of all of the EM songs in the Million Song Dataset

is left-skewed (see Figure 22) However some of the distributions vary significantly

for individual clusters and these differences provide important insights into which

types of music styles were popular at certain points in time and how unique the

earliest artists and songs in the clusters were For example for α = 01 Cluster 28rsquos

musical style (with sounds characteristic of the Roland TR-808 and TR-909 a two

programmable drum machines and synthesizers that became extremely popular in

1980s and 90s dance tracks) [13] coincides with the when the instruments were first

manufactured in 1980 Not surprisingly this cluster contained mostly songs from

the 80s and 90s and declined slightly in the 2000s However there were a few songs

in that cluster that came out before 1980 While these songs did not clearly use the

Roland TR machines they may have contained similar sounds that predated the

machines and were truly novel

First looking at α = 005 we see that all of the clusters contain a significant

number of songs although cluster 3 and 8 are notably smaller Cluster 3 contains

a heavier left tail indicating a larger number of songs from the 70s 80s and 90s

Inside the cluster the genres of music varied significantly from a traditional music

lens That is the cluster contained some songs with nearly all traditional rock

instruments others with purely synths and others somewhere in between all which

would normally be classified as different EM genres However under the Dirichlet

Process these songs were lumped together with the common themes of dense melodic

46

melodies (as opposed to minimalistic repetitive or dissonant sounds) The most

prominent artists from the earlier songs are Ashra and John Hassell who composed

several melodic songs combining traditional instruments with synthesizers for a

modern feel The other small cluster number 8 contains a more normal year

distribution relative to the entire MSD distribution and also consists of denser beats

Another artist Cabaret Voltaire leads this cluster Cluster 9 sticks out significantly

because it contains virtually no songs before 1990 but increases rapidly in popularity

This cluster contains songs with hypnotically repetitive rhythm strong and ethereal

synths and an equally strong drum-like beat Given the emergence of trance in

the 1990s and the fact that house music in the 1980s contained more minimalistic

synths than house music in the 1990s this distribution of years makes sense Looking

at the earliest artists in this cluster one that accurately predates the later music

in the cluster is Jean-Michel Jarre A French composer pioneering in ambient and

electronic music [14] one of his songs Les Chants Magneacutetiques IV contains very

sharp and modulated synths along with a repetitive hi-hat cymbal rhythm and more

drawn-out and ethereal synths While the song sounds ambient at its normal speed

playing the song 15 times the normal speed resulted in a thumping fast-paced 16th

note rhythm that combined with the ethereal synths that contain certain chord

progressions sounded very similar to trance music In fact I found that stylistically

trance music was comparable to house and ambient music increased in speed Trance

music was a term not used extensively until the early 1990s but ambient and house

music were already mainstream by the 1980s so it would make sense that trance

evolved in this manner However this insight could serve as an argument that trance

is not an innovative genre in and of itself but is rather a clever combination of two

older genres Lastly we look at the timbre category and chord change distributions

for each cluster In theory these clusters should have significantly different peaks

of chord changes and timbre categories reflecting different pitch arrangements and

47

instruments in each cluster The type 0 chord change corresponds to major rarr

major with no note change type 60 minor rarr minor with no note change type

120 dominant 7th major rarr dominant 7th major with no note change and type 180

dominant 7th minor rarr dominant 7th minor with no note change It makes sense that

type 0 60 120 and 180 chord changes are frequently observed because it implies

that chords in the song occurring next to each other are remaining in the same key

for the majority of the song The timbre categories on the other hand are more

difficult to intuitively interpret Mauchrsquos study addresses this issue by sampling

songs and sounds that are the closest to each timbre category and then playing the

sounds and attaching user-based interpretations based on several listeners [8] While

this strategy worked in Mauchrsquos study given the time and resources at my disposal

this strategy was not practical in my study I ended up comparing my subjective

summaries of each cluster and comparing the charts to see whether certain peaks

in the timbre categories corresponded to specific tones Strangely for α = 005 the

timbre and chord change data is very similar for each cluster This problem does not

occur for when α = 01 or 02 where the graphs vary significantly and correspond

to some of the observed differences in the music In summary below are the most

influential artists I found in the clusters formed and the types of music they created

that were novel for their time

bull Jean-Michel Jarre ambient and house music complicated synthesizer arrange-

ments

bull Cabaret Voltaire orchestral electronic music

bull Paul Horn new age

bull Brian Eno ambient music

bull Manuel Goumlttsching (Ashra) synth-heavy ambient music

48

bull Killing Joke industrial metal

bull John Foxx minimalist and dark electronic music

bull Fad Gadget house and industrial music

While these conclusions were formed mainly from the data used in the MSD and the resulting clusters, I checked outside sources and biographies of these artists to see whether they were groundbreaking contributors to electronic music. Some research revealed that these artists were indeed groundbreaking for their time, so my findings are consistent with existing literature. The difference between existing accounts and mine, however, is that from a quantitatively computed perspective I found some new connections (like observing that one of Jarre's works, when sped up, sounded very similar to trance music).

For larger values of α, it is worth not only looking at interesting phenomena in the clusters formed for that specific value, but also comparing the clusters formed to those of other values of α. Since we are increasing the value of α, more clusters will be formed, and the distinctions between each cluster will be more nuanced.
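The role of α can be seen directly in the clustering interface. The sketch below uses the modern scikit-learn class BayesianGaussianMixture with a Dirichlet process prior (scikit-learn's older DPGMM class played the same role when this thesis was written); the feature matrix X is a random stand-in for the real chord-change and timbre features:

import numpy as np
from sklearn.mixture import BayesianGaussianMixture

X = np.random.rand(2000, 20)  # stand-in for the per-song feature matrix

for alpha in (0.05, 0.1, 0.2):
    dpgmm = BayesianGaussianMixture(
        n_components=50,  # truncation level: an upper bound, not a target
        weight_concentration_prior_type="dirichlet_process",
        weight_concentration_prior=alpha,  # the α discussed above
    )
    labels = dpgmm.fit(X).predict(X)
    # larger α spreads prior mass across more components, so more
    # clusters typically end up with songs assigned to them
    print(alpha, len(np.unique(labels)))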

With α = 0.1, the Dirichlet Process formed 16 clusters. Two of these clusters consisted of only one song each, and upon listening, neither of these songs sounded particularly unique, so I threw those two clusters out and analyzed the remaining 14. Comparing these clusters to the ones formed with α = 0.05, I found that some of the clusters mapped over nicely, while others were more difficult to interpret. For example, cluster 3_{0.1} (cluster 3 when α = 0.1) contained a similar number of songs, and a similar distribution of the years the songs were released, to cluster 9_{0.05}. Both contain virtually no songs before the 1990s and then steadily rise in popularity through the 2000s. Both clusters also contain similar types of music: house beats and ethereal synths reminiscent of ambient or trance music. However, when I looked at the earliest artists in cluster 3_{0.1}, they were different from the earliest artists in cluster 9_{0.05}. One particular artist, Bill Nelson, stood out for having a particularly novel song, "Birds of Tin", for the year it was released (1980). This song features a sharp and twangy synth beat that, when sped up, sounded like minimalist acid house music.

While the α = 0.05 group differentiated mostly on general moods and classes of instruments (like rock vs. non-electronic vs. electronic), the α = 0.1 group picked up more nuanced instrumentation and mood differences. For example, cluster 16_{0.1} contained songs that featured orchestral string instruments, especially violin. The songs themselves varied significantly according to traditional genres, from Brian Eno arrangements with classical orchestra to a remix of a song by Linkin Park, a nu-metal band, which contained violin interludes. This clustering raises an interesting point: music that sounds very different based on traditional genres could be grouped together on certain instruments or sounds. Another cluster, 28_{0.1}, features 90s sounds characteristic of the Roland TR-909 drum machine (which would explain why the cluster's songs increase drastically starting in the 1990s and steadily decline through the 2000s). Yet another cluster, 6_{0.1}, contains a particularly heavy left tail, indicating a style more popular in the 1980s, and its characteristic sound, hi-hat cymbals, is also a specialized instrument. This specialization does not match up particularly strongly with the clusters when α = 0.05. That is, a single cluster with α = 0.05 does not easily map to one or more clusters in the α = 0.1 run, although many of the clusters appear to share characteristics based on the qualitative descriptions in the tables.

The timbre/chord change charts for each cluster appeared to at least somewhat corroborate the general characteristics I attached to each cluster. For example, the last timbre category is significantly pronounced for clusters 5 and 18, and especially so for 18. Cluster 18 was vocal-free, ethereal, space-synth sounds, so it would make sense that cluster 5, which was mainly calm New World, also contained vocal-free, ethereal, and space-y sounds. It was also interesting to note that certain clusters, like 28, contained one timbre category that completely dominated all the others. Many songs in this cluster were marked by strong and repetitive beats reminiscent of the Roland TR synth drum machines, which matches the graph. Likewise, clusters 3, 7, 9, and 20, which appear to contain the same peak timbre category, were noted for containing strong and repetitive beats. For this run, I added the following artists and their contributions to the general list of novel artists:

• Bill Nelson: minimalist house music

• Vangelis: orchestral compositions with electronic notes

• Rick Wakeman: rock compositions with spacy-sounding synths

• Kraftwerk: synth-based pop music

Finally, we look at α = 0.2. With this parameter value, the Dirichlet Process resulted in 22 clusters formed. Three of these clusters contained only one song each, and upon listening to each of these songs, I determined they were not particularly unique and discarded them, for a total of 19 remaining clusters. Unlike the previous two values of α, where the clusters were relatively easy to subjectively differentiate, this one was quite difficult. Slightly more than half of the clusters, 10 out of 19, contained under 1,000 songs, and 3 contained under 100. Some of the clusters were easily mapped to clusters in the other two α values, like cluster 17_{0.2}, which contains Roland TR drum machine sounds and is comparable to cluster 28_{0.1}. However, many of the other classifications seemed more dubious. Not only did the songs within each cluster seem to often vary significantly, but the differences between many clusters appeared nearly indistinguishable. The chord change and timbre charts also support the difficulty in distinguishing different clusters. The y-axes for all of the years are quite small, implying that many of the timbre values averaged out because the songs were quite different in each cluster. Essentially, the observations are quite noisy and do not have features that stand out as saliently as cluster 28_{0.1}, for example. The only exceptions to these numbers were clusters 30 and 34, but there were so few songs in each of these clusters that they represent only a small amount of the dataset. Therefore, I concluded that the Dirichlet Process with α = 0.2 performed an insufficient job of adequately clustering the songs. Overall, the clusters formed when α = 0.1 were the most meaningful in terms of picking up nuanced moods and instruments, without splitting hairs and resulting in clusters a minimally trained ear could not differentiate. From this analysis, the most appropriate genre classifications of the electronic music from the MSD are the clusters described in the table where α = 0.1, and the most novel artists, along with their contributions, are summarized in the findings where α = 0.05 and α = 0.1.


Chapter 4

Conclusion

In this chapter, I first address weaknesses in my experiment and strategies to address those weaknesses; I then offer potential paths for researchers to build upon my experiment, and offer closing words regarding this thesis.

4.1 Design Flaws in Experiment

While I made every effort possible to ensure the integrity of this experiment, there were various complicating factors, some beyond my control and others within my control but unrealistic to address given the time and resources I had. The largest issue was the dataset I was working with. While the MSD contained roughly 23,000 electronic music songs according to my classifications, these songs did not come close to all of the electronic music that was available. From looking through the tracks, I did see many important artists, meaning that there was some credibility to the dataset. However, there were several other artists I was surprised to see missing, and the artists included contained only a limited number of popular songs. Some traditionally defined genres, like dubstep, were missing entirely from the dataset, and the most recent songs came from the year 2010, which meant that the past 5 years of rapid expansion in EM were not accounted for. Building a sufficient corpus of EM data is very difficult, arguably more so than for other genres, because songs may be remixed by multiple artists, further blurring the line between original content and modifications. For this reason, I considered my thesis to be a proof of concept. Although the data I used may not be ideal, I was able to show that the Dirichlet Process could be used with some amount of success to cluster songs based on their metadata.

With respect to how I implemented the Dirichlet Process and constructed the features, my methodology could have been more extensive with additional time and resources. Interpreting the sounds in each song and establishing common threads is a difficult task, and unlike Pandora, which used trained music theory experts to analyze each song, I relied on my own ears and anecdotal knowledge of EM. Given the lack of formal literature quantitatively analyzing EM, and the resources I had, this was my best realistic option, but it was also not ideal. The second notable weakness, which was more controllable, was determining what exactly constitutes an EM song. My criteria involved iterating through every song and selecting those whose artist contained a tag that fell inside a list of predetermined EM genres. However, this strategy is not always effective, since some artists have only a small selection of EM songs and have produced much more music involving rock or other non-EM genres. To prevent these songs from appearing in the dataset, I would need to load another dataset from a group called Last.fm, which contains user-generated tags at the song level.
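A sketch of what that song-level filter might look like follows; it assumes the per-track JSON layout of the MSD's companion Last.fm dataset (a "tags" field holding [tag, weight] pairs), and the directory layout and genre list here are placeholders:

import glob
import json
import os

TARGET_GENRES = {"house", "techno", "trance", "ambient", "dubstep"}  # abbreviated

def is_em_song(lastfm_json_path):
    # each Last.fm file describes one track; "tags" lists [tag_name, weight] pairs
    with open(lastfm_json_path) as f:
        track = json.load(f)
    song_tags = {str(tag).lower() for tag, weight in track.get("tags", [])}
    return bool(song_tags & TARGET_GENRES)

em_track_ids = [os.path.basename(p).replace(".json", "")
                for p in glob.glob("lastfm_train/*/*/*/*.json")  # hypothetical layout
                if is_em_song(p)]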

Another, more addressable, weakness in my experiment was graphically analyzing the timbre categories. While the average chord changes were easy to interpret on the graphs for each cluster and had easy semantic interpretations, the timbre categories were never formally defined. That is, while I knew the Bayes Information Criterion was lowest when there were 46 categories, I did not associate each timbre category with a sound. Mauch's study addressed this issue by randomly selecting songs with sounds that fell in each timbre category and asking users to listen to the sounds and classify what they heard. Implementing this system would be an additional way of ensuring that the clusters formed for each song were nontrivial: I could not only eyeball the measurements on each graph for timbre, like I did in this thesis, but also use them to confirm the sounds I observed for each cluster. Finally, while my feature selection involved careful preprocessing, based on other studies, that normalized measurements between all songs, there are additional ways I could have improved the feature set. For example, one study looks at more advanced ways to isolate specific timbre segments in a song, identify repeating patterns, and compare songs to each other in terms of the similarity of their timbres [15]. More advanced methods like these would allow me to more quantitatively analyze how successful the Dirichlet Process is at effectively clustering songs into distinct categories.
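For concreteness, the BIC-based choice of 46 timbre categories mentioned above can be reproduced with a model-selection loop like the following sketch, where the matrix of sampled timbre frames is a random stand-in for the real data:

import numpy as np
from sklearn.mixture import GaussianMixture

timbre_frames = np.random.rand(10000, 12)  # stand-in for the sampled timbre vectors

best_k, best_bic = None, np.inf
for k in range(10, 61, 2):
    gmm = GaussianMixture(n_components=k).fit(timbre_frames)
    bic = gmm.bic(timbre_frames)  # lower BIC means a better size/fit trade-off
    if bic < best_bic:
        best_k, best_bic = k, bic

# the fitted means for best_k play the role of TIMBRE_CLUSTERS in Appendix A.4
centroids = GaussianMixture(n_components=best_k).fit(timbre_frames).means_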

4.2 Future Work

Future work in this area, quantitatively analyzing EM metadata to determine what constitutes different genres and novel artists, would involve tighter definitions, procedures, evaluations of whether clustering was effective, and musical scrutiny. All of the weaknesses mentioned in the previous section, barring perhaps the songs available in the Million Song Dataset, can be addressed with extensions and modifications to the code base I created. The greater issue of building an effective corpus of music data for the MSD, and constantly updating it, might be addressed by soliciting such data from an organization like Spotify, but such an endeavor is very ambitious and beyond the scope of any individual or small-group research project without extensive funding and influence. Once these problems are resolved, and the dataset, the songs accessed from the dataset, and the methods for comparing songs to each other are accomplished, the next steps would be to further analyze the results. How do the most unique artists for their time compare to the most popular artists? Is there considerable overlap? How long does it take for a style to grow in popularity, if it even does? And lastly, how can these findings be used to compose new genres of music and envision who and what will become popular in the future? All of these questions may require supplementary information sources, with respect to the popularity of songs and artists, for example, and many of these additional pieces of information can be found on the website of the MSD.

4.3 Closing Remarks

While this thesis is an ambitious endeavor and can be improved in many respects, it does show that the methods implemented yield nontrivial results and could serve as a foundation for future quantitative analysis of electronic music. As data analytics grows even more, and groups such as Spotify amass greater amounts of information and deeper insights on that information, this relatively new field of study will hopefully grow. EM is a dynamic, energizing, and incredibly expressive type of music, and understanding it from a quantitative perspective pays respect to what has, up until now, been mostly analyzed from a curious outsider's perspective: qualitatively described, but not examined as thoroughly from a mathematical angle.


Appendix A

Code

A.1 Pulling Data from the Million Song Dataset

from __future__ import division
import os
import sys
import re
import time
import glob
import numpy as np
import hdf5_getters  # not on adroit

# prevents output from showing ellipses when printed
np.set_printoptions(threshold=np.nan)

'''This code pulls the metadata of every electronic music song out of the
Million Song Dataset and writes it to disk, sorted by year.'''

basedir = '/scratch/network/mssilver/mssilver/msd_data_full/data/' + str(sys.argv[1])
ext = '.h5'

target_genres = ['house', 'techno', 'drum and bass', 'drum n bass', "drum'n'bass",
                 'drumnbass', "drum 'n' bass", 'jungle', 'breakbeat', 'trance',
                 'dubstep', 'trap', 'downtempo', 'industrial', 'synthpop', 'idm',
                 'idm - intelligent dance music', '8-bit', 'ambient',
                 'dance and electronica', 'electronic']

# relevant metadata for all EM songs found in the MSD
all_song_data = {}
pitch_segs_data = []
count = 0
start_time = time.time()

for root, dirs, files in os.walk(basedir):
    files = glob.glob(os.path.join(root, '*' + ext))
    for f in files:
        h5 = hdf5_getters.open_h5_file_read(f)
        # if year unknown, throw out sample
        if hdf5_getters.get_year(h5) == 0:
            h5.close()
            continue
        if any(tag in str(hdf5_getters.get_artist_mbtags(h5)) for tag in target_genres):
            print 'found electronic music song at {0} seconds'.format(time.time() - start_time)
            count += 1
            print 'song count: {0}'.format(count + 1)
            h5_subdict = dict()
            h5_subdict['title'] = hdf5_getters.get_title(h5).item()
            h5_subdict['artist_name'] = hdf5_getters.get_artist_name(h5).item()
            h5_subdict['year'] = hdf5_getters.get_year(h5).item()
            h5_subdict['duration'] = hdf5_getters.get_duration(h5).item()
            h5_subdict['timbre'] = hdf5_getters.get_segments_timbre(h5).tolist()
            h5_subdict['pitches'] = hdf5_getters.get_segments_pitches(h5).tolist()
            track_id = hdf5_getters.get_track_id(h5).item()
            all_song_data[track_id] = h5_subdict
            print 'Song {0} finished processing. Total time elapsed: {1} seconds'.format(count, str(time.time() - start_time))
        h5.close()

all_song_data_sorted = dict(sorted(all_song_data.items(), key=lambda k: k[1]['year']))
sortedpitchdata = '/scratch/network/mssilver/mssilver/msd_data/raw_' + re.sub('/', '', sys.argv[1]) + '.txt'
with open(sortedpitchdata, 'w') as text_file:
    text_file.write(str(all_song_data_sorted))

A.2 Calculating Most Likely Chords and Timbre Categories

from __future__ import division
import os
import sys
import re
import time
import json
import glob
import math
import ast
import collections
import numpy as np
import sklearn.mixture
import hdf5_getters  # not on adroit
import msd_utils     # not on adroit

# prevents output from showing ellipses when printed
np.set_printoptions(threshold=np.nan)

# column-wise mean of a list of lists (used with zip(*...) below)
def mean(a):
    return sum(a) / len(a)

'''This code computes the frequency of chord changes and the timbre category
counts for each electronic song.'''

basedir = '/scratch/network/mssilver/mssilver/'
input_file = basedir + 'msd_data/raw_' + str(sys.argv[1]) + '.txt'
output_file = basedir + 'msd_data/preprocessed_' + str(sys.argv[1]) + '.txt'

json_contents = open(input_file, 'r').read()

all_song_data = []
time_start = time.time()
count = 0
for json_object_match in re.finditer(r"\{'title'.*?\}", json_contents):
    json_object_str = str(json_object_match.group(0))
    json_object = ast.literal_eval(json_object_str)
    json_object_new = {}

    json_object_new['title'] = json_object['title']
    json_object_new['artist_name'] = json_object['artist_name']
    json_object_new['year'] = json_object['year']
    json_object_new['duration'] = json_object['duration']

    segments_pitches_old = json_object['pitches']
    segments_timbre_old = json_object['timbre']
    segments_pitches_old_smoothed = []
    segments_timbre_old_smoothed = []
    chord_changes = [0 for i in range(0, 192)]
    smoothing_factor = 5
    for i in range(0, int(math.floor(len(segments_pitches_old) / smoothing_factor))):
        segments = segments_pitches_old[(smoothing_factor*i):(smoothing_factor*i + smoothing_factor)]
        # calculate mean frequency of each note over a block of 5 time segments
        segments_mean = map(mean, zip(*segments))
        segments_pitches_old_smoothed.append(segments_mean)
    most_likely_chords = [msd_utils.find_most_likely_chord(seg) for seg in segments_pitches_old_smoothed]
    print 'found most likely chords at {0} seconds'.format(time.time() - time_start)
    # calculate chord changes
    for i in range(0, len(most_likely_chords) - 1):
        c1 = most_likely_chords[i]
        c2 = most_likely_chords[i+1]
        if c1[1] == c2[1]:
            note_shift = 0
        elif c1[1] < c2[1]:
            note_shift = c2[1] - c1[1]
        else:
            note_shift = 12 - c1[1] + c2[1]
        key_shift = 4*(c1[0] - 1) + c2[0]
        # convert note_shift (0 through 11) and key_shift (1 to 16)
        # to one of 192 categories for a chord shift
        chord_shift = 12*(key_shift - 1) + note_shift
        chord_changes[chord_shift] += 1
    json_object_new['chord_changes'] = [c / json_object['duration'] for c in chord_changes]
    print 'calculated chord changes at {0} seconds'.format(time.time() - time_start)

    for i in range(0, int(math.floor(len(segments_timbre_old) / smoothing_factor))):
        segments = segments_timbre_old[(smoothing_factor*i):(smoothing_factor*i + smoothing_factor)]
        # calculate mean of each timbre coefficient over a block of 5 time segments
        segments_mean = map(mean, zip(*segments))
        segments_timbre_old_smoothed.append(segments_mean)
    print 'found most likely timbre categories at {0} seconds'.format(time.time() - time_start)
    timbre_cats = [msd_utils.find_most_likely_timbre_category(seg) for seg in segments_timbre_old_smoothed]
    timbre_cat_counts = [timbre_cats.count(i) for i in xrange(0, 30)]
    json_object_new['timbre_cat_counts'] = [t / json_object['duration'] for t in timbre_cat_counts]
    all_song_data.append(json_object_new)
    count += 1

print 'preprocessing finished, writing results to file at time {0}'.format(time.time() - time_start)
with open(output_file, 'w') as f:
    f.write(str(all_song_data))

print 'file merging complete at time {0}'.format(time.time() - time_start)

A.3 Code to Compute Timbre Categories

from __future__ import division
import os
import sys
import re
import time
import json
import glob
import math
import ast
import operator
import random
import collections
from collections import defaultdict
from string import ascii_uppercase
import numpy as np
import sklearn.mixture
import matplotlib.pyplot as plt
import hdf5_getters  # not on adroit
import msd_utils     # not on adroit

timbre_all = []
N = 20  # number of samples to get from each year
year_counts = dict({1956: 2, 1965: 4, 1968: 3, 1969: 5, 1970: 23, 1971: 25,
                    1972: 26, 1973: 37, 1974: 35, 1975: 29, 1976: 28, 1977: 64,
                    1978: 77, 1979: 111, 1980: 131, 1981: 171, 1982: 199,
                    1983: 272, 1984: 190, 1985: 189, 1986: 200, 1987: 224,
                    1988: 205, 1989: 272, 1990: 358, 1991: 348, 1992: 538,
                    1993: 610, 1994: 658, 1995: 764, 1996: 809, 1997: 930,
                    1998: 872, 1999: 983, 2000: 1031, 2001: 1230, 2002: 1323,
                    2003: 1563, 2004: 1508, 2005: 1995, 2006: 1892, 2007: 2175,
                    2008: 1950, 2009: 1782, 2010: 742})

time_start = time.time()
year_count = defaultdict(int)
orig_dir = '/scratch/network/mssilver/mssilver/'
orig_dir = ''
json_pattern = re.compile(r"\{'title'.*?\}", re.DOTALL)
N = 20  # number of songs to sample from each year
k = 20  # number of frames to select from each song
for l1 in ascii_uppercase:
    for l2 in ascii_uppercase:
        edm_textfile = orig_dir + 'msd_data/raw_' + l1 + l2 + '.txt'
        json_contents = open(edm_textfile, 'r').read()
        for json_object_str in re.findall(json_pattern, json_contents):
            json_object = ast.literal_eval(json_object_str)
            year = int(json_object['year'])
            # sample each year roughly uniformly: keep a song with probability N / count(year)
            prob = 1.0 if 1.0*N/year_counts[year] > 1.0 else 1.0*N/year_counts[year]
            if random.random() < prob:
                print 'getting timbre frames for song in directory {0}, {1} seconds after start of program'.format(edm_textfile, time.time() - time_start)
                duration = float(json_object['duration'])
                timbre = [[t / duration for t in l] for l in json_object['timbre']]
                try:
                    indices = random.sample(xrange(0, len(timbre)), k)
                except:
                    indices = xrange(0, len(timbre))
                timbre_frames = [timbre[i] for i in indices]
                appended_timbre = [timbre_all.append(l) for l in timbre_frames]
        print 'finished file {0}, {1} seconds after start of program'.format(edm_textfile, time.time() - time_start)

with open('timbre_frames_all.txt', 'w') as f:
    f.write(str(timbre_all))

A.4 Helper Methods for Calculations

import os
import re
import json
import glob
import time
import numpy as np
import hdf5_getters

''' some static data used in conjunction with the helper methods '''

# Each 12-element vector corresponds to the 12 pitches, starting with C
# natural and going up to B natural. The four template sets are the 12
# transpositions (one per root note) of each chord quality.
def _rotations(template):
    return [template[-i:] + template[:-i] for i in range(12)]

CHORD_TEMPLATE_MAJOR = _rotations([1,0,0,0,1,0,0,1,0,0,0,0])
CHORD_TEMPLATE_MINOR = _rotations([1,0,0,1,0,0,0,1,0,0,0,0])
CHORD_TEMPLATE_DOM7 = _rotations([1,0,0,0,1,0,0,1,0,0,1,0])
CHORD_TEMPLATE_MIN7 = _rotations([1,0,0,1,0,0,0,1,0,0,1,0])

CHORD_TEMPLATE_MAJOR_means = [np.mean(chord) for chord in CHORD_TEMPLATE_MAJOR]
CHORD_TEMPLATE_MINOR_means = [np.mean(chord) for chord in CHORD_TEMPLATE_MINOR]
CHORD_TEMPLATE_DOM7_means = [np.mean(chord) for chord in CHORD_TEMPLATE_DOM7]
CHORD_TEMPLATE_MIN7_means = [np.mean(chord) for chord in CHORD_TEMPLATE_MIN7]

CHORD_TEMPLATE_MAJOR_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_MAJOR]
CHORD_TEMPLATE_MINOR_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_MINOR]
CHORD_TEMPLATE_DOM7_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_DOM7]
CHORD_TEMPLATE_MIN7_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_MIN7]

# centroids of the timbre categories fit to the sampled timbre frames
TIMBRE_CLUSTERS = [
    [1.38679881e-01, 3.95702571e-02, 2.65410235e-02,
     7.38301998e-03, -1.75014636e-02, -5.51147732e-02,
     8.71851698e-03, -1.17595855e-02, 1.07227900e-02,
     8.75951680e-03, 5.40391877e-03, 6.17638908e-03],
    [3.14344510e+00, 1.17405599e-01, 4.08053561e+00,
     -1.77934450e+00, 2.93367968e+00, -1.35597928e+00,
     -1.55129489e+00, 7.75743158e-01, 6.42796685e-01,
     1.40794256e-01, 3.37716831e-01, -3.27103815e-01],
    [3.56548165e-01, 2.73288705e+00, 1.94355982e+00,
     1.06892477e+00, 9.89739475e-01, -8.97330631e-02,
     8.73234495e-01, -2.00747009e-03, 3.44488367e-01,
     9.93117800e-02, -2.43471766e-01, -1.90521726e-01],
    [4.22442037e-01, 4.14115783e-01, 1.43926557e-01,
     -1.16143322e-01, -5.95186216e-02, -2.36927188e-01,
     -6.83151409e-02, 9.86816882e-02, 2.43219098e-02,
     6.93558977e-02, 6.80121418e-03, 3.97485360e-02],
    [1.94727799e-01, -1.39027782e+00, -2.39875671e-01,
     -2.84583677e-01, 1.92334219e-01, -2.83421048e-01,
     2.15787541e-01, 1.14840341e-01, -2.15631833e-01,
     -4.09496877e-02, -6.90838017e-03, -7.24394810e-03],
    [1.96565167e-01, 4.98702717e-02, -3.43697282e-01,
     2.54170701e-01, 1.12441266e-02, 1.54740401e-01,
     -4.70447408e-02, 8.10868802e-02, 3.03736697e-03,
     1.43974944e-03, -2.75044913e-02, 1.48634678e-02],
    [2.21364497e-01, -2.96205105e-01, 1.57754028e-01,
     -5.57641279e-02, -9.25625566e-02, -6.15316168e-02,
     -1.38139882e-01, -5.54936599e-02, 1.66886836e-01,
     6.46238260e-02, 1.24093863e-02, -2.09274345e-02],
    [2.12823455e-01, -9.32652720e-02, -4.39611467e-01,
     -2.02814479e-01, 4.98638770e-02, -1.26572488e-01,
     -1.11181799e-01, 3.25075635e-02, 2.01416694e-02,
     -5.69216463e-02, 2.61922912e-02, 8.30817468e-02],
    [1.62304042e-01, -7.34813956e-03, -2.02552550e-01,
     1.80106705e-01, -5.72110826e-02, -9.17148244e-02,
     -6.20429191e-03, -6.08892354e-02, 1.02883628e-02,
     3.84878478e-02, -8.72920419e-03, 2.37291230e-02],
    [1.69023095e-01, 6.81311168e-02, -3.71039856e-02,
     -2.13139780e-02, -4.18752028e-03, 1.36407740e-01,
     2.58515825e-02, -4.10328777e-04, 2.93149920e-02,
     -1.97874734e-02, 2.01177066e-02, 4.29260690e-03],
    [4.16829358e-01, -1.28384095e+00, 8.86081556e-01,
     9.13717416e-02, -3.19420208e-01, -1.82003637e-01,
     -3.19865507e-02, -1.71517045e-02, 3.47472066e-02,
     -3.53047665e-02, 5.58354602e-02, -5.06222122e-02],
    [3.83948137e-01, 1.06020034e-01, 4.01191058e-01,
     1.49470482e-01, -9.58422411e-02, -4.94473336e-02,
     2.27589858e-02, -5.67352733e-02, 3.84666644e-02,
     -2.15828055e-02, -1.67817151e-02, 1.15426241e-01],
    [9.07946444e-01, 3.26120397e+00, 2.98472002e+00,
     -1.42615404e-01, 1.29886103e+00, -4.53380431e-01,
     1.54008478e-01, -3.55297093e-02, -2.95809181e-01,
     1.57037690e-01, -7.29692046e-02, 1.15180285e-01],
    [1.60870896e+00, -2.32038235e+00, -7.96211044e-01,
     1.55058968e+00, -2.19377663e+00, 5.01030526e-01,
     -1.71767279e+00, -1.36642470e+00, -2.42837527e-01,
     -4.14275615e-01, -7.33148530e-01, -4.56676578e-01],
    [6.42870687e-01, 1.34486839e+00, 2.16026845e-01,
     -2.13180345e-01, 3.10866747e-01, -3.97754955e-01,
     -3.54439151e-01, -5.95938041e-04, 4.95054274e-03,
     4.67013422e-02, -1.80823854e-02, 1.25808320e-01],
    [1.16780496e+00, 2.28141229e+00, -3.29418720e+00,
     -1.54239912e+00, 2.12372153e-01, 2.51116768e+00,
     1.84273560e+00, -4.06183916e-01, 1.19175125e+00,
     -9.24407446e-01, 6.85444429e-01, -6.38729005e-01],
    [2.39097414e-01, -1.13382447e-02, 3.06327342e-01,
     4.68182987e-03, -1.03107607e-01, -3.17661969e-02,
     3.46533705e-02, 1.46440386e-02, 6.88291154e-02,
     1.72580481e-02, -6.23970238e-03, -6.52822380e-03],
    [1.74850329e-01, -1.86077411e-01, 2.69285838e-01,
     5.22452803e-02, -3.71708289e-02, -6.42874319e-02,
     -5.01920042e-03, -1.14565540e-02, -2.61300268e-03,
     -6.94872458e-03, 1.20157063e-02, 2.01341977e-02],
    [1.93220674e-01, 1.62738332e-01, 1.72794061e-02,
     7.89933755e-02, 1.58494767e-01, 9.04541006e-04,
     -3.33177052e-02, -1.42411500e-01, -1.90471155e-02,
     -2.41622739e-02, -2.57382438e-02, 2.84895062e-02],
    [3.31179197e+00, -1.56765268e-01, 4.42446188e+00,
     2.05496297e+00, 5.07031622e+00, -3.52663849e-02,
     -5.68337901e+00, -1.17825301e+00, 5.41756637e-01,
     -3.15541339e-02, -1.58404846e+00, 7.37887234e-01],
    [2.36033237e-01, -5.01380019e-01, -7.01568834e-02,
     -2.14474169e-01, 5.58739133e-01, -3.45340886e-01,
     2.36469930e-01, -2.51770230e-02, -4.41670143e-01,
     -1.73364633e-01, 9.92353986e-03, 1.01775476e-01],
    [3.13672832e+00, 1.55128891e+00, 4.60139512e+00,
     9.82477544e-01, -3.87108002e-01, -1.34239667e+00,
     -3.00065797e+00, -4.41556909e-01, -7.77546208e-01,
     -6.59017029e-01, -1.42596356e-01, -9.78935498e-01],
    [8.50714148e-01, 2.28658856e-01, -3.65260753e+00,
     2.70626948e+00, -1.90441544e-01, 5.66625676e+00,
     1.77531510e+00, 2.39978921e+00, 1.10965660e+00,
     1.58484130e+00, -1.51579214e-02, 8.64324026e-01],
    [1.14302559e+00, 1.18602811e+00, -3.88130412e+00,
     8.69833825e-01, -8.23003310e-01, -4.23867795e-01,
     8.56022598e-01, -1.08015106e+00, 1.74840192e-01,
     -1.35493558e-02, -1.17012561e+00, 1.68572940e-01],
    [3.54117814e+00, 6.12714769e-01, 7.67585243e+00,
     2.50391333e+00, 1.81374399e+00, -1.46363231e+00,
     -1.74027236e+00, -5.72924078e-01, -1.20787368e+00,
     -4.13954661e-01, -4.62561948e-01, 6.78297871e-01],
    [8.31843044e-01, 4.41635485e-01, 7.00724425e-02,
     -4.72159900e-02, 3.08326493e-01, -4.47009822e-01,
     3.27806057e-01, 6.52370380e-01, 3.28490360e-01,
     1.28628172e-01, -7.78065861e-02, 6.91343399e-02],
    [4.90082031e-01, -9.53180204e-01, 1.76970476e-01,
     1.57256960e-01, -5.26196238e-02, -3.19264458e-01,
     3.91808304e-01, 2.19368239e-01, -2.06483291e-01,
     -6.25044005e-02, -1.05547224e-01, 3.18934196e-01],
    [1.49899454e+00, -4.30708817e-01, 2.43770498e+00,
     7.03149621e-01, -2.28827845e+00, 2.70195855e+00,
     -4.71484280e+00, -1.18700075e+00, -1.77431396e+00,
     -2.23190236e+00, 8.20855264e-01, -2.35859902e-01],
    [1.20322544e-01, -3.66300816e-01, -1.25699953e-01,
     -1.21914056e-01, 6.93277338e-02, -1.31034684e-01,
     -1.54955924e-03, 2.48094288e-02, -3.09576314e-02,
     -1.66369415e-03, 1.48904987e-04, -1.42151992e-02],
    [6.52394765e-01, -6.81024464e-01, 6.36868117e-01,
     3.04950208e-01, 2.62178992e-01, -3.20457080e-01,
     -1.98576098e-01, -3.02173163e-01, 2.04399765e-01,
     4.44513847e-02, -9.50111498e-03, -1.14198739e-02],
    [2.06762180e-01, -2.08101829e-01, 2.61977630e-01,
     -1.71672300e-01, 5.61794250e-02, 2.13660185e-01,
     3.90259585e-02, 4.78176392e-02, 1.72812607e-02,
     3.44052067e-02, 6.26899067e-03, 2.48544728e-02],
    [7.39717363e-01, 4.37786285e+00, 2.54995502e+00,
     1.13151212e+00, -3.58509503e-01, 2.20806129e-01,
     -2.20500355e-01, -7.22409824e-02, -2.70534083e-01,
     1.07942098e-03, 2.70174668e-01, 1.87279353e-01],
    [1.25593809e+00, 6.71054880e-02, 8.70352571e-01,
     -4.32607959e+00, 2.30652217e+00, 5.47476105e+00,
     -6.11052479e-01, 1.07955720e+00, -2.16225471e+00,
     -7.95770149e-01, -7.31804973e-01, 9.68935954e-01],
    [1.17233757e-01, -1.23897829e-01, -4.88625265e-01,
     1.42036530e-01, -7.23286756e-02, -6.99808763e-02,
     -1.17525019e-02, 5.70221674e-02, -7.67796123e-03,
     4.17505873e-02, -2.33375716e-02, 1.94121001e-02],
    [1.67511025e+00, -2.75436700e+00, 1.45345593e+00,
     1.32408871e+00, -1.66172505e+00, 1.00560074e+00,
     -8.82308160e-01, -5.95708043e-01, -7.27283590e-01,
     -1.03975499e+00, -1.86653334e-02, 1.39449745e+00],
    [3.20587677e+00, -2.84451104e+00, 8.54849957e+00,
     -4.44001235e-01, 1.04202144e+00, 7.35333682e-01,
     -2.48763292e+00, 7.38931361e-01, -1.74185596e+00,
     -1.07581842e+00, 2.05759299e-01, -8.20483513e-01],
    [3.31279737e+00, -5.08655734e-01, 6.61530870e+00,
     1.16518280e+00, 4.74499155e+00, -2.31536191e+00,
     -1.34016130e+00, -7.15381712e-01, 2.78890594e+00,
     2.04189275e+00, -3.80003033e-01, 1.16034914e+00],
    [1.79522019e+00, -8.13534697e-02, 4.37167420e-01,
     2.26517020e+00, 8.85377295e-01, 1.07481514e+00,
     -7.25322296e-01, -2.19309506e+00, -7.59468916e-01,
     -1.37191387e+00, 2.60097913e-01, 9.34596450e-01],
    [3.50400906e-01, 8.17891485e-01, -8.63487084e-01,
     -7.31760701e-01, 9.70320805e-02, -3.60023996e-01,
     -2.91753495e-01, -8.03073817e-02, 6.65930095e-02,
     1.60093340e-01, -1.29158086e-01, -5.18806100e-02],
    [2.25922929e-01, 2.78461593e-01, 5.39661393e-02,
     -2.37662670e-02, -2.70343295e-02, -1.23485570e-01,
     2.31027499e-03, 5.87465112e-05, 1.86127188e-02,
     2.83074747e-02, -1.87198676e-04, 1.24761782e-02],
    [4.53615634e-01, 3.18976020e+00, -8.35029351e-01,
     7.84124578e+00, -4.43906795e-01, -1.78945492e+00,
     -1.14521031e+00, 1.00044304e+00, -4.04084981e-01,
     -4.86030348e-01, 1.05412721e-01, 5.63666445e-02],
    [3.93714086e-01, -3.07226477e-01, -4.87366619e-01,
     -4.57481697e-01, -2.91133171e-04, -2.39881719e-01,
     -2.15591352e-01, -1.21332941e-01, 1.42245002e-01,
     5.02984582e-02, -8.05878851e-03, 1.95534173e-01],
    [1.86913010e-01, -1.61000977e-01, 5.95612425e-01,
     1.87804293e-01, 2.22064227e-01, -1.09008289e-01,
     7.83845058e-02, 5.15228647e-02, -8.18113578e-02,
     -2.37860551e-02, 3.41013800e-03, 3.64680417e-02],
    [3.32919314e+00, -2.14341251e+00, 7.20913997e+00,
     1.76143734e+00, 1.64091808e+00, -2.66887649e+00,
     -9.26748006e-01, -2.78599285e-01, -7.39434005e-01,
     -3.87363085e-01, 8.00557250e-01, 1.15628886e+00],
    [4.76496444e-01, -1.19334793e-01, 3.09037235e-01,
     -3.45545294e-01, 1.30114716e-01, 5.06895559e-01,
     2.12176840e-01, -4.14296750e-03, 4.52439064e-02,
     -1.62163990e-02, 6.93683152e-03, -5.77607592e-03],
    [3.00019324e-01, 5.43432074e-02, -7.72732930e-01,
     1.47263806e+00, -2.79012581e-02, -2.47864869e-01,
     -2.10011388e-01, 2.78202425e-01, 6.16957205e-02,
     -1.66924986e-01, -1.80102286e-01, -3.78872162e-03]]

TIMBRE_MEANS = [np.mean(t) for t in TIMBRE_CLUSTERS]
TIMBRE_STDEVS = [np.std(t) for t in TIMBRE_CLUSTERS]

'''helper methods to process raw msd data'''

def normalize_pitches(h5):
    key = int(hdf5_getters.get_key(h5))
    segments_pitches = hdf5_getters.get_segments_pitches(h5)
    segments_pitches_new = [transpose_by_key(pitch_seg, key) for pitch_seg in segments_pitches]
    return segments_pitches_new

def transpose_by_key(pitch_seg, key):
    pitch_seg_new = []
    for i in range(0, 12):
        idx = (i + key) % 12
        pitch_seg_new.append(pitch_seg[idx])
    return pitch_seg_new

''' given a time segment with distributions of the 12 pitches, find the most
likely chord played '''
def find_most_likely_chord(pitch_vector):
    rho_max = 0.0
    # each chord is indexed as (quality, root): qualities are 1 = major,
    # 2 = minor, 3 = dominant 7th, 4 = minor 7th
    most_likely_chord = (1, 1)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_MAJOR, CHORD_TEMPLATE_MAJOR_means, CHORD_TEMPLATE_MAJOR_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean)*(pitch_vector[i] - np.mean(pitch_vector))/((stdev + 0.01)*(np.std(pitch_vector) + 0.01))
        if abs(rho) > abs(rho_max):
            rho_max = rho
            most_likely_chord = (1, idx)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_MINOR, CHORD_TEMPLATE_MINOR_means, CHORD_TEMPLATE_MINOR_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean)*(pitch_vector[i] - np.mean(pitch_vector))/((stdev + 0.01)*(np.std(pitch_vector) + 0.01))
        if abs(rho) > abs(rho_max):
            rho_max = rho
            most_likely_chord = (2, idx)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_DOM7, CHORD_TEMPLATE_DOM7_means, CHORD_TEMPLATE_DOM7_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean)*(pitch_vector[i] - np.mean(pitch_vector))/((stdev + 0.01)*(np.std(pitch_vector) + 0.01))
        if abs(rho) > abs(rho_max):
            rho_max = rho
            most_likely_chord = (3, idx)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_MIN7, CHORD_TEMPLATE_MIN7_means, CHORD_TEMPLATE_MIN7_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean)*(pitch_vector[i] - np.mean(pitch_vector))/((stdev + 0.01)*(np.std(pitch_vector) + 0.01))
        if abs(rho) > abs(rho_max):
            rho_max = rho
            most_likely_chord = (4, idx)
    return most_likely_chord

def find_most_likely_timbre_category(timbre_vector):
    most_likely_timbre_cat = 0
    rho_max = 0.0
    for idx, (seg, mean, stdev) in enumerate(zip(TIMBRE_CLUSTERS, TIMBRE_MEANS, TIMBRE_STDEVS)):
        rho = 0.0
        for i in range(0, 12):
            rho += (seg[i] - mean)*(timbre_vector[i] - np.mean(seg))/((stdev + 0.01)*(np.std(timbre_vector) + 0.01))
        if abs(rho) > abs(rho_max):
            rho_max = rho
            most_likely_timbre_cat = idx
    return most_likely_timbre_cat
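For reference, a quick sanity check of these helpers on a hypothetical input: a pitch vector with energy only on C, E, and G matches the C-major template exactly, so it should come back as quality 1 (major) with root index 0:

pitch_vector = [1.0, 0, 0, 0, 1.0, 0, 0, 1.0, 0, 0, 0, 0]  # C, E, G energy only
print find_most_likely_chord(pitch_vector)  # expected: (1, 0), i.e. C major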

Bibliography

[1] Marck Bailey. Mark J. Butler publishes scholarly work on dance music. http://www.music.northwestern.edu/about/news/2012/mark-j-butler-publishes-scholarly-work-on-dance-music.html, March 2012.

[2] Kenneth Taylor. Ishkur's guide to EDM. http://techno.org/electronic-music-guide/.

[3] Deal further strengthens Spotify's music discovery expertise. http://the.echonest.com/pressreleases/spotify-acquires-echo-nest/, March 2014.

[4] Josh Constine. Inside the Spotify - Echo Nest skunkworks. http://techcrunch.com/2014/10/19/the-sonic-mad-scientists/, October 2014.

[5] The future of music genres is here. http://blog.echonest.com/post/73516217273/the-future-of-music-genres-is-here, January 2014.

[6] About the Music Genome Project. http://www.pandora.com/about/mgp.

[7] Joan Serrà, Álvaro Corral, Marián Boguñá, Martín Haro, and Josep Ll. Arcos. Measuring the evolution of contemporary western popular music. Sci. Rep., 2, July 2012.

[8] Matthias Mauch, Robert M. MacCallum, Mark Levy, and Armand M. Leroi. The evolution of popular music: USA 1960–2010. Royal Society Open Science, 2(5), 2015.

[9] Thierry Bertin-Mahieux, Daniel P.W. Ellis, Brian Whitman, and Paul Lamere. The Million Song Dataset. In Proceedings of the 12th International Conference on Music Information Retrieval (ISMIR 2011), 2011.

[10] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay. Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12:2825–2830, 2011.

[11] Lars Buitinck, Gilles Louppe, Mathieu Blondel, Fabian Pedregosa, Andreas Mueller, Olivier Grisel, Vlad Niculae, Peter Prettenhofer, Alexandre Gramfort, Jaques Grobler, Robert Layton, Jake VanderPlas, Arnaud Joly, Brian Holt, and Gaël Varoquaux. API design for machine learning software: experiences from the scikit-learn project. In ECML PKDD Workshop: Languages for Data Mining and Machine Learning, pages 108–122, 2013.

[12] Edwin Chen. Infinite mixture models with nonparametric Bayes and the Dirichlet process. http://blog.echen.me/2012/03/20/infinite-mixture-models-with-nonparametric-bayes-and-the-dirichlet-process/, March 2012.

[13] Graham Massey. Roland TR-808: The drum machine that changed music. http://www.bbc.com/news/entertainment-arts-26682781, March 2014.

[14] Jean-Michel Jarre and Agencja Artystyczna MTJ. Jean Michel Jarre. Disques Dreyfus, 1999.

[15] Francois Pachet, Jean-Julien Aucouturier, and Mark Sandler. The way it sounds: Timbre models for analysis and retrieval of music signals. IEEE Transactions on Multimedia, 7(6):1028–35, December 2005.

Page 50: Silver,Matthew final thesis

Figure 35 Song year distributions for α = 02

41

42

43

Figure 36 Timbre and pitch distributions for α = 02

44

Cluster Song Count Characteristic Sounds

0 4075 Nostalgic and sad-sounding synths and string instru-

ments

1 2068 Intense sad cavernous (mix of industrial metal and am-

bient)

2 1546 Jazzfunk tones

3 1691 Orchestral with heavy 80s synths atmospheric

4 343 Arpeggios

5 304 Electro ambient

6 2405 Alien synths eery

7 1264 Punchy kicks and claps 80s90s tilt

8 1561 Medium tempo 44 time signature synths with intense

guitar

9 1796 Disco rhythms and instruments

10 2158 Standard rock with few (if any) synths added on

12 791 Cavernous minimalist ambient (non-electronic instru-

ments)

14 765 Downtempo classic guitar riffs fewer synths

16 865 Classic acid house sounds and beats

17 682 Heavy Roland TR sounds

22 14 Fast ambient classic orchestral

23 578 Acid house with funk tones

30 31 Very repetitive rhythms one or two tones

34 88 Very dense sound (strong vocals and synths)

Table 33 Song cluster descriptions for α = 02

45

33 Analysis

Each of the different values of α revealed different insights about the Dirichlet Process

applied to the Million Song Dataset mainly which artists and songs were unique

and how traditional groupings of EM genres could be thought of differently Not

surprisingly the distributions of the years of songs in most of the clusters were skewed

to the left because the distribution of all of the EM songs in the Million Song Dataset

is left-skewed (see Figure 22) However some of the distributions vary significantly

for individual clusters and these differences provide important insights into which

types of music styles were popular at certain points in time and how unique the

earliest artists and songs in the clusters were For example for α = 01 Cluster 28rsquos

musical style (with sounds characteristic of the Roland TR-808 and TR-909 a two

programmable drum machines and synthesizers that became extremely popular in

1980s and 90s dance tracks) [13] coincides with the when the instruments were first

manufactured in 1980 Not surprisingly this cluster contained mostly songs from

the 80s and 90s and declined slightly in the 2000s However there were a few songs

in that cluster that came out before 1980 While these songs did not clearly use the

Roland TR machines they may have contained similar sounds that predated the

machines and were truly novel

First looking at α = 005 we see that all of the clusters contain a significant

number of songs although cluster 3 and 8 are notably smaller Cluster 3 contains

a heavier left tail indicating a larger number of songs from the 70s 80s and 90s

Inside the cluster the genres of music varied significantly from a traditional music

lens That is the cluster contained some songs with nearly all traditional rock

instruments others with purely synths and others somewhere in between all which

would normally be classified as different EM genres However under the Dirichlet

Process these songs were lumped together with the common themes of dense melodic

46

melodies (as opposed to minimalistic repetitive or dissonant sounds) The most

prominent artists from the earlier songs are Ashra and John Hassell who composed

several melodic songs combining traditional instruments with synthesizers for a

modern feel The other small cluster number 8 contains a more normal year

distribution relative to the entire MSD distribution and also consists of denser beats

Another artist Cabaret Voltaire leads this cluster Cluster 9 sticks out significantly

because it contains virtually no songs before 1990 but increases rapidly in popularity

This cluster contains songs with hypnotically repetitive rhythm strong and ethereal

synths and an equally strong drum-like beat Given the emergence of trance in

the 1990s and the fact that house music in the 1980s contained more minimalistic

synths than house music in the 1990s this distribution of years makes sense Looking

at the earliest artists in this cluster one that accurately predates the later music

in the cluster is Jean-Michel Jarre A French composer pioneering in ambient and

electronic music [14] one of his songs Les Chants Magneacutetiques IV contains very

sharp and modulated synths along with a repetitive hi-hat cymbal rhythm and more

drawn-out and ethereal synths While the song sounds ambient at its normal speed

playing the song 15 times the normal speed resulted in a thumping fast-paced 16th

note rhythm that combined with the ethereal synths that contain certain chord

progressions sounded very similar to trance music In fact I found that stylistically

trance music was comparable to house and ambient music increased in speed Trance

music was a term not used extensively until the early 1990s but ambient and house

music were already mainstream by the 1980s so it would make sense that trance

evolved in this manner However this insight could serve as an argument that trance

is not an innovative genre in and of itself but is rather a clever combination of two

older genres Lastly we look at the timbre category and chord change distributions

for each cluster In theory these clusters should have significantly different peaks

of chord changes and timbre categories reflecting different pitch arrangements and

47

instruments in each cluster The type 0 chord change corresponds to major rarr

major with no note change type 60 minor rarr minor with no note change type

120 dominant 7th major rarr dominant 7th major with no note change and type 180

dominant 7th minor rarr dominant 7th minor with no note change It makes sense that

type 0 60 120 and 180 chord changes are frequently observed because it implies

that chords in the song occurring next to each other are remaining in the same key

for the majority of the song The timbre categories on the other hand are more

difficult to intuitively interpret Mauchrsquos study addresses this issue by sampling

songs and sounds that are the closest to each timbre category and then playing the

sounds and attaching user-based interpretations based on several listeners [8] While

this strategy worked in Mauchrsquos study given the time and resources at my disposal

this strategy was not practical in my study I ended up comparing my subjective

summaries of each cluster and comparing the charts to see whether certain peaks

in the timbre categories corresponded to specific tones Strangely for α = 005 the

timbre and chord change data is very similar for each cluster This problem does not

occur for when α = 01 or 02 where the graphs vary significantly and correspond

to some of the observed differences in the music In summary below are the most

influential artists I found in the clusters formed and the types of music they created

that were novel for their time

bull Jean-Michel Jarre ambient and house music complicated synthesizer arrange-

ments

bull Cabaret Voltaire orchestral electronic music

bull Paul Horn new age

bull Brian Eno ambient music

bull Manuel Goumlttsching (Ashra) synth-heavy ambient music

48

bull Killing Joke industrial metal

bull John Foxx minimalist and dark electronic music

bull Fad Gadget house and industrial music

While these conclusions were formed mainly from the data used in the MSD and the

resulting clusters I checked outside sources and biographies of these artists to see

whether they were groundbreaking contributors to electronic music Some research

revealed that these artists were indeed groundbreaking for their time so my findings

are consistent with existing literature The difference however between existing

accounts and mine is that from a quantitatively computed perspective I found some

new connections (like observing that one of Jarrersquos works when sped up sounded

very similar to trance music)

For larger values of α it is not only worth looking at interesting phenomena in

the clusters formed for that specific value but also comparing the clusters formed

to other values of α Since we are increasing the value of α more clusters will

be formed and the distinctions between each cluster will be more nuanced With

α = 01 the Dirichlet Process formed 16 clusters 2 of these clusters consisted of

only one song each and upon listening neither of these songs sounded particularly

unique so I threw those two clusters out and analyzed the remaining 14 Comparing

these clusters to the ones formed with α = 005 I found that some of the clusters

mapped over nicely while others were more difficult to interpret For example cluster

301 (cluster 3 when α = 01) contained a similar number of songs and a similar

distribution of the years the songs were released to cluster 9005 Both contain vitually

no songs before the 1990s and then steadily rise in popularity through the 2000s

Both clusters also contain similar types of music house beats and ethreal synths

reminiscent of ambient or trance music However when I looked at the earliest

49

artists in cluster 301 they were different from the earliest artists in cluster 9005

One particular artist Bill Nelson stood out for having a particularly novel song

ldquoBirds of Tinrdquo for the year it was released (1980) This song features a sharp and

twangy synth beat that when sped up sounded like minimalist acid house music

While the α = 005 group differentiated mostly on general moods and classes of

instruments (like rock vs non-electronic vs electronic) the α = 01 group picked

up more nuanced instrumentation and mood differences For example cluster 1601

contained songs that featured orchestral string instruments especially violin The

songs themselves varied significantly according to traditional genres from Brian Eno

arrangements with classical orchestra to a remix of a song by Linkin Park a nu-metal

band which contained violin interludes This clustering raises an interesting point

that music that sounds very different based on traditional genres could be grouped

together on certain instruments or sounds Another cluster 2801 features 90s sounds

characteristic of the Roland TR-909 drum machine (which would explain why the

clusterrsquos songs increase dratistically starting in the 1990s and steadily decline through

the 2000s) Yet another cluster 601 contains a particularly heavy left tail indicating

a style more popular in the 1980s and the characteristic sound high-hat cymbals

is also a specialized instrument This specialization does not match up particularly

strongly with the clusters when α = 005 That is a single cluster with α = 005

does not easily map to one or more clusters in the α = 01 run although many of

the clusters appear to share characteristics based on the qualitative descriptions in

the tables The timbrechord change charts for each cluster appeared to at least

somewhat corroborate the general characteristics I attached to each cluster For

example the last timbre category is significantly pronounced for clusters 5 and 18

and especially so for 18 Cluster 18 was vocal-free ethereal space-synth sounds

so it would make sense that cluster 5 which was mainly calm New World also

contained vocal-free ethereal and space-y sounds It was also interesting to note

50

that certain clusters like 28 contained one timbre category that completely dominated

all the others Many songs in this cluster were marked by strong and repetitive

beats reminiscent of the Roland TR synth drum machine which matches the graph

Likewise clusters 3 7 9 and 20 which appear to contain the same peak timbre

category were noted for containing strong and repetitive beats For this cluster I

added the following artists and their contributions to the general list of novel artists

bull Bill Nelson Minimalist house music

bull Vangelis Orchestral compositions with electronic notes

bull Rick Wakeman Rock compositions with spacy-sounding synths

bull Kraftwerk synth-based pop music

Finally we look at α = 02 With this parameter value the Dirichlet Process resulted

in 22 clusters formed 3 of these clusters contained only one song each and upon

listening to each of these songs I determined they were not particularly unique and

discarded them for a total of 19 remaining clusters Unlike the previous two values of

α where the clusters were relatively easy to subjectively differentiate this one was

quite difficult Slightly more than half of the clusters 10 out of 19 contained under

1000 songs and 3 contained under 100 Some of the clusters were easily mapped

to clusters in the other two α values like cluster 1702 which contains Roland TR

drum machine sounds and is comparable to cluster 2801 However many of the other

classifications seemed more dubious Not only did the songs within each cluster seem

to often vary significantly but the differences between many clusters appeared nearly

indistinguishable The chord change and timbre charts also support the difficulty

in distinguishing different clusters The y-axes for all of the years are quite small

implying that many of the timbre values averaged out because the songs were quite

different in each cluster Essentially the observations are quite noisy and do not have

51

features that stand out as saliently as cluster 2801 for example The only exceptions

to these numbers were clusters 30 and 34 but there were so few songs in each of

these clusters that they represent only a small amount of the dataset Therefore I

concluded that the Dirichlet Process with α = 02 performed an insufficient job of

adequately clustering the songs Overall the clusters formed when α = 01 were

the most meaningful in terms of picking up nuanced moods and instruments without

splitting hairs and resulting in clusters a minimally trained ear could not differentiate

From this analysis the most appropriate genre classifications of the electronic music

from the MSD are the clusters described in the table where α = 01 and the most

novel artists along with their contributions are summarized in the finidings where

α = 005 and α = 01

52

Chapter 4

Conclusion

In this chapter I first address weakness in my experiment and strategies to address

those weaknesses then I offer potential paths for researchers to build upon my ex-

periment and offer closing words regarding this thesis

41 Design Flaws in Experiment

While I made every effort possible to ensure the integrity of this experiment there

were various factors some beyond my control and others within my control but

unrealistic given the time and resources I had The largest issue was the dataset I

was working with While the MSD contained roughly 23000 electronic music songs

according to my classifications these songs did not come close to all of the electronic

music that was available From looking through the tracks I did see many important

artists meaning that there was some credibility to the dataset However there were

several other artists I was surprised to see missing and the artists included contained

only a limited number of popular songs Some traditionally defined genres like

dubstep were missing entirely from the dataset and the most recent songs came

from the year 2010 which meant that the past 5 years of rapid expansions in EM

were not accounted for Building a sufficient corpus of EM data is very difficult

53

arguably more than for other genres because songs may be remixed by multiple

artists further blurring the line between original content and modifications For this

reason I considered my thesis to be a proof of concept Although the data I used

may not be ideal I was able to show that the Dirichlet Process could be used with

some amount of success to cluster songs based on their metadata

With respect to how I implemented the Dirichlet Process and constructed the

features my methodology could have been more extensive with additional time and

resources Interpreting the sounds in each song and establishing common threads is a

difficult task and unlike Pandora which used trained music theory experts to analyze

each song I relied on my own ears and anecdotal knowledge of EM Given the lack of

formal literature quantitatively analyzing EM and the resources I had this was my

best realistic option but was also not ideal The second notable weakness which was

more controllable was determining what exactly constitutes an EM song My criteria

involved iterating through every song and selecting those whose artist contained a

tag that fell inside a list of predetermined EM genres However this strategy is not

always effective since some artists contain only a small selection of EM songs and

have produced much more music involving rock or other non-EM genres To prevent

these songs from appearing in the dataset I would need to load another dataset

from a group called Lastfm which contains user-generated tags at the song level

Another, more addressable weakness in my experiment was graphically analyzing the timbre categories. While the average chord changes were easy to interpret on the graphs for each cluster and had easy semantic interpretations, the timbre categories were never formally defined. That is, while I knew the Bayesian Information Criterion (BIC) was lowest when there were 46 categories, I did not associate each timbre category with a sound. Mauch's study addressed this issue by randomly selecting songs with sounds that fell in each timbre category and asking users to listen to the sounds and classify what they heard. Implementing this system would be an additional way of ensuring that the clusters formed for each song were nontrivial: I could not only eyeball the measurements on each graph for timbre, like I did in this thesis, but also use them to confirm the sounds I observed for each cluster.
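One way to start formalizing the categories is to recompute the mixture over a range of category counts and record where the BIC bottoms out, so that the choice of 46 is reproduced rather than hard-coded. Below is a minimal sketch, assuming the sampled timbre frames from Appendix A.3 have been parsed into a NumPy array X with one 12-dimensional frame per row; it uses scikit-learn's GaussianMixture (the current sklearn.mixture interface), and the search range is an arbitrary assumption.

import ast
import numpy as np
from sklearn.mixture import GaussianMixture

# parse the frame list written by Appendix A.3 (a str()-dumped list of lists)
X = np.array(ast.literal_eval(open('timbre_frames_all.txt').read()))

best_k, best_bic, best_model = None, np.inf, None
for k in range(2, 61):  # arbitrary search range around the reported optimum
    gmm = GaussianMixture(n_components=k, covariance_type='full',
                          random_state=0).fit(X)
    bic = gmm.bic(X)  # lower BIC = better fit after the complexity penalty
    if bic < best_bic:
        best_k, best_bic, best_model = k, bic, gmm

print('BIC is minimized at {0} timbre categories'.format(best_k))

# Mauch-style listening check: pull a few frames assigned to each category
# so a listener can attach a verbal description to it
labels = best_model.predict(X)
for cat in range(best_k):
    examples = np.where(labels == cat)[0][:3]
    print('category {0}: example frame indices {1}'.format(cat, examples.tolist()))

The means of the winning model (best_model.means_) would then play the role of the TIMBRE_CLUSTERS table hard-coded in Appendix A.4.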

Finally, while my feature selection involved careful preprocessing, based on other studies, that normalized measurements between all songs, there are additional ways I could have improved the feature set. For example, one study looks at more advanced ways to isolate specific timbre segments in a song, identify repeating patterns, and compare songs to each other in terms of the similarity of their timbres [15]. More advanced methods like these would allow me to analyze more quantitatively how successful the Dirichlet Process is at clustering songs into distinct categories.
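Even without reimplementing the models of [15], a first step toward that kind of quantitative check is a direct song-to-song timbre similarity. The sketch below compares the duration-normalized timbre_cat_counts vectors produced in Appendix A.2 using cosine similarity; the measure is my own illustrative choice, not the one used in [15].

import numpy as np

def timbre_similarity(counts_a, counts_b):
    # cosine similarity between two songs' timbre-category histograms
    # (the 'timbre_cat_counts' field computed in Appendix A.2)
    a = np.asarray(counts_a, dtype=float)
    b = np.asarray(counts_b, dtype=float)
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(np.dot(a, b) / denom) if denom > 0 else 0.0

def cluster_cohesion(histograms):
    # average pairwise similarity of the songs inside one cluster
    sims = [timbre_similarity(x, y)
            for i, x in enumerate(histograms)
            for y in histograms[i + 1:]]
    return sum(sims) / len(sims) if sims else 0.0

High within-cluster cohesion paired with low between-cluster similarity would give the clusters the quantitative backing that my by-ear evaluation cannot.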

4.2 Future Work

Future work in this area, quantitatively analyzing EM metadata to determine what constitutes different genres and novel artists, would involve tighter definitions and procedures, evaluations of whether the clustering was effective, and closer musical scrutiny. All of the weaknesses mentioned in the previous section, barring perhaps the songs available in the Million Song Dataset, can be addressed with extensions and modifications to the code base I created. The greater issue of building an effective corpus of music data for the MSD and constantly updating it might be addressed by soliciting such data from an organization like Spotify, but such an endeavor is very ambitious and beyond the scope of any individual or small-group research project without extensive funding and influence. Once these problems are resolved, and once the songs accessed from the dataset and the methods for comparing songs to each other are settled, the next steps would be to further analyze the results. How do the most unique artists for their time compare to the most popular artists? Is there considerable overlap? How long does it take for a style to grow in popularity, if it even does? And lastly, how can these findings be used to compose new genres of music and envision who and what will become popular in the future? All of these questions may require supplementary information sources, with respect to the popularity of songs and artists, for example, and many of these additional pieces of information can be found on the website of the MSD.
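As a concrete starting point for the popularity questions above, the MSD itself already exposes The Echo Nest's popularity fields through hdf5_getters, so novelty and popularity can at least be joined without a new data source. The sketch below is illustrative: the getters are real MSD functions, but the overlap statistic is a placeholder for whatever comparison a future researcher prefers.

import hdf5_getters

def popularity_fields(h5_path):
    # pull artist-level popularity metadata for one MSD track
    h5 = hdf5_getters.open_h5_file_read(h5_path)
    fields = {'artist_name': hdf5_getters.get_artist_name(h5),
              'year': int(hdf5_getters.get_year(h5)),
              'artist_hotttnesss': float(hdf5_getters.get_artist_hotttnesss(h5)),
              'artist_familiarity': float(hdf5_getters.get_artist_familiarity(h5))}
    h5.close()
    return fields

def novelty_vs_popularity(cluster_tracks, top_n=10):
    # overlap between a cluster's earliest ("most novel") artists and its
    # most popular artists, as a fraction of top_n
    earliest = set(t['artist_name'] for t in
                   sorted(cluster_tracks, key=lambda t: t['year'])[:top_n])
    hottest = set(t['artist_name'] for t in
                  sorted(cluster_tracks, key=lambda t: -t['artist_hotttnesss'])[:top_n])
    return len(earliest & hottest) / float(top_n)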

4.3 Closing Remarks

While this thesis is an ambitious endeavor and can be improved in many respects, it does show that the methods implemented yield nontrivial results and could serve as a foundation for future quantitative analysis of electronic music. As data analytics grows even more, and groups such as Spotify amass greater amounts of information and deeper insights on that information, this relatively new field of study will hopefully grow. EM is a dynamic, energizing, and incredibly expressive type of music, and understanding it from a quantitative perspective pays respect to what has, up until now, mostly been analyzed from a curious outsider's perspective: qualitatively described, but not examined as thoroughly from a mathematical angle.


Appendix A

Code

A.1 Pulling Data from the Million Song Dataset

from __future__ import division
import os
import re
import sys
import time
import glob
import numpy as np
import hdf5_getters  # not on adroit

# prevents output from showing ellipses when printed
np.set_printoptions(threshold=np.nan)

'''This code computes the frequency of chord changes in each electronic song
and runs the dirichlet process on it.'''

basedir = '/scratch/network/mssilver/mssilver/msd_data_full/data/' + str(sys.argv[1])
ext = '.h5'

target_genres = ['house', 'techno', 'drum and bass', 'drum n bass', "drum'n'bass",
    'drumnbass', "drum 'n' bass", 'jungle', 'breakbeat', 'trance', 'dubstep',
    'trap', 'downtempo', 'industrial', 'synthpop', 'idm',
    'idm - intelligent dance music', '8-bit', 'ambient',
    'dance and electronica', 'electronic']

# relevant metadata for all EM songs found in the MSD
all_song_data = {}
pitch_segs_data = []
count = 0
start_time = time.time()

for root, dirs, files in os.walk(basedir):
    files = glob.glob(os.path.join(root, '*' + ext))
    for f in files:
        h5 = hdf5_getters.open_h5_file_read(f)
        # if year unknown, throw out sample
        if hdf5_getters.get_year(h5) == 0:
            h5.close()
            continue
        if any(tag in str(hdf5_getters.get_artist_mbtags(h5)) for tag in target_genres):
            print 'found electronic music song at {0} seconds'.format(time.time() - start_time)
            count += 1
            print('song count: {0}'.format(count + 1))
            h5_subdict = dict()
            h5_subdict['title'] = hdf5_getters.get_title(h5).item()
            h5_subdict['artist_name'] = hdf5_getters.get_artist_name(h5).item()
            h5_subdict['year'] = hdf5_getters.get_year(h5).item()
            h5_subdict['duration'] = hdf5_getters.get_duration(h5).item()
            h5_subdict['timbre'] = hdf5_getters.get_segments_timbre(h5).tolist()
            h5_subdict['pitches'] = hdf5_getters.get_segments_pitches(h5).tolist()
            track_id = hdf5_getters.get_track_id(h5).item()
            all_song_data[track_id] = h5_subdict
            print('Song {0} finished processing. Total time elapsed: {1} seconds'.format(count, str(time.time() - start_time)))
        h5.close()

all_song_data_sorted = dict(sorted(all_song_data.items(), key=lambda k: k[1]['year']))
sortedpitchdata = '/scratch/network/mssilver/mssilver/msd_data/raw_' + re.sub('/', '', sys.argv[1]) + '.txt'
with open(sortedpitchdata, 'w') as text_file:
    text_file.write(str(all_song_data_sorted))

A.2 Calculating Most Likely Chords and Timbre Categories

from __future__ import division
import os
import sys
import re
import time
import json
import glob
import hdf5_getters  # not on adroit
import sklearn.mixture
import msd_utils  # not on adroit
import math
import numpy as np
import collections
import ast

# prevents output from showing ellipses when printed
np.set_printoptions(threshold=np.nan)

# column-wise mean of list of lists
def mean(a):
    return sum(a) / len(a)

'''This code computes the frequency of chord changes in each electronic song
and runs the dirichlet process on it.'''

basedir = '/scratch/network/mssilver/mssilver/'
input_file = basedir + 'msd_data/raw_' + str(sys.argv[1]) + '.txt'
output_file = basedir + 'msd_data/preprocessed_' + str(sys.argv[1]) + '.txt'

json_contents = open(input_file, 'r').read()

all_song_data = []
time_start = time.time()
count = 0
for json_object_str in re.finditer(r"{'title'.*?}", json_contents):
    json_object_str = str(json_object_str.group(0))
    json_object = ast.literal_eval(json_object_str)
    json_object_new = {}

    json_object_new['title'] = json_object['title']
    json_object_new['artist_name'] = json_object['artist_name']
    json_object_new['year'] = json_object['year']
    json_object_new['duration'] = json_object['duration']

    segments_pitches_old = json_object['pitches']
    segments_timbre_old = json_object['timbre']
    segments_pitches_old_smoothed = []
    segments_timbre_old_smoothed = []
    chord_changes = [0 for i in range(0, 192)]
    smoothing_factor = 5
    for i in range(0, int(math.floor(len(segments_pitches_old)) / smoothing_factor)):
        segments = segments_pitches_old[(smoothing_factor*i):(smoothing_factor*i + smoothing_factor)]
        # calculate mean frequency of each note over a block of 5 time segments
        segments_mean = map(mean, zip(*segments))
        segments_pitches_old_smoothed.append(segments_mean)
    most_likely_chords = [msd_utils.find_most_likely_chord(seg) for seg in segments_pitches_old_smoothed]
    print 'found most likely chords at {0} seconds'.format(time.time() - time_start)
    # calculate chord changes
    for i in range(0, len(most_likely_chords) - 1):
        c1 = most_likely_chords[i]
        c2 = most_likely_chords[i+1]
        if (c1[1] == c2[1]):
            note_shift = 0
        elif (c1[1] < c2[1]):
            note_shift = c2[1] - c1[1]
        else:
            note_shift = 12 - c1[1] + c2[1]
        key_shift = 4*(c1[0] - 1) + c2[0]
        # convert note_shift (0 through 11) and key_shift (1 to 16)
        # to one of 192 categories for a chord shift
        chord_shift = 12*(key_shift - 1) + note_shift
        chord_changes[chord_shift] += 1
    json_object_new['chord_changes'] = [c / json_object['duration'] for c in chord_changes]
    print 'calculated chord changes at {0} seconds'.format(time.time() - time_start)

    for i in range(0, int(math.floor(len(segments_timbre_old)) / smoothing_factor)):
        segments = segments_timbre_old[(smoothing_factor*i):(smoothing_factor*i + smoothing_factor)]
        # calculate mean timbre value of each coordinate over a block of 5 time segments
        segments_mean = map(mean, zip(*segments))
        segments_timbre_old_smoothed.append(segments_mean)
    print 'found most likely timbre categories at {0} seconds'.format(time.time() - time_start)
    timbre_cats = [msd_utils.find_most_likely_timbre_category(seg) for seg in segments_timbre_old_smoothed]
    timbre_cat_counts = [timbre_cats.count(i) for i in xrange(0, 30)]
    json_object_new['timbre_cat_counts'] = [t / json_object['duration'] for t in timbre_cat_counts]
    all_song_data.append(json_object_new)
    count += 1

print 'preprocessing finished, writing results to file at time {0}'.format(time.time() - time_start)
with open(output_file, 'w') as f:
    f.write(str(all_song_data))

print 'file merging complete at time {0}'.format(time.time() - time_start)

A.3 Code to Compute Timbre Categories

from __future__ import division
import os
import sys
import re
import time
import json
import glob
import hdf5_getters  # not on adroit
import sklearn.mixture
import msd_utils  # not on adroit
import math
import numpy as np
import collections
from string import ascii_uppercase
import ast
import matplotlib.pyplot as plt
import operator
from collections import defaultdict
import random

timbre_all = []
N = 20  # number of samples to get from each year
year_counts = {1956: 2, 1965: 4, 1968: 3, 1969: 5, 1970: 23, 1971: 25,
    1972: 26, 1973: 37, 1974: 35, 1975: 29, 1976: 28, 1977: 64, 1978: 77,
    1979: 111, 1980: 131, 1981: 171, 1982: 199, 1983: 272, 1984: 190,
    1985: 189, 1986: 200, 1987: 224, 1988: 205, 1989: 272, 1990: 358,
    1991: 348, 1992: 538, 1993: 610, 1994: 658, 1995: 764, 1996: 809,
    1997: 930, 1998: 872, 1999: 983, 2000: 1031, 2001: 1230, 2002: 1323,
    2003: 1563, 2004: 1508, 2005: 1995, 2006: 1892, 2007: 2175, 2008: 1950,
    2009: 1782, 2010: 742}

time_start = time.time()
year_count = defaultdict(int)
orig_dir = '/scratch/network/mssilver/mssilver/'
orig_dir = ''
json_pattern = re.compile(r"{'title'.*?}", re.DOTALL)
N = 20  # number of songs to sample from each year
k = 20  # number of frames to select from each song
for l1 in ascii_uppercase:
    for l2 in ascii_uppercase:
        edm_textfile = orig_dir + 'msd_data/raw_' + l1 + l2 + '.txt'
        json_contents = open(edm_textfile, 'r').read()
        for json_object_str in re.findall(json_pattern, json_contents):
            json_object = ast.literal_eval(json_object_str)
            year = int(json_object['year'])
            prob = 1.0 if 1.0*N/year_counts[year] > 1.0 else 1.0*N/year_counts[year]
            if random.random() < prob:
                print 'getting timbre frames for song in directory {0}, {1} seconds after start of program'.format(edm_textfile, time.time() - time_start)
                duration = float(json_object['duration'])
                timbre = [[t/duration for t in l] for l in json_object['timbre']]
                try:
                    indices = random.sample(xrange(0, len(timbre)), k)
                except:
                    indices = xrange(0, len(timbre))
                timbre_frames = [timbre[i] for i in indices]
                appended_timbre = [timbre_all.append(l) for l in timbre_frames]
        print 'finished file {0}, {1} seconds after start of program'.format(edm_textfile, time.time() - time_start)

with(open('timbre_frames_all.txt', 'w')) as f:
    f.write(str(timbre_all))

A.4 Helper Methods for Calculations

import os
import re
import json
import glob
import hdf5_getters
import time
import numpy as np

''' some static data used in conjunction with the helper methods '''

# each 12-element vector corresponds to the 12 pitches, starting with C
# natural and going up to B natural

CHORD_TEMPLATE_MAJOR = [[1,0,0,0,1,0,0,1,0,0,0,0], [0,1,0,0,0,1,0,0,1,0,0,0],
                        [0,0,1,0,0,0,1,0,0,1,0,0], [0,0,0,1,0,0,0,1,0,0,1,0],
                        [0,0,0,0,1,0,0,0,1,0,0,1], [1,0,0,0,0,1,0,0,0,1,0,0],
                        [0,1,0,0,0,0,1,0,0,0,1,0], [0,0,1,0,0,0,0,1,0,0,0,1],
                        [1,0,0,1,0,0,0,0,1,0,0,0], [0,1,0,0,1,0,0,0,0,1,0,0],
                        [0,0,1,0,0,1,0,0,0,0,1,0], [0,0,0,1,0,0,1,0,0,0,0,1]]
CHORD_TEMPLATE_MINOR = [[1,0,0,1,0,0,0,1,0,0,0,0], [0,1,0,0,1,0,0,0,1,0,0,0],
                        [0,0,1,0,0,1,0,0,0,1,0,0], [0,0,0,1,0,0,1,0,0,0,1,0],
                        [0,0,0,0,1,0,0,1,0,0,0,1], [1,0,0,0,0,1,0,0,1,0,0,0],
                        [0,1,0,0,0,0,1,0,0,1,0,0], [0,0,1,0,0,0,0,1,0,0,1,0],
                        [0,0,0,1,0,0,0,0,1,0,0,1], [1,0,0,0,1,0,0,0,0,1,0,0],
                        [0,1,0,0,0,1,0,0,0,0,1,0], [0,0,1,0,0,0,1,0,0,0,0,1]]
CHORD_TEMPLATE_DOM7 = [[1,0,0,0,1,0,0,1,0,0,1,0], [0,1,0,0,0,1,0,0,1,0,0,1],
                       [1,0,1,0,0,0,1,0,0,1,0,0], [0,1,0,1,0,0,0,1,0,0,1,0],
                       [0,0,1,0,1,0,0,0,1,0,0,1], [1,0,0,1,0,1,0,0,0,1,0,0],
                       [0,1,0,0,1,0,1,0,0,0,1,0], [0,0,1,0,0,1,0,1,0,0,0,1],
                       [1,0,0,1,0,0,1,0,1,0,0,0], [0,1,0,0,1,0,0,1,0,1,0,0],
                       [0,0,1,0,0,1,0,0,1,0,1,0], [0,0,0,1,0,0,1,0,0,1,0,1]]
CHORD_TEMPLATE_MIN7 = [[1,0,0,1,0,0,0,1,0,0,1,0], [0,1,0,0,1,0,0,0,1,0,0,1],
                       [1,0,1,0,0,1,0,0,0,1,0,0], [0,1,0,1,0,0,1,0,0,0,1,0],
                       [0,0,1,0,1,0,0,1,0,0,0,1], [1,0,0,1,0,1,0,0,1,0,0,0],
                       [0,1,0,0,1,0,1,0,0,1,0,0], [0,0,1,0,0,1,0,1,0,0,1,0],
                       [0,0,0,1,0,0,1,0,1,0,0,1], [1,0,0,0,1,0,0,1,0,1,0,0],
                       [0,1,0,0,0,1,0,0,1,0,1,0], [0,0,1,0,0,0,1,0,0,1,0,1]]

CHORD_TEMPLATE_MAJOR_means = [np.mean(chord) for chord in CHORD_TEMPLATE_MAJOR]
CHORD_TEMPLATE_MINOR_means = [np.mean(chord) for chord in CHORD_TEMPLATE_MINOR]
CHORD_TEMPLATE_DOM7_means = [np.mean(chord) for chord in CHORD_TEMPLATE_DOM7]
CHORD_TEMPLATE_MIN7_means = [np.mean(chord) for chord in CHORD_TEMPLATE_MIN7]

CHORD_TEMPLATE_MAJOR_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_MAJOR]
CHORD_TEMPLATE_MINOR_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_MINOR]
CHORD_TEMPLATE_DOM7_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_DOM7]
CHORD_TEMPLATE_MIN7_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_MIN7]

TIMBRE_CLUSTERS = [[1.38679881e-01, 3.95702571e-02, 2.65410235e-02, 7.38301998e-03,
                    -1.75014636e-02, -5.51147732e-02, 8.71851698e-03, -1.17595855e-02,
                    1.07227900e-02, 8.75951680e-03, 5.40391877e-03, 6.17638908e-03],
                   [3.14344510e+00, 1.17405599e-01, 4.08053561e+00, -1.77934450e+00,
                    2.93367968e+00, -1.35597928e+00, -1.55129489e+00, 7.75743158e-01,
                    6.42796685e-01, 1.40794256e-01, 3.37716831e-01, -3.27103815e-01],
                   [3.56548165e-01, 2.73288705e+00, 1.94355982e+00, 1.06892477e+00,
                    9.89739475e-01, -8.97330631e-02, 8.73234495e-01, -2.00747009e-03,
                    3.44488367e-01, 9.93117800e-02, -2.43471766e-01, -1.90521726e-01],
                   [4.22442037e-01, 4.14115783e-01, 1.43926557e-01, -1.16143322e-01,
                    -5.95186216e-02, -2.36927188e-01, -6.83151409e-02, 9.86816882e-02,
                    2.43219098e-02, 6.93558977e-02, 6.80121418e-03, 3.97485360e-02],
                   [1.94727799e-01, -1.39027782e+00, -2.39875671e-01, -2.84583677e-01,
                    1.92334219e-01, -2.83421048e-01, 2.15787541e-01, 1.14840341e-01,
                    -2.15631833e-01, -4.09496877e-02, -6.90838017e-03, -7.24394810e-03],
                   [1.96565167e-01, 4.98702717e-02, -3.43697282e-01, 2.54170701e-01,
                    1.12441266e-02, 1.54740401e-01, -4.70447408e-02, 8.10868802e-02,
                    3.03736697e-03, 1.43974944e-03, -2.75044913e-02, 1.48634678e-02],
                   [2.21364497e-01, -2.96205105e-01, 1.57754028e-01, -5.57641279e-02,
                    -9.25625566e-02, -6.15316168e-02, -1.38139882e-01, -5.54936599e-02,
                    1.66886836e-01, 6.46238260e-02, 1.24093863e-02, -2.09274345e-02],
                   [2.12823455e-01, -9.32652720e-02, -4.39611467e-01, -2.02814479e-01,
                    4.98638770e-02, -1.26572488e-01, -1.11181799e-01, 3.25075635e-02,
                    2.01416694e-02, -5.69216463e-02, 2.61922912e-02, 8.30817468e-02],
                   [1.62304042e-01, -7.34813956e-03, -2.02552550e-01, 1.80106705e-01,
                    -5.72110826e-02, -9.17148244e-02, -6.20429191e-03, -6.08892354e-02,
                    1.02883628e-02, 3.84878478e-02, -8.72920419e-03, 2.37291230e-02],
                   [1.69023095e-01, 6.81311168e-02, -3.71039856e-02, -2.13139780e-02,
                    -4.18752028e-03, 1.36407740e-01, 2.58515825e-02, -4.10328777e-04,
                    2.93149920e-02, -1.97874734e-02, 2.01177066e-02, 4.29260690e-03],
                   [4.16829358e-01, -1.28384095e+00, 8.86081556e-01, 9.13717416e-02,
                    -3.19420208e-01, -1.82003637e-01, -3.19865507e-02, -1.71517045e-02,
                    3.47472066e-02, -3.53047665e-02, 5.58354602e-02, -5.06222122e-02],
                   [3.83948137e-01, 1.06020034e-01, 4.01191058e-01, 1.49470482e-01,
                    -9.58422411e-02, -4.94473336e-02, 2.27589858e-02, -5.67352733e-02,
                    3.84666644e-02, -2.15828055e-02, -1.67817151e-02, 1.15426241e-01],
                   [9.07946444e-01, 3.26120397e+00, 2.98472002e+00, -1.42615404e-01,
                    1.29886103e+00, -4.53380431e-01, 1.54008478e-01, -3.55297093e-02,
                    -2.95809181e-01, 1.57037690e-01, -7.29692046e-02, 1.15180285e-01],
                   [1.60870896e+00, -2.32038235e+00, -7.96211044e-01, 1.55058968e+00,
                    -2.19377663e+00, 5.01030526e-01, -1.71767279e+00, -1.36642470e+00,
                    -2.42837527e-01, -4.14275615e-01, -7.33148530e-01, -4.56676578e-01],
                   [6.42870687e-01, 1.34486839e+00, 2.16026845e-01, -2.13180345e-01,
                    3.10866747e-01, -3.97754955e-01, -3.54439151e-01, -5.95938041e-04,
                    4.95054274e-03, 4.67013422e-02, -1.80823854e-02, 1.25808320e-01],
                   [1.16780496e+00, 2.28141229e+00, -3.29418720e+00, -1.54239912e+00,
                    2.12372153e-01, 2.51116768e+00, 1.84273560e+00, -4.06183916e-01,
                    1.19175125e+00, -9.24407446e-01, 6.85444429e-01, -6.38729005e-01],
                   [2.39097414e-01, -1.13382447e-02, 3.06327342e-01, 4.68182987e-03,
                    -1.03107607e-01, -3.17661969e-02, 3.46533705e-02, 1.46440386e-02,
                    6.88291154e-02, 1.72580481e-02, -6.23970238e-03, -6.52822380e-03],
                   [1.74850329e-01, -1.86077411e-01, 2.69285838e-01, 5.22452803e-02,
                    -3.71708289e-02, -6.42874319e-02, -5.01920042e-03, -1.14565540e-02,
                    -2.61300268e-03, -6.94872458e-03, 1.20157063e-02, 2.01341977e-02],
                   [1.93220674e-01, 1.62738332e-01, 1.72794061e-02, 7.89933755e-02,
                    1.58494767e-01, 9.04541006e-04, -3.33177052e-02, -1.42411500e-01,
                    -1.90471155e-02, -2.41622739e-02, -2.57382438e-02, 2.84895062e-02],
                   [3.31179197e+00, -1.56765268e-01, 4.42446188e+00, 2.05496297e+00,
                    5.07031622e+00, -3.52663849e-02, -5.68337901e+00, -1.17825301e+00,
                    5.41756637e-01, -3.15541339e-02, -1.58404846e+00, 7.37887234e-01],
                   [2.36033237e-01, -5.01380019e-01, -7.01568834e-02, -2.14474169e-01,
                    5.58739133e-01, -3.45340886e-01, 2.36469930e-01, -2.51770230e-02,
                    -4.41670143e-01, -1.73364633e-01, 9.92353986e-03, 1.01775476e-01],
                   [3.13672832e+00, 1.55128891e+00, 4.60139512e+00, 9.82477544e-01,
                    -3.87108002e-01, -1.34239667e+00, -3.00065797e+00, -4.41556909e-01,
                    -7.77546208e-01, -6.59017029e-01, -1.42596356e-01, -9.78935498e-01],
                   [8.50714148e-01, 2.28658856e-01, -3.65260753e+00, 2.70626948e+00,
                    -1.90441544e-01, 5.66625676e+00, 1.77531510e+00, 2.39978921e+00,
                    1.10965660e+00, 1.58484130e+00, -1.51579214e-02, 8.64324026e-01],
                   [1.14302559e+00, 1.18602811e+00, -3.88130412e+00, 8.69833825e-01,
                    -8.23003310e-01, -4.23867795e-01, 8.56022598e-01, -1.08015106e+00,
                    1.74840192e-01, -1.35493558e-02, -1.17012561e+00, 1.68572940e-01],
                   [3.54117814e+00, 6.12714769e-01, 7.67585243e+00, 2.50391333e+00,
                    1.81374399e+00, -1.46363231e+00, -1.74027236e+00, -5.72924078e-01,
                    -1.20787368e+00, -4.13954661e-01, -4.62561948e-01, 6.78297871e-01],
                   [8.31843044e-01, 4.41635485e-01, 7.00724425e-02, -4.72159900e-02,
                    3.08326493e-01, -4.47009822e-01, 3.27806057e-01, 6.52370380e-01,
                    3.28490360e-01, 1.28628172e-01, -7.78065861e-02, 6.91343399e-02],
                   [4.90082031e-01, -9.53180204e-01, 1.76970476e-01, 1.57256960e-01,
                    -5.26196238e-02, -3.19264458e-01, 3.91808304e-01, 2.19368239e-01,
                    -2.06483291e-01, -6.25044005e-02, -1.05547224e-01, 3.18934196e-01],
                   [1.49899454e+00, -4.30708817e-01, 2.43770498e+00, 7.03149621e-01,
                    -2.28827845e+00, 2.70195855e+00, -4.71484280e+00, -1.18700075e+00,
                    -1.77431396e+00, -2.23190236e+00, 8.20855264e-01, -2.35859902e-01],
                   [1.20322544e-01, -3.66300816e-01, -1.25699953e-01, -1.21914056e-01,
                    6.93277338e-02, -1.31034684e-01, -1.54955924e-03, 2.48094288e-02,
                    -3.09576314e-02, -1.66369415e-03, 1.48904987e-04, -1.42151992e-02],
                   [6.52394765e-01, -6.81024464e-01, 6.36868117e-01, 3.04950208e-01,
                    2.62178992e-01, -3.20457080e-01, -1.98576098e-01, -3.02173163e-01,
                    2.04399765e-01, 4.44513847e-02, -9.50111498e-03, -1.14198739e-02],
                   [2.06762180e-01, -2.08101829e-01, 2.61977630e-01, -1.71672300e-01,
                    5.61794250e-02, 2.13660185e-01, 3.90259585e-02, 4.78176392e-02,
                    1.72812607e-02, 3.44052067e-02, 6.26899067e-03, 2.48544728e-02],
                   [7.39717363e-01, 4.37786285e+00, 2.54995502e+00, 1.13151212e+00,
                    -3.58509503e-01, 2.20806129e-01, -2.20500355e-01, -7.22409824e-03,
                    -2.70534083e-01, 1.07942098e-03, 2.70174668e-01, 1.87279353e-01],
                   [1.25593809e+00, 6.71054880e-02, 8.70352571e-01, -4.32607959e+00,
                    2.30652217e+00, 5.47476105e+00, -6.11052479e-01, 1.07955720e+00,
                    -2.16225471e+00, -7.95770149e-01, -7.31804973e-01, 9.68935954e-01],
                   [1.17233757e-01, -1.23897829e-01, -4.88625265e-01, 1.42036530e-01,
                    -7.23286756e-02, -6.99808763e-02, -1.17525019e-02, 5.70221674e-02,
                    -7.67796123e-03, 4.17505873e-02, -2.33375716e-02, 1.94121001e-02],
                   [1.67511025e+00, -2.75436700e+00, 1.45345593e+00, 1.32408871e+00,
                    -1.66172505e+00, 1.00560074e+00, -8.82308160e-01, -5.95708043e-01,
                    -7.27283590e-01, -1.03975499e+00, -1.86653334e-02, 1.39449745e+00],
                   [3.20587677e+00, -2.84451104e+00, 8.54849957e+00, -4.44001235e-01,
                    1.04202144e+00, 7.35333682e-01, -2.48763292e+00, 7.38931361e-01,
                    -1.74185596e+00, -1.07581842e+00, 2.05759299e-01, -8.20483513e-01],
                   [3.31279737e+00, -5.08655734e-01, 6.61530870e+00, 1.16518280e+00,
                    4.74499155e+00, -2.31536191e+00, -1.34016130e+00, -7.15381712e-01,
                    2.78890594e+00, 2.04189275e+00, -3.80003033e-01, 1.16034914e+00],
                   [1.79522019e+00, -8.13534697e-02, 4.37167420e-01, 2.26517020e+00,
                    8.85377295e-01, 1.07481514e+00, -7.25322296e-01, -2.19309506e+00,
                    -7.59468916e-01, -1.37191387e+00, 2.60097913e-01, 9.34596450e-01],
                   [3.50400906e-01, 8.17891485e-01, -8.63487084e-01, -7.31760701e-01,
                    9.70320805e-02, -3.60023996e-01, -2.91753495e-01, -8.03073817e-02,
                    6.65930095e-02, 1.60093340e-01, -1.29158086e-01, -5.18806100e-02],
                   [2.25922929e-01, 2.78461593e-01, 5.39661393e-02, -2.37662670e-02,
                    -2.70343295e-02, -1.23485570e-01, 2.31027499e-03, 5.87465112e-05,
                    1.86127188e-02, 2.83074747e-02, -1.87198676e-04, 1.24761782e-02],
                   [4.53615634e-01, 3.18976020e+00, -8.35029351e-01, 7.84124578e+00,
                    -4.43906795e-01, -1.78945492e+00, -1.14521031e+00, 1.00044304e+00,
                    -4.04084981e-01, -4.86030348e-01, 1.05412721e-01, 5.63666445e-02],
                   [3.93714086e-01, -3.07226477e-01, -4.87366619e-01, -4.57481697e-01,
                    -2.91133171e-04, -2.39881719e-01, -2.15591352e-01, -1.21332941e-01,
                    1.42245002e-01, 5.02984582e-02, -8.05878851e-03, 1.95534173e-01],
                   [1.86913010e-01, -1.61000977e-01, 5.95612425e-01, 1.87804293e-01,
                    2.22064227e-01, -1.09008289e-01, 7.83845058e-02, 5.15228647e-02,
                    -8.18113578e-02, -2.37860551e-02, 3.41013800e-03, 3.64680417e-02],
                   [3.32919314e+00, -2.14341251e+00, 7.20913997e+00, 1.76143734e+00,
                    1.64091808e+00, -2.66887649e+00, -9.26748006e-01, -2.78599285e-01,
                    -7.39434005e-01, -3.87363085e-01, 8.00557250e-01, 1.15628886e+00],
                   [4.76496444e-01, -1.19334793e-01, 3.09037235e-01, -3.45545294e-01,
                    1.30114716e-01, 5.06895559e-01, 2.12176840e-01, -4.14296750e-03,
                    4.52439064e-02, -1.62163990e-02, 6.93683152e-03, -5.77607592e-03],
                   [3.00019324e-01, 5.43432074e-02, -7.72732930e-01, 1.47263806e+00,
                    -2.79012581e-02, -2.47864869e-01, -2.10011388e-01, 2.78202425e-01,
                    6.16957205e-02, -1.66924986e-01, -1.80102286e-01, -3.78872162e-03]]

TIMBRE_MEANS = [np.mean(t) for t in TIMBRE_CLUSTERS]
TIMBRE_STDEVS = [np.std(t) for t in TIMBRE_CLUSTERS]

'''helper methods to process raw msd data'''

def normalize_pitches(h5):
    key = int(hdf5_getters.get_key(h5))
    segments_pitches = hdf5_getters.get_segments_pitches(h5)
    segments_pitches_new = [transpose_by_key(pitch_seg, key) for pitch_seg in segments_pitches]
    return segments_pitches_new

def transpose_by_key(pitch_seg, key):
    pitch_seg_new = []
    for i in range(0, 12):
        idx = (i + key) % 12
        pitch_seg_new.append(pitch_seg[idx])
    return pitch_seg_new

''' given a time segment with distributions of the 12 pitches, find the most
likely chord played '''
def find_most_likely_chord(pitch_vector):
    rho_max = 0.0
    # index each chord
    most_likely_chord = (1, 1)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_MAJOR,
            CHORD_TEMPLATE_MAJOR_means, CHORD_TEMPLATE_MAJOR_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += ((chord[i] - mean) * (pitch_vector[i] - np.mean(pitch_vector))
                    / ((stdev + 0.01) * (np.std(pitch_vector) + 0.01)))
        if (abs(rho) > abs(rho_max)):
            rho_max = rho
            most_likely_chord = (1, idx)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_MINOR,
            CHORD_TEMPLATE_MINOR_means, CHORD_TEMPLATE_MINOR_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += ((chord[i] - mean) * (pitch_vector[i] - np.mean(pitch_vector))
                    / ((stdev + 0.01) * (np.std(pitch_vector) + 0.01)))
        if (abs(rho) > abs(rho_max)):
            rho_max = rho
            most_likely_chord = (2, idx)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_DOM7,
            CHORD_TEMPLATE_DOM7_means, CHORD_TEMPLATE_DOM7_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += ((chord[i] - mean) * (pitch_vector[i] - np.mean(pitch_vector))
                    / ((stdev + 0.01) * (np.std(pitch_vector) + 0.01)))
        if (abs(rho) > abs(rho_max)):
            rho_max = rho
            most_likely_chord = (3, idx)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_MIN7,
            CHORD_TEMPLATE_MIN7_means, CHORD_TEMPLATE_MIN7_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += ((chord[i] - mean) * (pitch_vector[i] - np.mean(pitch_vector))
                    / ((stdev + 0.01) * (np.std(pitch_vector) + 0.01)))
        if (abs(rho) > abs(rho_max)):
            rho_max = rho
            most_likely_chord = (4, idx)
    return most_likely_chord

def find_most_likely_timbre_category(timbre_vector):
    most_likely_timbre_cat = 0
    rho_max = 0.0
    for idx, (seg, mean, stdev) in enumerate(zip(TIMBRE_CLUSTERS, TIMBRE_MEANS,
            TIMBRE_STDEVS)):
        rho = 0.0
        for i in range(0, 12):
            rho += ((seg[i] - mean) * (timbre_vector[i] - np.mean(seg))
                    / ((stdev + 0.01) * (np.std(timbre_vector) + 0.01)))
        if (abs(rho) > abs(rho_max)):
            rho_max = rho
            most_likely_timbre_cat = idx
    return most_likely_timbre_cat


Bibliography

[1] Marck Bailey. Mark J. Butler publishes scholarly work on dance music. http://www.music.northwestern.edu/about/news/2012/mark-j-butler-publishes-scholarly-work-on-dance-music.html, Mar 2012.

[2] Kenneth Taylor. Ishkur's guide to EDM. http://techno.org/electronic-music-guide/.

[3] Deal further strengthens Spotify's music discovery expertise. http://the.echonest.com/pressreleases/spotify-acquires-echo-nest/, Mar 2014.

[4] Josh Constine. Inside the Spotify - Echo Nest skunkworks. http://techcrunch.com/2014/10/19/the-sonic-mad-scientists/, Oct 2014.

[5] The future of music genres is here. http://blog.echonest.com/post/73516217273/the-future-of-music-genres-is-here, Jan 2014.

[6] About the Music Genome Project. http://www.pandora.com/about/mgp.

[7] Joan Serrà, Álvaro Corral, Marián Boguñá, Martín Haro, and Josep Ll. Arcos. Measuring the evolution of contemporary western popular music. Sci. Rep., 2, Jul 2012.

[8] Matthias Mauch, Robert M. MacCallum, Mark Levy, and Armand M. Leroi. The evolution of popular music: USA 1960–2010. Royal Society Open Science, 2(5), 2015.

[9] Thierry Bertin-Mahieux, Daniel P.W. Ellis, Brian Whitman, and Paul Lamere. The million song dataset. In Proceedings of the 12th International Conference on Music Information Retrieval (ISMIR 2011), 2011.

[10] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay. Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12:2825–2830, 2011.

[11] Lars Buitinck, Gilles Louppe, Mathieu Blondel, Fabian Pedregosa, Andreas Mueller, Olivier Grisel, Vlad Niculae, Peter Prettenhofer, Alexandre Gramfort, Jaques Grobler, Robert Layton, Jake VanderPlas, Arnaud Joly, Brian Holt, and Gaël Varoquaux. API design for machine learning software: experiences from the scikit-learn project. In ECML PKDD Workshop: Languages for Data Mining and Machine Learning, pages 108–122, 2013.

[12] Edwin Chen. Infinite mixture models with nonparametric Bayes and the Dirichlet process. http://blog.echen.me/2012/03/20/infinite-mixture-models-with-nonparametric-bayes-and-the-dirichlet-process/, Mar 2012.

[13] Graham Massey. Roland TR-808: The drum machine that changed music. http://www.bbc.com/news/entertainment-arts-26682781, Mar 2014.

[14] Jean-Michel Jarre and Agencja Artystyczna MTJ. Jean Michel Jarre. Disques Dreyfus, 1999.

[15] Francois Pachet, Jean-Julien Aucouturier, and Mark Sandler. The way it sounds: Timbre models for analysis and retrieval of music signals. IEEE Transactions on Multimedia, 7(6):1028–35, Dec 2005.

  • Abstract
  • Acknowledgements
  • Contents
  • List of Tables
  • List of Figures
  • 1 Introduction
    • 1.1 Background Information
    • 1.2 Literature Review
    • 1.3 The Dataset
  • 2 Mathematical Modeling
    • 2.1 Determining Novelty of Songs
    • 2.2 Feature Selection
    • 2.3 Collecting Data and Preprocessing Selected Features
      • 2.3.1 Collecting the Data
      • 2.3.2 Pitch Preprocessing
      • 2.3.3 Timbre Preprocessing
  • 3 Results
    • 3.1 Methodology
    • 3.2 Findings
      • 3.2.1 α = 0.05
      • 3.2.2 α = 0.1
      • 3.2.3 α = 0.2
    • 3.3 Analysis
  • 4 Conclusion
    • 4.1 Design Flaws in Experiment
    • 4.2 Future Work
    • 4.3 Closing Remarks
  • A Code
    • A.1 Pulling Data from the Million Song Dataset
    • A.2 Calculating Most Likely Chords and Timbre Categories
    • A.3 Code to Compute Timbre Categories
    • A.4 Helper Methods for Calculations
  • Bibliography


38 CHORD_TEMPLATE_MAJOR_means = [npmean(chord) for chord inCHORD_TEMPLATE_MAJOR]

39 CHORD_TEMPLATE_MINOR_means = [npmean(chord) for chord inCHORD_TEMPLATE_MINOR]

40 CHORD_TEMPLATE_DOM7_means = [npmean(chord) for chord in CHORD_TEMPLATE_DOM7]

41 CHORD_TEMPLATE_MIN7_means = [npmean(chord) for chord in CHORD_TEMPLATE_MIN7]

42

43 CHORD_TEMPLATE_MAJOR_stdevs = [npstd(chord) for chord inCHORD_TEMPLATE_MAJOR]

44 CHORD_TEMPLATE_MINOR_stdevs = [npstd(chord) for chord inCHORD_TEMPLATE_MINOR]

45 CHORD_TEMPLATE_DOM7_stdevs = [npstd(chord) for chord in CHORD_TEMPLATE_DOM7]

46 CHORD_TEMPLATE_MIN7_stdevs = [npstd(chord) for chord in CHORD_TEMPLATE_MIN7]

47

48 TIMBRE_CLUSTERS = [[ 138679881e-01 395702571e-02 265410235e-0249 738301998e-03 -175014636e-02 -551147732e-0250 871851698e-03 -117595855e-02 107227900e-0251 875951680e-03 540391877e-03 617638908e-03]52 [ 314344510e+00 117405599e-01 408053561e+0053 -177934450e+00 293367968e+00 -135597928e+0054 -155129489e+00 775743158e-01 642796685e-0155 140794256e-01 337716831e-01 -327103815e-01]56 [ 356548165e-01 273288705e+00 194355982e+0057 106892477e+00 989739475e-01 -897330631e-0258 873234495e-01 -200747009e-03 344488367e-0159 993117800e-02 -243471766e-01 -190521726e-01]60 [ 422442037e-01 414115783e-01 143926557e-0161 -116143322e-01 -595186216e-02 -236927188e-0162 -683151409e-02 986816882e-02 243219098e-0263 693558977e-02 680121418e-03 397485360e-02]64 [ 194727799e-01 -139027782e+00 -239875671e-01

62

65 -284583677e-01 192334219e-01 -283421048e-0166 215787541e-01 114840341e-01 -215631833e-0167 -409496877e-02 -690838017e-03 -724394810e-03]68 [ 196565167e-01 498702717e-02 -343697282e-0169 254170701e-01 112441266e-02 154740401e-0170 -470447408e-02 810868802e-02 303736697e-0371 143974944e-03 -275044913e-02 148634678e-02]72 [ 221364497e-01 -296205105e-01 157754028e-0173 -557641279e-02 -925625566e-02 -615316168e-0274 -138139882e-01 -554936599e-02 166886836e-0175 646238260e-02 124093863e-02 -209274345e-02]76 [ 212823455e-01 -932652720e-02 -439611467e-0177 -202814479e-01 498638770e-02 -126572488e-0178 -111181799e-01 325075635e-02 201416694e-0279 -569216463e-02 261922912e-02 830817468e-02]80 [ 162304042e-01 -734813956e-03 -202552550e-0181 180106705e-01 -572110826e-02 -917148244e-0282 -620429191e-03 -608892354e-02 102883628e-0283 384878478e-02 -872920419e-03 237291230e-02]84 [ 169023095e-01 681311168e-02 -371039856e-0285 -213139780e-02 -418752028e-03 136407740e-0186 258515825e-02 -410328777e-04 293149920e-0287 -197874734e-02 201177066e-02 429260690e-03]88 [ 416829358e-01 -128384095e+00 886081556e-0189 913717416e-02 -319420208e-01 -182003637e-0190 -319865507e-02 -171517045e-02 347472066e-0291 -353047665e-02 558354602e-02 -506222122e-02]92 [ 383948137e-01 106020034e-01 401191058e-0193 149470482e-01 -958422411e-02 -494473336e-0294 227589858e-02 -567352733e-02 384666644e-0295 -215828055e-02 -167817151e-02 115426241e-01]96 [ 907946444e-01 326120397e+00 298472002e+0097 -142615404e-01 129886103e+00 -453380431e-0198 154008478e-01 -355297093e-02 -295809181e-0199 157037690e-01 -729692046e-02 115180285e-01]

100 [ 160870896e+00 -232038235e+00 -796211044e-01101 155058968e+00 -219377663e+00 501030526e-01102 -171767279e+00 -136642470e+00 -242837527e-01103 -414275615e-01 -733148530e-01 -456676578e-01]104 [ 642870687e-01 134486839e+00 216026845e-01105 -213180345e-01 310866747e-01 -397754955e-01106 -354439151e-01 -595938041e-04 495054274e-03107 467013422e-02 -180823854e-02 125808320e-01]108 [ 116780496e+00 228141229e+00 -329418720e+00109 -154239912e+00 212372153e-01 251116768e+00110 184273560e+00 -406183916e-01 119175125e+00111 -924407446e-01 685444429e-01 -638729005e-01]112 [ 239097414e-01 -113382447e-02 306327342e-01113 468182987e-03 -103107607e-01 -317661969e-02114 346533705e-02 146440386e-02 688291154e-02115 172580481e-02 -623970238e-03 -652822380e-03]116 [ 174850329e-01 -186077411e-01 269285838e-01117 522452803e-02 -371708289e-02 -642874319e-02118 -501920042e-03 -114565540e-02 -261300268e-03119 -694872458e-03 120157063e-02 201341977e-02]120 [ 193220674e-01 162738332e-01 172794061e-02121 789933755e-02 158494767e-01 904541006e-04122 -333177052e-02 -142411500e-01 -190471155e-02123 -241622739e-02 -257382438e-02 284895062e-02]124 [ 331179197e+00 -156765268e-01 442446188e+00

63

125 205496297e+00 507031622e+00 -352663849e-02126 -568337901e+00 -117825301e+00 541756637e-01127 -315541339e-02 -158404846e+00 737887234e-01]128 [ 236033237e-01 -501380019e-01 -701568834e-02129 -214474169e-01 558739133e-01 -345340886e-01130 236469930e-01 -251770230e-02 -441670143e-01131 -173364633e-01 992353986e-03 101775476e-01]132 [ 313672832e+00 155128891e+00 460139512e+00133 982477544e-01 -387108002e-01 -134239667e+00134 -300065797e+00 -441556909e-01 -777546208e-01135 -659017029e-01 -142596356e-01 -978935498e-01]136 [ 850714148e-01 228658856e-01 -365260753e+00137 270626948e+00 -190441544e-01 566625676e+00138 177531510e+00 239978921e+00 110965660e+00139 158484130e+00 -151579214e-02 864324026e-01]140 [ 114302559e+00 118602811e+00 -388130412e+00141 869833825e-01 -823003310e-01 -423867795e-01142 856022598e-01 -108015106e+00 174840192e-01143 -135493558e-02 -117012561e+00 168572940e-01]144 [ 354117814e+00 612714769e-01 767585243e+00145 250391333e+00 181374399e+00 -146363231e+00146 -174027236e+00 -572924078e-01 -120787368e+00147 -413954661e-01 -462561948e-01 678297871e-01]148 [ 831843044e-01 441635485e-01 700724425e-02149 -472159900e-02 308326493e-01 -447009822e-01150 327806057e-01 652370380e-01 328490360e-01151 128628172e-01 -778065861e-02 691343399e-02]152 [ 490082031e-01 -953180204e-01 176970476e-01153 157256960e-01 -526196238e-02 -319264458e-01154 391808304e-01 219368239e-01 -206483291e-01155 -625044005e-02 -105547224e-01 318934196e-01]156 [ 149899454e+00 -430708817e-01 243770498e+00157 703149621e-01 -228827845e+00 270195855e+00158 -471484280e+00 -118700075e+00 -177431396e+00159 -223190236e+00 820855264e-01 -235859902e-01]160 [ 120322544e-01 -366300816e-01 -125699953e-01161 -121914056e-01 693277338e-02 -131034684e-01162 -154955924e-03 248094288e-02 -309576314e-02163 -166369415e-03 148904987e-04 -142151992e-02]164 [ 652394765e-01 -681024464e-01 636868117e-01165 304950208e-01 262178992e-01 -320457080e-01166 -198576098e-01 -302173163e-01 204399765e-01167 444513847e-02 -950111498e-02 -114198739e-02]168 [ 206762180e-01 -208101829e-01 261977630e-01169 -171672300e-01 561794250e-02 213660185e-01170 390259585e-02 478176392e-02 172812607e-02171 344052067e-02 626899067e-03 248544728e-02]172 [ 739717363e-01 437786285e+00 254995502e+00173 113151212e+00 -358509503e-01 220806129e-01174 -220500355e-01 -722409824e-02 -270534083e-01175 107942098e-03 270174668e-01 187279353e-01]176 [ 125593809e+00 671054880e-02 870352571e-01177 -432607959e+00 230652217e+00 547476105e+00178 -611052479e-01 107955720e+00 -216225471e+00179 -795770149e-01 -731804973e-01 968935954e-01]180 [ 117233757e-01 -123897829e-01 -488625265e-01181 142036530e-01 -723286756e-02 -699808763e-02182 -117525019e-02 570221674e-02 -767796123e-03183 417505873e-02 -233375716e-02 194121001e-02]184 [ 167511025e+00 -275436700e+00 145345593e+00

64

185 132408871e+00 -166172505e+00 100560074e+00186 -882308160e-01 -595708043e-01 -727283590e-01187 -103975499e+00 -186653334e-02 139449745e+00]188 [ 320587677e+00 -284451104e+00 854849957e+00189 -444001235e-01 104202144e+00 735333682e-01190 -248763292e+00 738931361e-01 -174185596e+00191 -107581842e+00 205759299e-01 -820483513e-01]192 [ 331279737e+00 -508655734e-01 661530870e+00193 116518280e+00 474499155e+00 -231536191e+00194 -134016130e+00 -715381712e-01 278890594e+00195 204189275e+00 -380003033e-01 116034914e+00]196 [ 179522019e+00 -813534697e-02 437167420e-01197 226517020e+00 885377295e-01 107481514e+00198 -725322296e-01 -219309506e+00 -759468916e-01199 -137191387e+00 260097913e-01 934596450e-01]200 [ 350400906e-01 817891485e-01 -863487084e-01201 -731760701e-01 970320805e-02 -360023996e-01202 -291753495e-01 -803073817e-02 665930095e-02203 160093340e-01 -129158086e-01 -518806100e-02]204 [ 225922929e-01 278461593e-01 539661393e-02205 -237662670e-02 -270343295e-02 -123485570e-01206 231027499e-03 587465112e-05 186127188e-02207 283074747e-02 -187198676e-04 124761782e-02]208 [ 453615634e-01 318976020e+00 -835029351e-01209 784124578e+00 -443906795e-01 -178945492e+00210 -114521031e+00 100044304e+00 -404084981e-01211 -486030348e-01 105412721e-01 563666445e-02]212 [ 393714086e-01 -307226477e-01 -487366619e-01213 -457481697e-01 -291133171e-04 -239881719e-01214 -215591352e-01 -121332941e-01 142245002e-01215 502984582e-02 -805878851e-03 195534173e-01]216 [ 186913010e-01 -161000977e-01 595612425e-01217 187804293e-01 222064227e-01 -109008289e-01218 783845058e-02 515228647e-02 -818113578e-02219 -237860551e-02 341013800e-03 364680417e-02]220 [ 332919314e+00 -214341251e+00 720913997e+00221 176143734e+00 164091808e+00 -266887649e+00222 -926748006e-01 -278599285e-01 -739434005e-01223 -387363085e-01 800557250e-01 115628886e+00]224 [ 476496444e-01 -119334793e-01 309037235e-01225 -345545294e-01 130114716e-01 506895559e-01226 212176840e-01 -414296750e-03 452439064e-02227 -162163990e-02 693683152e-02 -577607592e-03]228 [ 300019324e-01 543432074e-02 -772732930e-01229 147263806e+00 -279012581e-02 -247864869e-01230 -210011388e-01 278202425e-01 616957205e-02231 -166924986e-01 -180102286e-01 -378872162e-03]]232

233 TIMBRE_MEANS = [npmean(t) for t in TIMBRE_CLUSTERS]234 TIMBRE_STDEVS = [npstd(t) for t in TIMBRE_CLUSTERS]235

236 rsquorsquorsquohelper methods to process raw msd datarsquorsquorsquo237

238 def normalize_pitches(h5)239 key = int(hdf5_gettersget_key(h5))240 segments_pitches = hdf5_gettersget_segments_pitches(h5)241 segments_pitches_new = [transpose_by_key(pitch_segkey) for pitch_seg in

segments_pitches]242 return segments_pitches_new243

65

244 def transpose_by_key(pitch_segkey)245 pitch_seg_new = []246 for i in range(012)247 idx = (i + key) 12248 pitch_seg_newappend(pitch_seg[idx])249 return pitch_seg_new250

251 rsquorsquorsquo given a time segment with distributions of the 12 pitches find the mostlikely chord playedrsquorsquorsquo

252 def find_most_likely_chord(pitch_vector)253 rho_max = 00254 index each chord255 most_likely_chord = (11)256 for idx (chordmeanstdev) in enumerate(zip(CHORD_TEMPLATE_MAJOR

CHORD_TEMPLATE_MAJOR_meansCHORD_TEMPLATE_MAJOR_stdevs))257 rho = 00258 for i in range(012)259 rho += (chord[i] - mean)(pitch_vector[i] - npmean(pitch_vector))((

stdev+001)(npstd(pitch_vector)+001))260 if (abs(rho) gt abs(rho_max))261 rho_max = rho262 most_likely_chord = (1idx)263 for idx (chordmeanstdev) in enumerate(zip(CHORD_TEMPLATE_MINOR

CHORD_TEMPLATE_MINOR_meansCHORD_TEMPLATE_MINOR_stdevs))264 rho = 00265 for i in range(012)266 rho += (chord[i] - mean)(pitch_vector[i] - npmean(pitch_vector))((

stdev+001)(npstd(pitch_vector)+001))267 if (abs(rho) gt abs(rho_max))268 rho_max = rho269 most_likely_chord = (2idx)270 for idx (chordmeanstdev) in enumerate(zip(CHORD_TEMPLATE_DOM7

CHORD_TEMPLATE_DOM7_meansCHORD_TEMPLATE_DOM7_stdevs))271 rho = 00272 for i in range(012)273 rho += (chord[i] - mean)(pitch_vector[i] - npmean(pitch_vector))((

stdev+001)(npstd(pitch_vector)+001))274 if (abs(rho) gt abs(rho_max))275 rho_max = rho276 most_likely_chord = (3idx)277 for idx (chordmeanstdev) in enumerate(zip(CHORD_TEMPLATE_MIN7

CHORD_TEMPLATE_MIN7_meansCHORD_TEMPLATE_MIN7_stdevs))278 rho = 00279 for i in range(012)280 rho += (chord[i] - mean)(pitch_vector[i] - npmean(pitch_vector))((

stdev+001)(npstd(pitch_vector)+001))281 if (abs(rho) gt abs(rho_max))282 rho_max = rho283 most_likely_chord = (4idx)284 return most_likely_chord285

286 def find_most_likely_timbre_category(timbre_vector)287 most_likely_timbre_cat = 0288 rho_max = 00289 for idx (segmeanstdev) in enumerate(zip(TIMBRE_CLUSTERSTIMBRE_MEANS

TIMBRE_STDEVS))290 rho = 00291 for i in range(012)292 rho += (seg[i] - mean)(timbre_vector[i] - npmean(seg))((stdev+001)

(npstd(timbre_vector)+001))

66

293 if (abs(rho) gt abs(rho_max))294 rho_max = rho295 most_likely_timbre_cat = idx296 return most_likely_timbre_cat

67

Bibliography

[1] Marck Bailey Mark j butler publishes scholarly work on dance mu-sic httpwwwmusicnorthwesterneduaboutnews2012mark-j-butler-publishes-scholarly-work-on-dance-musichtml mar 2012

[2] Kenneth Taylor Ishkurrsquos guide to edm httptechnoorgelectronic-music-guide

[3] Deal further strengthens spotifyrsquos music discovery expertise httptheechonestcompressreleasesspotify-acquires-echo-nest mar 2014

[4] Josh Constine Inside the spotify - echo nest skunkworks httptechcrunchcom20141019the-sonic-mad-scientists oct 2014

[5] The future of music genres is here httpblogechonestcompost73516217273the-future-of-music-genres-is-here jan 2014

[6] About the music genome project httpwwwpandoracomaboutmgp

[7] Joan Serragrave Aacutelvaro Corral Mariaacuten Boguntildeaacute Martiacuten Haro and Josep Ll ArcosMeasuring the evolution of contemporary western popular music Sci Rep 2jul 2012

[8] Matthias Mauch Robert M MacCallum Mark Levy and Armand M Leroi Theevolution of popular music Usa 1960ndash2010 Royal Society Open Science 2(5)2015

[9] Thierry Bertin-Mahieux Daniel PW Ellis Brian Whitman and Paul LamereThe million song dataset In Proceedings of the 12th International Conferenceon Music Information Retrieval (ISMIR 2011) 2011

[10] F Pedregosa G Varoquaux A Gramfort V Michel B Thirion O GriselM Blondel P Prettenhofer R Weiss V Dubourg J Vanderplas A PassosD Cournapeau M Brucher M Perrot and E Duchesnay Scikit-learn Machinelearning in Python Journal of Machine Learning Research 122825ndash2830 2011

[11] Lars Buitinck Gilles Louppe Mathieu Blondel Fabian Pedregosa AndreasMueller Olivier Grisel Vlad Niculae Peter Prettenhofer Alexandre GramfortJaques Grobler Robert Layton Jake VanderPlas Arnaud Joly Brian Holt andGaeumll Varoquaux API design for machine learning software experiences from

68

the scikit-learn project In ECML PKDD Workshop Languages for Data Miningand Machine Learning pages 108ndash122 2013

[12] Edwin Chen Infinite mixture models with nonparametric bayesand the dirichlet process httpblogechenme20120320infinite-mixture-models-with-nonparametric-bayes-and-the-dirichlet-processmar 2012

[13] Graham Massey Roland tr-808 The drum machine that changed music httpwwwbbccomnewsentertainment-arts-26682781 mar 2014

[14] Jean-Michel Jarre and Agencja Artystyczna MTJ Jean Michel Jarre DisquesDreyfus 1999

[15] Francois Pachet Jean-Julien Aucouturier and Mark Sandler The way it soundsTimbre models for analysis and retrieval of music signals IEEE TRANSAC-TIONS ON MULTIMEDIA 7(6)1028ndash35 dec 2005

69

  • Abstract
  • Acknowledgements
  • Contents
  • List of Tables
  • List of Figures
  • 1 Introduction
    • 11 Background Information
    • 12 Literature Review
    • 13 The Dataset
      • 2 Mathematical Modeling
        • 21 Determining Novelty of Songs
        • 22 Feature Selection
        • 23 Collecting Data and Preprocessing Selected Features
          • 231 Collecting the Data
          • 232 Pitch Preprocessing
          • 233 Timbre Preprocessing
              • 3 Results
                • 31 Methodology
                • 32 Findings
                  • 321 =005
                  • 322 =01
                  • 323 =02
                    • 33 Analysis
                      • 4 Conclusion
                        • 41 Design Flaws in Experiment
                        • 42 Future Work
                        • 43 Closing Remarks
                          • A Code
                            • A1 Pulling Data from the Million Song Dataset
                            • A2 Calculating Most Likely Chords and Timbre Categories
                            • A3 Code to Compute Timbre Categories
                            • A4 Helper Methods for Calculations
                              • Bibliography
Page 52: Silver,Matthew final thesis

Figure 3.6: Timbre and pitch distributions for α = 0.2

Cluster   Song Count   Characteristic Sounds
0         4075         Nostalgic and sad-sounding synths and string instruments
1         2068         Intense, sad, cavernous (mix of industrial metal and ambient)
2         1546         Jazz/funk tones
3         1691         Orchestral with heavy 80s synths, atmospheric
4         343          Arpeggios
5         304          Electro ambient
6         2405         Alien synths, eerie
7         1264         Punchy kicks and claps, 80s/90s tilt
8         1561         Medium tempo, 4/4 time signature synths with intense guitar
9         1796         Disco rhythms and instruments
10        2158         Standard rock with few (if any) synths added on
12        791          Cavernous, minimalist, ambient (non-electronic instruments)
14        765          Downtempo, classic guitar riffs, fewer synths
16        865          Classic acid house sounds and beats
17        682          Heavy Roland TR sounds
22        14           Fast, ambient, classic orchestral
23        578          Acid house with funk tones
30        31           Very repetitive rhythms, one or two tones
34        88           Very dense sound (strong vocals and synths)

Table 3.3: Song cluster descriptions for α = 0.2

3.3 Analysis

Each of the different values of α revealed different insights about the Dirichlet Process applied to the Million Song Dataset, mainly which artists and songs were unique and how traditional groupings of EM genres could be thought of differently. Not surprisingly, the distributions of the years of songs in most of the clusters were skewed to the left, because the distribution of all of the EM songs in the Million Song Dataset is itself left-skewed (see Figure 2.2). However, some of the distributions vary significantly for individual clusters, and these differences provide important insights into which music styles were popular at certain points in time and how unique the earliest artists and songs in the clusters were. For example, for α = 0.1, Cluster 28's musical style (with sounds characteristic of the Roland TR-808 and TR-909, two programmable drum machines and synthesizers that became extremely popular in 1980s and 90s dance tracks) [13] coincides with when the instruments were first manufactured in 1980. Not surprisingly, this cluster contained mostly songs from the 80s and 90s and declined slightly in the 2000s. However, there were a few songs in that cluster that came out before 1980. While these songs did not clearly use the Roland TR machines, they may have contained similar sounds that predated the machines and were truly novel.

First, looking at α = 0.05, we see that all of the clusters contain a significant number of songs, although clusters 3 and 8 are notably smaller. Cluster 3 contains a heavier left tail, indicating a larger number of songs from the 70s, 80s, and 90s. Inside the cluster, the genres of music varied significantly from a traditional music lens. That is, the cluster contained some songs with nearly all traditional rock instruments, others with purely synths, and others somewhere in between, all of which would normally be classified as different EM genres. Under the Dirichlet Process, however, these songs were lumped together by the common theme of dense, melodic sounds (as opposed to minimalistic, repetitive, or dissonant ones). The most prominent artists among the earlier songs are Ashra and John Hassell, who composed several melodic songs combining traditional instruments with synthesizers for a modern feel. The other small cluster, number 8, has a more normal year distribution relative to the entire MSD distribution and also consists of denser beats. Another artist, Cabaret Voltaire, leads this cluster. Cluster 9 sticks out significantly because it contains virtually no songs before 1990 but then rises rapidly in popularity. This cluster contains songs with hypnotically repetitive rhythm, strong and ethereal synths, and an equally strong drum-like beat. Given the emergence of trance in the 1990s, and the fact that house music in the 1980s contained more minimalistic synths than house music in the 1990s, this distribution of years makes sense. Looking at the earliest artists in this cluster, one that accurately predates the later music in the cluster is Jean-Michel Jarre. A French composer and pioneer of ambient and electronic music [14], one of his songs, Les Chants Magnétiques IV, contains very sharp and modulated synths along with a repetitive hi-hat cymbal rhythm and more drawn-out, ethereal synths. While the song sounds ambient at its normal speed, playing it at 1.5 times the normal speed produced a thumping, fast-paced 16th-note rhythm that, combined with the ethereal synths and their chord progressions, sounded very similar to trance music. In fact, I found that, stylistically, trance music was comparable to house and ambient music increased in speed. Trance was a term not used extensively until the early 1990s, while ambient and house music were already mainstream by the 1980s, so it would make sense that trance evolved in this manner. However, this insight could serve as an argument that trance is not an innovative genre in and of itself but rather a clever combination of two older genres.

Lastly, we look at the timbre category and chord change distributions for each cluster. In theory, the clusters should have significantly different peaks of chord changes and timbre categories, reflecting different pitch arrangements and instruments in each cluster. The type 0 chord change corresponds to major → major with no note change; type 60 to minor → minor with no note change; type 120 to dominant 7th major → dominant 7th major with no note change; and type 180 to dominant 7th minor → dominant 7th minor with no note change. It makes sense that type 0, 60, 120, and 180 chord changes are frequently observed, because it implies that consecutive chords in a song remain in the same key for the majority of the song.
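These category indices follow directly from the encoding in Appendix A.2, where a chord is a (key type, root note) pair and a transition is packed as chord_shift = 12*(key_shift - 1) + note_shift, with key_shift = 4*(c1 - 1) + c2. The small decoder below is my own illustration (the labels are mine, matching the key types above), and it confirms the four categories just named:

# key types: 1 = major, 2 = minor, 3 = dominant 7th major, 4 = dominant 7th minor
KEY_TYPES = {1: 'major', 2: 'minor', 3: 'dom7 major', 4: 'dom7 minor'}

def decode_chord_shift(chord_shift):
    q, note_shift = divmod(chord_shift, 12)  # q = key_shift - 1
    c1, c2 = divmod(q, 4)                    # undo key_shift = 4*(c1 - 1) + c2
    return KEY_TYPES[c1 + 1], KEY_TYPES[c2 + 1], note_shift

for cat in (0, 60, 120, 180):
    print cat, decode_chord_shift(cat)
# 0 ('major', 'major', 0); 60 ('minor', 'minor', 0);
# 120 ('dom7 major', 'dom7 major', 0); 180 ('dom7 minor', 'dom7 minor', 0)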

The timbre categories, on the other hand, are more difficult to interpret intuitively. Mauch's study addresses this issue by sampling the songs and sounds closest to each timbre category, playing the sounds, and attaching user-based interpretations based on several listeners [8]. While this strategy worked in Mauch's study, it was not practical in mine given the time and resources at my disposal. I ended up writing subjective summaries of each cluster and comparing them against the charts to see whether certain peaks in the timbre categories corresponded to specific tones. Strangely, for α = 0.05 the timbre and chord change data is very similar for every cluster. This problem does not occur when α = 0.1 or 0.2, where the graphs vary significantly and correspond to some of the observed differences in the music. In summary, below are the most influential artists I found in the clusters formed, along with the types of music they created that were novel for their time:

• Jean-Michel Jarre: ambient and house music, complicated synthesizer arrangements

• Cabaret Voltaire: orchestral electronic music

• Paul Horn: new age

• Brian Eno: ambient music

• Manuel Göttsching (Ashra): synth-heavy ambient music

• Killing Joke: industrial metal

• John Foxx: minimalist and dark electronic music

• Fad Gadget: house and industrial music

While these conclusions were formed mainly from the data used in the MSD and the resulting clusters, I checked outside sources and biographies of these artists to see whether they were groundbreaking contributors to electronic music. Some research revealed that these artists were indeed groundbreaking for their time, so my findings are consistent with existing literature. The difference between existing accounts and mine, however, is that from a quantitatively computed perspective I found some new connections (like observing that one of Jarre's works, when sped up, sounded very similar to trance music).

For larger values of α, it is worth not only looking at interesting phenomena in the clusters formed for that specific value but also comparing those clusters to the ones formed at other values of α. Since we are increasing the value of α, more clusters will be formed, and the distinctions between clusters will be more nuanced. With α = 0.1, the Dirichlet Process formed 16 clusters. Two of these consisted of only one song each, and upon listening, neither song sounded particularly unique, so I threw those two clusters out and analyzed the remaining 14. Comparing these clusters to the ones formed with α = 0.05, I found that some mapped over nicely while others were more difficult to interpret. For example, cluster 3 of the α = 0.1 run contained a similar number of songs, and a similar distribution of release years, to cluster 9 of the α = 0.05 run. Both contain virtually no songs before the 1990s and then steadily rise in popularity through the 2000s. Both clusters also contain similar types of music: house beats and ethereal synths reminiscent of ambient or trance music. However, when I looked at the earliest artists in the α = 0.1 cluster, they were different from the earliest artists in the α = 0.05 cluster. One particular artist, Bill Nelson, stood out for having a particularly novel song, "Birds of Tin," for the year it was released (1980). This song features a sharp and twangy synth beat that, when sped up, sounded like minimalist acid house music. While the α = 0.05 grouping differentiated mostly on general moods and classes of instruments (like rock vs. non-electronic vs. electronic), the α = 0.1 grouping picked up more nuanced differences in instrumentation and mood. For example, cluster 16 (α = 0.1) contained songs featuring orchestral string instruments, especially violin. The songs themselves varied significantly according to traditional genres, from Brian Eno arrangements with classical orchestra to a remix of a song by Linkin Park, a nu-metal band, which contained violin interludes. This clustering raises an interesting point: music that sounds very different by traditional genre standards can be grouped together by certain instruments or sounds. Another cluster, 28 (α = 0.1), features 90s sounds characteristic of the Roland TR-909 drum machine, which would explain why the cluster's songs increase drastically starting in the 1990s and steadily decline through the 2000s. Yet another cluster, 6 (α = 0.1), has a particularly heavy left tail, indicating a style more popular in the 1980s, and its characteristic sound, hi-hat cymbals, is also a specialized instrument. This specialization does not match up particularly strongly with the clusters formed when α = 0.05. That is, a single cluster with α = 0.05 does not map easily to one or more clusters in the α = 0.1 run, although many of the clusters appear to share characteristics based on the qualitative descriptions in the tables.

The timbre and chord change charts for each cluster appeared to at least somewhat corroborate the general characteristics I attached to each cluster. For example, the last timbre category is significantly pronounced for clusters 5 and 18, and especially so for 18. Cluster 18 was vocal-free, ethereal, space-synth sounds, so it would make sense that cluster 5, which was mainly calm New World, also contained vocal-free, ethereal, space-y sounds. It was also interesting to note that certain clusters, like 28, contained one timbre category that completely dominated all the others. Many songs in that cluster were marked by strong, repetitive beats reminiscent of the Roland TR drum machines, which matches the graph. Likewise, clusters 3, 7, 9, and 20, which appear to share the same peak timbre category, were noted for containing strong and repetitive beats. For this run, I added the following artists and their contributions to the general list of novel artists:

• Bill Nelson: minimalist house music

• Vangelis: orchestral compositions with electronic notes

• Rick Wakeman: rock compositions with spacy-sounding synths

• Kraftwerk: synth-based pop music

Finally, we look at α = 0.2. With this parameter value, the Dirichlet Process resulted in 22 clusters. Three of these contained only one song each, and upon listening to each of these songs I determined they were not particularly unique and discarded them, for a total of 19 remaining clusters. Unlike the previous two values of α, where the clusters were relatively easy to differentiate subjectively, this one was quite difficult. Slightly more than half of the clusters, 10 out of 19, contained under 1000 songs, and 3 contained under 100. Some of the clusters mapped easily to clusters in the other two α values, like cluster 17 (α = 0.2), which contains Roland TR drum machine sounds and is comparable to cluster 28 (α = 0.1). However, many of the other classifications seemed more dubious. Not only did the songs within each cluster often vary significantly, but the differences between many clusters appeared nearly indistinguishable. The chord change and timbre charts also reflect the difficulty in distinguishing clusters: the y-axis scales of the charts are quite small, implying that many of the timbre values averaged out because the songs in each cluster were quite different. Essentially, the observations are quite noisy and do not have features that stand out as saliently as those of cluster 28 (α = 0.1), for example. The only exceptions were clusters 30 and 34, but there were so few songs in each of these clusters that they represent only a small portion of the dataset. Therefore, I concluded that the Dirichlet Process with α = 0.2 did an insufficient job of clustering the songs. Overall, the clusters formed when α = 0.1 were the most meaningful in terms of picking up nuanced moods and instruments without splitting hairs and producing clusters that a minimally trained ear could not differentiate. From this analysis, the most appropriate genre classifications of the electronic music from the MSD are the clusters described in the table for α = 0.1, and the most novel artists, along with their contributions, are summarized in the findings for α = 0.05 and α = 0.1.
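For completeness, the clustering step behind these results can be sketched as follows. This is a minimal illustration rather than the exact thesis script: it assumes scikit-learn's truncated Dirichlet Process mixture (exposed as BayesianGaussianMixture in current releases, where the concentration parameter α is passed as weight_concentration_prior), and it uses a random stand-in matrix X in place of the real per-song chord-change and timbre-category frequencies:

import numpy as np
from sklearn.mixture import BayesianGaussianMixture

# stand-in features: one row per song (the 192 chord-change frequencies
# plus the timbre-category frequencies would go here)
X = np.random.rand(500, 222)

for alpha in (0.05, 0.1, 0.2):
    dpgmm = BayesianGaussianMixture(
        n_components=40,                                   # truncation level
        weight_concentration_prior_type='dirichlet_process',
        weight_concentration_prior=alpha,
        covariance_type='diag',
        max_iter=500).fit(X)
    labels = dpgmm.predict(X)
    # a larger alpha lets the DP spread mass over more mixture components
    print alpha, np.unique(labels).size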

Chapter 4

Conclusion

In this chapter, I first address weaknesses in my experiment and strategies for fixing them; I then offer potential paths for researchers to build upon my experiment, and close with final words regarding this thesis.

4.1 Design Flaws in Experiment

While I made every effort possible to ensure the integrity of this experiment, there were various limiting factors, some beyond my control and others within my control but unrealistic to address given the time and resources I had. The largest issue was the dataset I was working with. While the MSD contained roughly 23,000 electronic music songs according to my classifications, these songs did not come close to all of the electronic music that was available. From looking through the tracks, I did see many important artists, meaning that there was some credibility to the dataset. However, there were several other artists I was surprised to see missing, and the artists included contained only a limited number of popular songs. Some traditionally defined genres, like dubstep, were missing entirely from the dataset, and the most recent songs came from the year 2010, which meant that the past 5 years of rapid expansion in EM were not accounted for. Building a sufficient corpus of EM data is very difficult, arguably more so than for other genres, because songs may be remixed by multiple artists, further blurring the line between original content and modifications. For this reason, I considered my thesis to be a proof of concept. Although the data I used may not be ideal, I was able to show that the Dirichlet Process could be used with some amount of success to cluster songs based on their metadata.

With respect to how I implemented the Dirichlet Process and constructed the features, my methodology could have been more extensive with additional time and resources. Interpreting the sounds in each song and establishing common threads is a difficult task, and unlike Pandora, which used trained music theory experts to analyze each song, I relied on my own ears and anecdotal knowledge of EM. Given the lack of formal literature quantitatively analyzing EM and the resources I had, this was my best realistic option, but it was also not ideal. The second notable weakness, which was more controllable, was determining what exactly constitutes an EM song. My criteria involved iterating through every song and selecting those whose artist carried a tag that fell inside a list of predetermined EM genres. However, this strategy is not always effective, since some artists have produced only a small selection of EM songs alongside much more rock or other non-EM music. To prevent these songs from appearing in the dataset, I would need to load another dataset, from a group called Last.fm, which contains user-generated tags at the song level.
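A sketch of what such a song-level filter could look like is below; the data layout here is hypothetical (the Last.fm companion dataset to the MSD stores tags per track, but I did not use it, so track_tags and its format are assumptions for illustration):

# hypothetical song-level filter: track_tags maps a track ID to its
# user-generated tags; target_genres is the EM tag list from Appendix A.1
def is_em_song(track_id, track_tags, target_genres):
    tags = [t.lower() for t in track_tags.get(track_id, [])]
    return any(genre in tag for tag in tags for genre in target_genres)

track_tags = {'TR0001': ['acid house', 'electronic', '80s']}  # made-up track ID
print is_em_song('TR0001', track_tags, ['house', 'techno'])   # True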

Another, more addressable weakness in my experiment was graphically analyzing the timbre categories. While the average chord changes were easy to interpret on the graphs for each cluster and had easy semantic interpretations, the timbre categories were never formally defined. That is, while I knew the Bayes Information Criterion was lowest when there were 46 categories, I did not associate each timbre category with a sound.
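The 46-category count itself came from fitting Gaussian mixtures of increasing size to the sampled timbre frames and keeping the size with the lowest BIC. A minimal sketch of that selection step (the search range, mixture settings, and frames matrix below are stand-ins, not the exact values used):

import numpy as np
from sklearn.mixture import GaussianMixture

frames = np.random.rand(2000, 12)  # stand-in for the sampled 12-D timbre frames

best_k, best_bic = None, np.inf
for k in range(10, 61):
    gmm = GaussianMixture(n_components=k).fit(frames)
    bic = gmm.bic(frames)      # lower BIC = better fit after complexity penalty
    if bic < best_bic:
        best_k, best_bic = k, bic
print best_k, best_bic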

Mauch's study addressed this issue by randomly selecting songs with sounds that fell in each timbre category and asking users to listen to the sounds and classify what they heard. Implementing this system would be an additional way of ensuring that the clusters formed were nontrivial: I could not only eyeball the timbre measurements on each graph, like I did in this thesis, but also use them to confirm the sounds I observed for each cluster. Finally, while my feature selection involved careful preprocessing, based on other studies, that normalized measurements between all songs, there are additional ways I could have improved the feature set. For example, one study looks at more advanced ways to isolate specific timbre segments in a song, identify repeating patterns, and compare songs to each other in terms of the similarity of their timbres [15]. More advanced methods like these would allow me to analyze more quantitatively how successful the Dirichlet Process is at clustering songs into distinct categories.

4.2 Future Work

Future work in this area, quantitatively analyzing EM metadata to determine what constitutes different genres and novel artists, would involve tighter definitions and procedures, evaluations of whether the clustering was effective, and closer musical scrutiny. All of the weaknesses mentioned in the previous section, barring perhaps the set of songs available in the Million Song Dataset, can be addressed with extensions and modifications to the code base I created. The greater issue, building an effective corpus of music data for the MSD and constantly updating it, might be addressed by soliciting such data from an organization like Spotify, but such an endeavor is very ambitious and beyond the scope of any individual or small group research project without extensive funding and influence. Once these problems are resolved, and once the dataset, the songs accessed from it, and the methods for comparing songs to each other are in place, the next steps would be to analyze the results further. How do the most unique artists for their time compare to the most popular artists? Is there considerable overlap? How long does it take for a style to grow in popularity, if it ever does? And lastly, how can these findings be used to compose new genres of music and to envision who and what will become popular in the future? All of these questions may require supplementary information sources, for example regarding the popularity of songs and artists, and many of these additional pieces of information can be found on the website of the MSD.

4.3 Closing Remarks

While this thesis is an ambitious endeavor and can be improved in many respects, it does show that the methods implemented yield nontrivial results and could serve as a foundation for future quantitative analysis of electronic music. As data analytics grows even more, and as groups such as Spotify amass greater amounts of information and deeper insights from that information, this relatively new field of study will hopefully grow. EM is a dynamic, energizing, and incredibly expressive type of music, and understanding it from a quantitative perspective pays respect to what has, up until now, mostly been analyzed from a curious outsider's perspective: qualitatively described, but not examined as thoroughly from a mathematical angle.

Appendix A

Code

A.1 Pulling Data from the Million Song Dataset

from __future__ import division
import os
import re
import sys
import time
import glob
import numpy as np
import hdf5_getters  # not on adroit

# prevents output from showing ellipses when printed
np.set_printoptions(threshold=np.nan)

'''This code walks the raw MSD .h5 files, keeps every song whose artist
carries an electronic music tag, and writes the songs out sorted by year.'''

basedir = '/scratch/network/mssilver/mssilver/msd_data_full/data/' + str(sys.argv[1])
ext = '.h5'

target_genres = ['house', 'techno', 'drum and bass', 'drum n bass', "drum'n'bass",
                 'drumnbass', "drum 'n' bass", 'jungle', 'breakbeat', 'trance',
                 'dubstep', 'trap', 'downtempo', 'industrial', 'synthpop', 'idm',
                 'idm - intelligent dance music', '8-bit', 'ambient',
                 'dance and electronica', 'electronic']

# relevant metadata for all EM songs found in the MSD
all_song_data = {}
count = 0
start_time = time.time()

for root, dirs, files in os.walk(basedir):
    files = glob.glob(os.path.join(root, '*' + ext))
    for f in files:
        h5 = hdf5_getters.open_h5_file_read(f)
        # if year unknown, throw out sample
        if hdf5_getters.get_year(h5) == 0:
            h5.close()
            continue
        if any(tag in str(hdf5_getters.get_artist_mbtags(h5)) for tag in target_genres):
            print 'found electronic music song at {0} seconds'.format(time.time() - start_time)
            count += 1
            print 'song count: {0}'.format(count)
            h5_subdict = dict()
            h5_subdict['title'] = hdf5_getters.get_title(h5).item()
            h5_subdict['artist_name'] = hdf5_getters.get_artist_name(h5).item()
            h5_subdict['year'] = hdf5_getters.get_year(h5).item()
            h5_subdict['duration'] = hdf5_getters.get_duration(h5).item()
            h5_subdict['timbre'] = hdf5_getters.get_segments_timbre(h5).tolist()
            h5_subdict['pitches'] = hdf5_getters.get_segments_pitches(h5).tolist()
            track_id = hdf5_getters.get_track_id(h5).item()
            all_song_data[track_id] = h5_subdict
            print 'Song {0} finished processing. Total time elapsed: {1} seconds'.format(count, str(time.time() - start_time))
        h5.close()

# sort the songs chronologically before writing them out
all_song_data_sorted = dict(sorted(all_song_data.items(), key=lambda k: k[1]['year']))
sortedpitchdata = '/scratch/network/mssilver/mssilver/msd_data/raw_' + re.sub('/', '', sys.argv[1]) + '.txt'
with open(sortedpitchdata, 'w') as text_file:
    text_file.write(str(all_song_data_sorted))
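Note that the raw_*.txt files written above are plain str() dumps of Python dicts, which is why the downstream scripts parse them with ast.literal_eval rather than a JSON library. A small sketch of reading one file back (the file name here is illustrative):

import ast

with open('msd_data/raw_AB.txt') as f:
    songs = ast.literal_eval(f.read())  # {track_id: metadata dict}
track_id = next(iter(songs))
print track_id, songs[track_id]['artist_name'], songs[track_id]['year']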

A.2 Calculating Most Likely Chords and Timbre Categories

from __future__ import division
import os
import sys
import re
import time
import json
import glob
import math
import ast
import collections
import numpy as np
import sklearn.mixture
import hdf5_getters  # not on adroit
import msd_utils     # not on adroit

# prevents output from showing ellipses when printed
np.set_printoptions(threshold=np.nan)

# column-wise mean, used with zip(*segments) below
def mean(a):
    return sum(a) / len(a)

'''This code computes the frequency of chord changes and timbre categories
in each electronic song, normalized by song duration.'''

basedir = '/scratch/network/mssilver/mssilver/'
input_file = basedir + 'msd_data/raw_' + str(sys.argv[1]) + '.txt'
output_file = basedir + 'msd_data/preprocessed_' + str(sys.argv[1]) + '.txt'

json_contents = open(input_file, 'r').read()

all_song_data = []
time_start = time.time()
count = 0
# each match is the str() dump of one song's metadata dict
for json_object_str in re.finditer(r"\{'title'.*?\}", json_contents, re.DOTALL):
    json_object_str = str(json_object_str.group(0))
    json_object = ast.literal_eval(json_object_str)
    json_object_new = {}

    json_object_new['title'] = json_object['title']
    json_object_new['artist_name'] = json_object['artist_name']
    json_object_new['year'] = json_object['year']
    json_object_new['duration'] = json_object['duration']

    segments_pitches_old = json_object['pitches']
    segments_timbre_old = json_object['timbre']
    segments_pitches_old_smoothed = []
    segments_timbre_old_smoothed = []
    chord_changes = [0 for i in range(0, 192)]
    smoothing_factor = 5
    for i in range(0, int(math.floor(len(segments_pitches_old)) / smoothing_factor)):
        segments = segments_pitches_old[(smoothing_factor*i):(smoothing_factor*i + smoothing_factor)]
        # calculate mean frequency of each note over a block of 5 time segments
        segments_mean = map(mean, zip(*segments))
        segments_pitches_old_smoothed.append(segments_mean)
    most_likely_chords = [msd_utils.find_most_likely_chord(seg) for seg in segments_pitches_old_smoothed]
    print 'found most likely chords at {0} seconds'.format(time.time() - time_start)
    # calculate chord changes
    for i in range(0, len(most_likely_chords) - 1):
        c1 = most_likely_chords[i]
        c2 = most_likely_chords[i+1]
        if c1[1] == c2[1]:
            note_shift = 0
        elif c1[1] < c2[1]:
            note_shift = c2[1] - c1[1]
        else:
            note_shift = 12 - c1[1] + c2[1]
        key_shift = 4*(c1[0] - 1) + c2[0]
        # convert note_shift (0 through 11) and key_shift (1 to 16)
        # to one of 192 categories for a chord shift
        chord_shift = 12*(key_shift - 1) + note_shift
        chord_changes[chord_shift] += 1
    # normalize the counts by song length
    json_object_new['chord_changes'] = [c / json_object['duration'] for c in chord_changes]
    print 'calculated chord changes at {0} seconds'.format(time.time() - time_start)

    for i in range(0, int(math.floor(len(segments_timbre_old)) / smoothing_factor)):
        segments = segments_timbre_old[(smoothing_factor*i):(smoothing_factor*i + smoothing_factor)]
        # mean of each timbre dimension over a block of 5 time segments
        segments_mean = map(mean, zip(*segments))
        segments_timbre_old_smoothed.append(segments_mean)
    print 'found most likely timbre categories at {0} seconds'.format(time.time() - time_start)
    timbre_cats = [msd_utils.find_most_likely_timbre_category(seg) for seg in segments_timbre_old_smoothed]
    # histogram of timbre category assignments, normalized by song length
    timbre_cat_counts = [timbre_cats.count(i) for i in xrange(0, 30)]
    json_object_new['timbre_cat_counts'] = [t / json_object['duration'] for t in timbre_cat_counts]
    all_song_data.append(json_object_new)
    count += 1

print 'preprocessing finished, writing results to file at time {0}'.format(time.time() - time_start)
with open(output_file, 'w') as f:
    f.write(str(all_song_data))

print 'file merging complete at time {0}'.format(time.time() - time_start)

A.3 Code to Compute Timbre Categories

from __future__ import division
import os
import sys
import re
import time
import json
import glob
import math
import ast
import operator
import random
import collections
from collections import defaultdict
from string import ascii_uppercase
import numpy as np
import matplotlib.pyplot as plt
import sklearn.mixture
import hdf5_getters  # not on adroit
import msd_utils     # not on adroit

timbre_all = []
# number of EM songs found in the MSD for each year
year_counts = dict({1956: 2, 1965: 4, 1968: 3, 1969: 5, 1970: 23, 1971: 25,
    1972: 26, 1973: 37, 1974: 35, 1975: 29, 1976: 28, 1977: 64, 1978: 77,
    1979: 111, 1980: 131, 1981: 171, 1982: 199, 1983: 272, 1984: 190, 1985: 189,
    1986: 200, 1987: 224, 1988: 205, 1989: 272, 1990: 358, 1991: 348,
    1992: 538, 1993: 610, 1994: 658, 1995: 764, 1996: 809, 1997: 930, 1998: 872,
    1999: 983, 2000: 1031, 2001: 1230, 2002: 1323, 2003: 1563, 2004: 1508,
    2005: 1995, 2006: 1892, 2007: 2175, 2008: 1950, 2009: 1782, 2010: 742})

time_start = time.time()
year_count = defaultdict(int)
orig_dir = '/scratch/network/mssilver/mssilver/'
orig_dir = ''  # overrides the line above
# each match is the str() dump of one song's metadata dict
json_pattern = re.compile(r"\{'title'.*?\}", re.DOTALL)
N = 20  # number of songs to sample from each year
k = 20  # number of frames to select from each song
for l1 in ascii_uppercase:
    for l2 in ascii_uppercase:
        edm_textfile = orig_dir + 'msd_data/raw_' + l1 + l2 + '.txt'
        json_contents = open(edm_textfile, 'r').read()
        for json_object_str in re.findall(json_pattern, json_contents):
            json_object = ast.literal_eval(json_object_str)
            year = int(json_object['year'])
            # keep each song with probability N / (songs in its year), capped at 1
            prob = 1.0 if 1.0*N/year_counts[year] > 1.0 else 1.0*N/year_counts[year]
            if random.random() < prob:
                print 'getting timbre frames for song in directory {0}, {1} seconds after start of program'.format(edm_textfile, time.time() - time_start)
                duration = float(json_object['duration'])
                timbre = [[t / duration for t in l] for l in json_object['timbre']]
                try:
                    indices = random.sample(xrange(0, len(timbre)), k)
                except ValueError:  # fewer than k frames in the song
                    indices = xrange(0, len(timbre))
                timbre_frames = [timbre[i] for i in indices]
                for l in timbre_frames:
                    timbre_all.append(l)
        print 'finished file {0}, {1} seconds after start of program'.format(edm_textfile, time.time() - time_start)

with open('timbre_frames_all.txt', 'w') as f:
    f.write(str(timbre_all))
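As a concrete check of the sampling logic, with N = 20 and the hard-coded year_counts above: 1970 has 23 EM songs, so each 1970 song is kept with probability 20/23 ≈ 0.87, while 2007 has 2175 songs, giving 20/2175 ≈ 0.009; in expectation, every year therefore contributes about N songs:

print min(1.0, 20.0/23)     # 0.869565...
print min(1.0, 20.0/2175)   # 0.009195...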

A.4 Helper Methods for Calculations

1 import os2 import re3 import json4 import glob5 import hdf5_getters6 import time7 import numpy as np8

9 rsquorsquorsquo some static data used in conjunction with the helper methods rsquorsquorsquo10

11 each 12-element vector corresponds to the 12 pitches starting with Cnatural and going up to B natural

12

61

13 CHORD_TEMPLATE_MAJOR = [[100010010000][010001001000]14 [001000100100][000100010010]15 [000010001001][100001000100]16 [010000100010][001000010001]17 [100100001000][010010000100]18 [001001000010][000100100001]]19 CHORD_TEMPLATE_MINOR =[[100100010000][010010001000]20 [001001000100][000100100010]21 [000010010001][100001001000]22 [010000100100][001000010010]23 [000100001001][100010000100]24 [010001000010][001000100001]]25 CHORD_TEMPLATE_DOM7 = [[100010010010][010001001001]26 [101000100100][010100010010]27 [001010001001][100101000100]28 [010010100010][001001010001]29 [100100101000][010010010100]30 [001001001010][000100100101]]31 CHORD_TEMPLATE_MIN7 = [[100100010010][010010001001]32 [101001000100][010100100010]33 [001010010001][100101001000]34 [010010100100][001001010010]35 [000100101001][100010010100]36 [010001001010][001000100101]]37

38 CHORD_TEMPLATE_MAJOR_means = [npmean(chord) for chord inCHORD_TEMPLATE_MAJOR]

39 CHORD_TEMPLATE_MINOR_means = [npmean(chord) for chord inCHORD_TEMPLATE_MINOR]

40 CHORD_TEMPLATE_DOM7_means = [npmean(chord) for chord in CHORD_TEMPLATE_DOM7]

41 CHORD_TEMPLATE_MIN7_means = [npmean(chord) for chord in CHORD_TEMPLATE_MIN7]

42

43 CHORD_TEMPLATE_MAJOR_stdevs = [npstd(chord) for chord inCHORD_TEMPLATE_MAJOR]

44 CHORD_TEMPLATE_MINOR_stdevs = [npstd(chord) for chord inCHORD_TEMPLATE_MINOR]

45 CHORD_TEMPLATE_DOM7_stdevs = [npstd(chord) for chord in CHORD_TEMPLATE_DOM7]

46 CHORD_TEMPLATE_MIN7_stdevs = [npstd(chord) for chord in CHORD_TEMPLATE_MIN7]

47

48 TIMBRE_CLUSTERS = [[ 138679881e-01 395702571e-02 265410235e-0249 738301998e-03 -175014636e-02 -551147732e-0250 871851698e-03 -117595855e-02 107227900e-0251 875951680e-03 540391877e-03 617638908e-03]52 [ 314344510e+00 117405599e-01 408053561e+0053 -177934450e+00 293367968e+00 -135597928e+0054 -155129489e+00 775743158e-01 642796685e-0155 140794256e-01 337716831e-01 -327103815e-01]56 [ 356548165e-01 273288705e+00 194355982e+0057 106892477e+00 989739475e-01 -897330631e-0258 873234495e-01 -200747009e-03 344488367e-0159 993117800e-02 -243471766e-01 -190521726e-01]60 [ 422442037e-01 414115783e-01 143926557e-0161 -116143322e-01 -595186216e-02 -236927188e-0162 -683151409e-02 986816882e-02 243219098e-0263 693558977e-02 680121418e-03 397485360e-02]64 [ 194727799e-01 -139027782e+00 -239875671e-01

62

65 -284583677e-01 192334219e-01 -283421048e-0166 215787541e-01 114840341e-01 -215631833e-0167 -409496877e-02 -690838017e-03 -724394810e-03]68 [ 196565167e-01 498702717e-02 -343697282e-0169 254170701e-01 112441266e-02 154740401e-0170 -470447408e-02 810868802e-02 303736697e-0371 143974944e-03 -275044913e-02 148634678e-02]72 [ 221364497e-01 -296205105e-01 157754028e-0173 -557641279e-02 -925625566e-02 -615316168e-0274 -138139882e-01 -554936599e-02 166886836e-0175 646238260e-02 124093863e-02 -209274345e-02]76 [ 212823455e-01 -932652720e-02 -439611467e-0177 -202814479e-01 498638770e-02 -126572488e-0178 -111181799e-01 325075635e-02 201416694e-0279 -569216463e-02 261922912e-02 830817468e-02]80 [ 162304042e-01 -734813956e-03 -202552550e-0181 180106705e-01 -572110826e-02 -917148244e-0282 -620429191e-03 -608892354e-02 102883628e-0283 384878478e-02 -872920419e-03 237291230e-02]84 [ 169023095e-01 681311168e-02 -371039856e-0285 -213139780e-02 -418752028e-03 136407740e-0186 258515825e-02 -410328777e-04 293149920e-0287 -197874734e-02 201177066e-02 429260690e-03]88 [ 416829358e-01 -128384095e+00 886081556e-0189 913717416e-02 -319420208e-01 -182003637e-0190 -319865507e-02 -171517045e-02 347472066e-0291 -353047665e-02 558354602e-02 -506222122e-02]92 [ 383948137e-01 106020034e-01 401191058e-0193 149470482e-01 -958422411e-02 -494473336e-0294 227589858e-02 -567352733e-02 384666644e-0295 -215828055e-02 -167817151e-02 115426241e-01]96 [ 907946444e-01 326120397e+00 298472002e+0097 -142615404e-01 129886103e+00 -453380431e-0198 154008478e-01 -355297093e-02 -295809181e-0199 157037690e-01 -729692046e-02 115180285e-01]

100 [ 160870896e+00 -232038235e+00 -796211044e-01101 155058968e+00 -219377663e+00 501030526e-01102 -171767279e+00 -136642470e+00 -242837527e-01103 -414275615e-01 -733148530e-01 -456676578e-01]104 [ 642870687e-01 134486839e+00 216026845e-01105 -213180345e-01 310866747e-01 -397754955e-01106 -354439151e-01 -595938041e-04 495054274e-03107 467013422e-02 -180823854e-02 125808320e-01]108 [ 116780496e+00 228141229e+00 -329418720e+00109 -154239912e+00 212372153e-01 251116768e+00110 184273560e+00 -406183916e-01 119175125e+00111 -924407446e-01 685444429e-01 -638729005e-01]112 [ 239097414e-01 -113382447e-02 306327342e-01113 468182987e-03 -103107607e-01 -317661969e-02114 346533705e-02 146440386e-02 688291154e-02115 172580481e-02 -623970238e-03 -652822380e-03]116 [ 174850329e-01 -186077411e-01 269285838e-01117 522452803e-02 -371708289e-02 -642874319e-02118 -501920042e-03 -114565540e-02 -261300268e-03119 -694872458e-03 120157063e-02 201341977e-02]120 [ 193220674e-01 162738332e-01 172794061e-02121 789933755e-02 158494767e-01 904541006e-04122 -333177052e-02 -142411500e-01 -190471155e-02123 -241622739e-02 -257382438e-02 284895062e-02]124 [ 331179197e+00 -156765268e-01 442446188e+00

63

125 205496297e+00 507031622e+00 -352663849e-02126 -568337901e+00 -117825301e+00 541756637e-01127 -315541339e-02 -158404846e+00 737887234e-01]128 [ 236033237e-01 -501380019e-01 -701568834e-02129 -214474169e-01 558739133e-01 -345340886e-01130 236469930e-01 -251770230e-02 -441670143e-01131 -173364633e-01 992353986e-03 101775476e-01]132 [ 313672832e+00 155128891e+00 460139512e+00133 982477544e-01 -387108002e-01 -134239667e+00134 -300065797e+00 -441556909e-01 -777546208e-01135 -659017029e-01 -142596356e-01 -978935498e-01]136 [ 850714148e-01 228658856e-01 -365260753e+00137 270626948e+00 -190441544e-01 566625676e+00138 177531510e+00 239978921e+00 110965660e+00139 158484130e+00 -151579214e-02 864324026e-01]140 [ 114302559e+00 118602811e+00 -388130412e+00141 869833825e-01 -823003310e-01 -423867795e-01142 856022598e-01 -108015106e+00 174840192e-01143 -135493558e-02 -117012561e+00 168572940e-01]144 [ 354117814e+00 612714769e-01 767585243e+00145 250391333e+00 181374399e+00 -146363231e+00146 -174027236e+00 -572924078e-01 -120787368e+00147 -413954661e-01 -462561948e-01 678297871e-01]148 [ 831843044e-01 441635485e-01 700724425e-02149 -472159900e-02 308326493e-01 -447009822e-01150 327806057e-01 652370380e-01 328490360e-01151 128628172e-01 -778065861e-02 691343399e-02]152 [ 490082031e-01 -953180204e-01 176970476e-01153 157256960e-01 -526196238e-02 -319264458e-01154 391808304e-01 219368239e-01 -206483291e-01155 -625044005e-02 -105547224e-01 318934196e-01]156 [ 149899454e+00 -430708817e-01 243770498e+00157 703149621e-01 -228827845e+00 270195855e+00158 -471484280e+00 -118700075e+00 -177431396e+00159 -223190236e+00 820855264e-01 -235859902e-01]160 [ 120322544e-01 -366300816e-01 -125699953e-01161 -121914056e-01 693277338e-02 -131034684e-01162 -154955924e-03 248094288e-02 -309576314e-02163 -166369415e-03 148904987e-04 -142151992e-02]164 [ 652394765e-01 -681024464e-01 636868117e-01165 304950208e-01 262178992e-01 -320457080e-01166 -198576098e-01 -302173163e-01 204399765e-01167 444513847e-02 -950111498e-02 -114198739e-02]168 [ 206762180e-01 -208101829e-01 261977630e-01169 -171672300e-01 561794250e-02 213660185e-01170 390259585e-02 478176392e-02 172812607e-02171 344052067e-02 626899067e-03 248544728e-02]172 [ 739717363e-01 437786285e+00 254995502e+00173 113151212e+00 -358509503e-01 220806129e-01174 -220500355e-01 -722409824e-02 -270534083e-01175 107942098e-03 270174668e-01 187279353e-01]176 [ 125593809e+00 671054880e-02 870352571e-01177 -432607959e+00 230652217e+00 547476105e+00178 -611052479e-01 107955720e+00 -216225471e+00179 -795770149e-01 -731804973e-01 968935954e-01]180 [ 117233757e-01 -123897829e-01 -488625265e-01181 142036530e-01 -723286756e-02 -699808763e-02182 -117525019e-02 570221674e-02 -767796123e-03183 417505873e-02 -233375716e-02 194121001e-02]184 [ 167511025e+00 -275436700e+00 145345593e+00

64

185 132408871e+00 -166172505e+00 100560074e+00186 -882308160e-01 -595708043e-01 -727283590e-01187 -103975499e+00 -186653334e-02 139449745e+00]188 [ 320587677e+00 -284451104e+00 854849957e+00189 -444001235e-01 104202144e+00 735333682e-01190 -248763292e+00 738931361e-01 -174185596e+00191 -107581842e+00 205759299e-01 -820483513e-01]192 [ 331279737e+00 -508655734e-01 661530870e+00193 116518280e+00 474499155e+00 -231536191e+00194 -134016130e+00 -715381712e-01 278890594e+00195 204189275e+00 -380003033e-01 116034914e+00]196 [ 179522019e+00 -813534697e-02 437167420e-01197 226517020e+00 885377295e-01 107481514e+00198 -725322296e-01 -219309506e+00 -759468916e-01199 -137191387e+00 260097913e-01 934596450e-01]200 [ 350400906e-01 817891485e-01 -863487084e-01201 -731760701e-01 970320805e-02 -360023996e-01202 -291753495e-01 -803073817e-02 665930095e-02203 160093340e-01 -129158086e-01 -518806100e-02]204 [ 225922929e-01 278461593e-01 539661393e-02205 -237662670e-02 -270343295e-02 -123485570e-01206 231027499e-03 587465112e-05 186127188e-02207 283074747e-02 -187198676e-04 124761782e-02]208 [ 453615634e-01 318976020e+00 -835029351e-01209 784124578e+00 -443906795e-01 -178945492e+00210 -114521031e+00 100044304e+00 -404084981e-01211 -486030348e-01 105412721e-01 563666445e-02]212 [ 393714086e-01 -307226477e-01 -487366619e-01213 -457481697e-01 -291133171e-04 -239881719e-01214 -215591352e-01 -121332941e-01 142245002e-01215 502984582e-02 -805878851e-03 195534173e-01]216 [ 186913010e-01 -161000977e-01 595612425e-01217 187804293e-01 222064227e-01 -109008289e-01218 783845058e-02 515228647e-02 -818113578e-02219 -237860551e-02 341013800e-03 364680417e-02]220 [ 332919314e+00 -214341251e+00 720913997e+00221 176143734e+00 164091808e+00 -266887649e+00222 -926748006e-01 -278599285e-01 -739434005e-01223 -387363085e-01 800557250e-01 115628886e+00]224 [ 476496444e-01 -119334793e-01 309037235e-01225 -345545294e-01 130114716e-01 506895559e-01226 212176840e-01 -414296750e-03 452439064e-02227 -162163990e-02 693683152e-02 -577607592e-03]228 [ 300019324e-01 543432074e-02 -772732930e-01229 147263806e+00 -279012581e-02 -247864869e-01230 -210011388e-01 278202425e-01 616957205e-02231 -166924986e-01 -180102286e-01 -378872162e-03]]232

TIMBRE_MEANS = [np.mean(t) for t in TIMBRE_CLUSTERS]
TIMBRE_STDEVS = [np.std(t) for t in TIMBRE_CLUSTERS]

'''helper methods to process raw msd data'''

def normalize_pitches(h5):
    # transpose every pitch segment so that the song's tonic sits at index 0
    key = int(hdf5_getters.get_key(h5))
    segments_pitches = hdf5_getters.get_segments_pitches(h5)
    segments_pitches_new = [transpose_by_key(pitch_seg, key) for pitch_seg in
        segments_pitches]
    return segments_pitches_new

def transpose_by_key(pitch_seg, key):
    # rotate the 12-element chroma vector by the song's key, modulo 12
    pitch_seg_new = []
    for i in range(0, 12):
        idx = (i + key) % 12
        pitch_seg_new.append(pitch_seg[idx])
    return pitch_seg_new
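As a quick sanity check of the transposition logic (my own illustration, not part of the pipeline above; it assumes this module is saved as msd_utils.py, the name imported in Appendix A.2), a D major chroma vector transposed by key = 2 should land exactly on the C major template:

import msd_utils

# D major triad (D, F#, A) in the pitch order C, C#, D, ..., B
d_major = [0, 0, 1, 0, 0, 0, 1, 0, 0, 1, 0, 0]
# rotating by key = 2 moves the tonic down to index 0 (C), giving the
# C major shape [1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0]
print(msd_utils.transpose_by_key(d_major, 2))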

''' given a time segment with distributions of the 12 pitches, find the most likely chord played '''
def find_most_likely_chord(pitch_vector):
    rho_max = 0.0
    # index each chord as a (chord type, root pitch class) pair, with types
    # 1-4 denoting major, minor, dominant 7th and minor 7th
    most_likely_chord = (1, 1)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_MAJOR,
            CHORD_TEMPLATE_MAJOR_means, CHORD_TEMPLATE_MAJOR_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean) * (pitch_vector[i] - np.mean(pitch_vector)) / (
                (stdev + 0.01) * (np.std(pitch_vector) + 0.01))
        if (abs(rho) > abs(rho_max)):
            rho_max = rho
            most_likely_chord = (1, idx)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_MINOR,
            CHORD_TEMPLATE_MINOR_means, CHORD_TEMPLATE_MINOR_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean) * (pitch_vector[i] - np.mean(pitch_vector)) / (
                (stdev + 0.01) * (np.std(pitch_vector) + 0.01))
        if (abs(rho) > abs(rho_max)):
            rho_max = rho
            most_likely_chord = (2, idx)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_DOM7,
            CHORD_TEMPLATE_DOM7_means, CHORD_TEMPLATE_DOM7_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean) * (pitch_vector[i] - np.mean(pitch_vector)) / (
                (stdev + 0.01) * (np.std(pitch_vector) + 0.01))
        if (abs(rho) > abs(rho_max)):
            rho_max = rho
            most_likely_chord = (3, idx)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_MIN7,
            CHORD_TEMPLATE_MIN7_means, CHORD_TEMPLATE_MIN7_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean) * (pitch_vector[i] - np.mean(pitch_vector)) / (
                (stdev + 0.01) * (np.std(pitch_vector) + 0.01))
        if (abs(rho) > abs(rho_max)):
            rho_max = rho
            most_likely_chord = (4, idx)
    return most_likely_chord
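The following toy check (again my own sketch under the msd_utils.py assumption, not part of the thesis pipeline) illustrates the template matching: a clean C major chroma vector correlates most strongly with the first major template, so the function should return (1, 0), a major chord rooted on C.

import msd_utils

# C major triad (C, E, G) in the pitch order C, C#, D, ..., B
c_major = [1.0, 0, 0, 0, 1.0, 0, 0, 1.0, 0, 0, 0, 0]
print(msd_utils.find_most_likely_chord(c_major))  # expected: (1, 0)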

def find_most_likely_timbre_category(timbre_vector):
    most_likely_timbre_cat = 0
    rho_max = 0.0
    for idx, (seg, mean, stdev) in enumerate(zip(TIMBRE_CLUSTERS, TIMBRE_MEANS,
            TIMBRE_STDEVS)):
        rho = 0.0
        for i in range(0, 12):
            # note: this centers the candidate vector on the cluster's own mean
            # (np.mean(seg)), unlike find_most_likely_chord above
            rho += (seg[i] - mean) * (timbre_vector[i] - np.mean(seg)) / (
                (stdev + 0.01) * (np.std(timbre_vector) + 0.01))
        if (abs(rho) > abs(rho_max)):
            rho_max = rho
            most_likely_timbre_cat = idx
    return most_likely_timbre_cat
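Similarly, a rough self-consistency check (my own sketch, same msd_utils.py assumption): feeding one of the 46 precomputed cluster centers back through the matcher should typically recover that cluster's own index, since a vector is most correlated with itself.

import msd_utils

center = msd_utils.TIMBRE_CLUSTERS[5]
print(msd_utils.find_most_likely_timbre_category(center))  # expected: 5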


Bibliography

[1] Marck Bailey. Mark J. Butler publishes scholarly work on dance music. http://www.music.northwestern.edu/about/news/2012/mark-j-butler-publishes-scholarly-work-on-dance-music.html, March 2012.

[2] Kenneth Taylor. Ishkur's guide to EDM. http://techno.org/electronic-music-guide.

[3] Deal further strengthens Spotify's music discovery expertise. http://the.echonest.com/pressreleases/spotify-acquires-echo-nest, March 2014.

[4] Josh Constine. Inside the Spotify - Echo Nest skunkworks. http://techcrunch.com/2014/10/19/the-sonic-mad-scientists, October 2014.

[5] The future of music genres is here. http://blog.echonest.com/post/73516217273/the-future-of-music-genres-is-here, January 2014.

[6] About the Music Genome Project. http://www.pandora.com/about/mgp.

[7] Joan Serrà, Álvaro Corral, Marián Boguñá, Martín Haro, and Josep Ll. Arcos. Measuring the evolution of contemporary western popular music. Scientific Reports, 2, July 2012.

[8] Matthias Mauch, Robert M. MacCallum, Mark Levy, and Armand M. Leroi. The evolution of popular music: USA 1960–2010. Royal Society Open Science, 2(5), 2015.

[9] Thierry Bertin-Mahieux, Daniel P.W. Ellis, Brian Whitman, and Paul Lamere. The Million Song Dataset. In Proceedings of the 12th International Conference on Music Information Retrieval (ISMIR 2011), 2011.

[10] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay. Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12:2825–2830, 2011.

[11] Lars Buitinck, Gilles Louppe, Mathieu Blondel, Fabian Pedregosa, Andreas Mueller, Olivier Grisel, Vlad Niculae, Peter Prettenhofer, Alexandre Gramfort, Jaques Grobler, Robert Layton, Jake VanderPlas, Arnaud Joly, Brian Holt, and Gaël Varoquaux. API design for machine learning software: experiences from the scikit-learn project. In ECML PKDD Workshop: Languages for Data Mining and Machine Learning, pages 108–122, 2013.

[12] Edwin Chen. Infinite mixture models with nonparametric Bayes and the Dirichlet Process. http://blog.echen.me/2012/03/20/infinite-mixture-models-with-nonparametric-bayes-and-the-dirichlet-process, March 2012.

[13] Graham Massey. Roland TR-808: The drum machine that changed music. http://www.bbc.com/news/entertainment-arts-26682781, March 2014.

[14] Jean-Michel Jarre and Agencja Artystyczna MTJ. Jean Michel Jarre. Disques Dreyfus, 1999.

[15] Francois Pachet, Jean-Julien Aucouturier, and Mark Sandler. The way it sounds: Timbre models for analysis and retrieval of music signals. IEEE Transactions on Multimedia, 7(6):1028–35, December 2005.


Page 54: Silver,Matthew final thesis

Cluster Song Count Characteristic Sounds

0 4075 Nostalgic and sad-sounding synths and string instru-

ments

1 2068 Intense sad cavernous (mix of industrial metal and am-

bient)

2 1546 Jazzfunk tones

3 1691 Orchestral with heavy 80s synths atmospheric

4 343 Arpeggios

5 304 Electro ambient

6 2405 Alien synths eery

7 1264 Punchy kicks and claps 80s90s tilt

8 1561 Medium tempo 44 time signature synths with intense

guitar

9 1796 Disco rhythms and instruments

10 2158 Standard rock with few (if any) synths added on

12 791 Cavernous minimalist ambient (non-electronic instru-

ments)

14 765 Downtempo classic guitar riffs fewer synths

16 865 Classic acid house sounds and beats

17 682 Heavy Roland TR sounds

22 14 Fast ambient classic orchestral

23 578 Acid house with funk tones

30 31 Very repetitive rhythms one or two tones

34 88 Very dense sound (strong vocals and synths)

Table 33 Song cluster descriptions for α = 02

45

33 Analysis

Each of the different values of α revealed different insights about the Dirichlet Process

applied to the Million Song Dataset mainly which artists and songs were unique

and how traditional groupings of EM genres could be thought of differently Not

surprisingly the distributions of the years of songs in most of the clusters were skewed

to the left because the distribution of all of the EM songs in the Million Song Dataset

is left-skewed (see Figure 22) However some of the distributions vary significantly

for individual clusters and these differences provide important insights into which

types of music styles were popular at certain points in time and how unique the

earliest artists and songs in the clusters were For example for α = 01 Cluster 28rsquos

musical style (with sounds characteristic of the Roland TR-808 and TR-909 a two

programmable drum machines and synthesizers that became extremely popular in

1980s and 90s dance tracks) [13] coincides with the when the instruments were first

manufactured in 1980 Not surprisingly this cluster contained mostly songs from

the 80s and 90s and declined slightly in the 2000s However there were a few songs

in that cluster that came out before 1980 While these songs did not clearly use the

Roland TR machines they may have contained similar sounds that predated the

machines and were truly novel

First looking at α = 005 we see that all of the clusters contain a significant

number of songs although cluster 3 and 8 are notably smaller Cluster 3 contains

a heavier left tail indicating a larger number of songs from the 70s 80s and 90s

Inside the cluster the genres of music varied significantly from a traditional music

lens That is the cluster contained some songs with nearly all traditional rock

instruments others with purely synths and others somewhere in between all which

would normally be classified as different EM genres However under the Dirichlet

Process these songs were lumped together with the common themes of dense melodic

46

melodies (as opposed to minimalistic repetitive or dissonant sounds) The most

prominent artists from the earlier songs are Ashra and John Hassell who composed

several melodic songs combining traditional instruments with synthesizers for a

modern feel The other small cluster number 8 contains a more normal year

distribution relative to the entire MSD distribution and also consists of denser beats

Another artist Cabaret Voltaire leads this cluster Cluster 9 sticks out significantly

Cluster 9 sticks out significantly because it contains virtually no songs before 1990 but then increases rapidly in popularity. This cluster contains songs with hypnotically repetitive rhythm, strong and ethereal synths, and an equally strong drum-like beat. Given the emergence of trance in the 1990s, and the fact that house music in the 1980s contained more minimalistic synths than house music in the 1990s, this distribution of years makes sense. Looking at the earliest artists in this cluster, one that accurately predates the later music in the cluster is Jean-Michel Jarre. A French composer and pioneer of ambient and electronic music [14], Jarre has one song, "Les Chants Magnétiques IV," that contains very sharp and modulated synths along with a repetitive hi-hat cymbal rhythm and more drawn-out, ethereal synths. While the song sounds ambient at its normal speed, playing it at 1.5 times the normal speed resulted in a thumping, fast-paced 16th-note rhythm that, combined with the ethereal synths and their chord progressions, sounded very similar to trance music. In fact, I found that, stylistically, trance music was comparable to house and ambient music increased in speed. "Trance" was a term not used extensively until the early 1990s, but ambient and house music were already mainstream by the 1980s, so it would make sense that trance evolved in this manner. However, this insight could serve as an argument that trance is not an innovative genre in and of itself, but is rather a clever combination of two older genres.

Lastly, we look at the timbre category and chord change distributions for each cluster. In theory, these clusters should have significantly different peaks of chord changes and timbre categories, reflecting different pitch arrangements and instruments in each cluster. The type 0 chord change corresponds to major → major with no note change; type 60, minor → minor with no note change; type 120, dominant 7th major → dominant 7th major with no note change; and type 180, dominant 7th minor → dominant 7th minor with no note change. It makes sense that type 0, 60, 120, and 180 chord changes are frequently observed, because it implies that adjacent chords in a song remain in the same key for the majority of the song.
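To make that indexing concrete, here is a minimal sketch mirroring the category arithmetic in Appendix A.2 (the function name is mine, not from the original code; chord types are numbered 1 through 4 for major, minor, dominant 7th, and minor 7th):

def chord_change_category(c1, c2):
    # c1 and c2 are (chord_type, root) pairs: chord_type in 1..4, root in 0..11
    if c1[1] == c2[1]:
        note_shift = 0
    elif c1[1] < c2[1]:
        note_shift = c2[1] - c1[1]
    else:
        note_shift = 12 - c1[1] + c2[1]
    key_shift = 4 * (c1[0] - 1) + c2[0]       # ranges over 1..16
    return 12 * (key_shift - 1) + note_shift  # one of 192 categories

# same-type, same-root transitions land exactly on the types named above:
print chord_change_category((1, 0), (1, 0))  # major to major: 0
print chord_change_category((2, 5), (2, 5))  # minor to minor: 60
print chord_change_category((3, 9), (3, 9))  # dom. 7th to dom. 7th: 120
print chord_change_category((4, 2), (4, 2))  # min. 7th to min. 7th: 180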

The timbre categories, on the other hand, are more difficult to interpret intuitively. Mauch's study addresses this issue by sampling the songs and sounds that are closest to each timbre category, playing the sounds, and attaching user-based interpretations based on several listeners [8].
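That sampling step can be sketched in a few lines, assuming the per-frame timbre vectors and the cluster centroids are already in hand (a sketch under those assumptions, not code from my experiment):

import numpy as np

def nearest_frames(frames, centroids, cat, n=5):
    # frames: one 12-dimensional timbre vector per row; centroids: the
    # timbre cluster centers. Returns indices of the n frames closest to
    # centroid `cat`, which listeners could then audition and label.
    dists = np.linalg.norm(np.asarray(frames) - np.asarray(centroids)[cat], axis=1)
    return np.argsort(dists)[:n]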

While this strategy worked in Mauch's study, given the time and resources at my disposal it was not practical in mine. I ended up writing subjective summaries of each cluster and comparing them against the charts to see whether certain peaks in the timbre categories corresponded to specific tones. Strangely, for α = 0.05 the timbre and chord change data are very similar for each cluster. This problem does not occur when α = 0.1 or 0.2, where the graphs vary significantly and correspond to some of the observed differences in the music. In summary, below are the most influential artists I found in the clusters formed, and the types of music they created that were novel for their time:

• Jean-Michel Jarre: ambient and house music; complicated synthesizer arrangements
• Cabaret Voltaire: orchestral electronic music
• Paul Horn: new age
• Brian Eno: ambient music
• Manuel Göttsching (Ashra): synth-heavy ambient music
• Killing Joke: industrial metal
• John Foxx: minimalist and dark electronic music
• Fad Gadget: house and industrial music

While these conclusions were formed mainly from the MSD data and the resulting clusters, I checked outside sources and biographies of these artists to see whether they were groundbreaking contributors to electronic music. Some research revealed that these artists were indeed groundbreaking for their time, so my findings are consistent with existing literature. The difference between existing accounts and mine, however, is that from a quantitatively computed perspective I found some new connections (like observing that one of Jarre's works, when sped up, sounded very similar to trance music).

For larger values of α, it is worth not only looking at interesting phenomena in the clusters formed for that specific value, but also comparing those clusters to the ones formed at other values of α. Since we are increasing the value of α, more clusters will be formed, and the distinctions between each cluster will be more nuanced.
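As an illustration of how α drives the number of clusters, here is a minimal sketch using scikit-learn's current variational Dirichlet Process mixture; weight_concentration_prior plays the role of α, and the features are random placeholders rather than my actual chord-change/timbre vectors (the thesis runs used the sklearn.mixture interface available at the time, so this is an analogue, not the exact code):

import numpy as np
from sklearn.mixture import BayesianGaussianMixture

X = np.random.rand(500, 10)  # placeholder features; one row per song

for alpha in (0.05, 0.1, 0.2):
    dpgmm = BayesianGaussianMixture(
        n_components=50,  # truncation level: an upper bound on cluster count
        weight_concentration_prior_type='dirichlet_process',
        weight_concentration_prior=alpha,
        max_iter=500, random_state=0).fit(X)
    labels = dpgmm.predict(X)
    # larger alpha tends to leave more clusters with appreciable weight
    print alpha, len(np.unique(labels))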

With α = 0.1, the Dirichlet Process formed 16 clusters. Two of these clusters consisted of only one song each and, upon listening, neither of these songs sounded particularly unique, so I threw those two clusters out and analyzed the remaining 14. Comparing these clusters to the ones formed with α = 0.05, I found that some of the clusters mapped over nicely, while others were more difficult to interpret. For example, cluster 3_0.1 (cluster 3 when α = 0.1) contained a similar number of songs, and a similar distribution of release years, to cluster 9_0.05. Both contain virtually no songs before the 1990s and then steadily rise in popularity through the 2000s. Both clusters also contain similar types of music: house beats and ethereal synths reminiscent of ambient or trance music. However, when I looked at the earliest artists in cluster 3_0.1, they were different from the earliest artists in cluster 9_0.05. One particular artist, Bill Nelson, stood out for having a particularly novel song, "Birds of Tin," for the year it was released (1980). This song features a sharp and twangy synth beat that, when sped up, sounded like minimalist acid house music.

While the α = 0.05 grouping differentiated mostly on general moods and classes of instruments (like rock vs. non-electronic vs. electronic), the α = 0.1 grouping picked up more nuanced differences in instrumentation and mood. For example, cluster 16_0.1 contained songs that featured orchestral string instruments, especially violin. The songs themselves varied significantly according to traditional genres, from Brian Eno arrangements with classical orchestra to a remix of a song by Linkin Park, a nu-metal band, which contained violin interludes. This clustering raises an interesting point: music that sounds very different based on traditional genres could be grouped together on certain instruments or sounds. Another cluster, 28_0.1, features 90s sounds characteristic of the Roland TR-909 drum machine (which would explain why the cluster's songs increase drastically starting in the 1990s and steadily decline through the 2000s). Yet another cluster, 6_0.1, contains a particularly heavy left tail, indicating a style more popular in the 1980s, and its characteristic sound, hi-hat cymbals, is also a specialized instrument. This specialization does not match up particularly strongly with the clusters when α = 0.05. That is, a single cluster with α = 0.05 does not easily map to one or more clusters in the α = 0.1 run, although many of the clusters appear to share characteristics based on the qualitative descriptions in the tables. The timbre/chord change charts for each cluster appeared to at least somewhat corroborate the general characteristics I attached to each cluster. For example, the last timbre category is significantly pronounced for clusters 5 and 18, and especially so for 18. Cluster 18 was vocal-free, ethereal, space-synth sounds, so it would make sense that cluster 5, which was mainly calm New Age, also contained vocal-free, ethereal, space-y sounds. It was also interesting to note that certain clusters, like 28, contained one timbre category that completely dominated all the others. Many songs in this cluster were marked by strong and repetitive beats reminiscent of the Roland TR drum machines, which matches the graph. Likewise, clusters 3, 7, 9, and 20, which appear to contain the same peak timbre category, were noted for containing strong and repetitive beats. For this value of α, I added the following artists and their contributions to the general list of novel artists:

• Bill Nelson: minimalist house music
• Vangelis: orchestral compositions with electronic notes
• Rick Wakeman: rock compositions with spacy-sounding synths
• Kraftwerk: synth-based pop music

Finally, we look at α = 0.2. With this parameter value, the Dirichlet Process resulted in 22 clusters. Three of these clusters contained only one song each and, upon listening to each of these songs, I determined they were not particularly unique and discarded them, for a total of 19 remaining clusters. Unlike the previous two values of α, where the clusters were relatively easy to subjectively differentiate, this one was quite difficult. Slightly more than half of the clusters, 10 out of 19, contained under 1,000 songs, and 3 contained under 100. Some of the clusters were easily mapped to clusters from the other two α values, like cluster 17_0.2, which contains Roland TR drum machine sounds and is comparable to cluster 28_0.1. However, many of the other classifications seemed more dubious. Not only did the songs within each cluster often seem to vary significantly, but the differences between many clusters appeared nearly indistinguishable. The chord change and timbre charts also reflect the difficulty of distinguishing different clusters. The y-axis scales for all of these charts are quite small, implying that many of the timbre values averaged out because the songs were quite different in each cluster. Essentially, the observations are quite noisy and do not have features that stand out as saliently as those of cluster 28_0.1, for example. The only exceptions to these numbers were clusters 30 and 34, but there were so few songs in each of these clusters that they represent only a small amount of the dataset. Therefore, I concluded that the Dirichlet Process with α = 0.2 did an insufficient job of adequately clustering the songs. Overall, the clusters formed when α = 0.1 were the most meaningful in terms of picking up nuanced moods and instruments without splitting hairs and producing clusters that a minimally trained ear could not differentiate. From this analysis, the most appropriate genre classifications of the electronic music from the MSD are the clusters described in the table where α = 0.1, and the most novel artists, along with their contributions, are summarized in the findings where α = 0.05 and α = 0.1.


Chapter 4

Conclusion

In this chapter, I first address weaknesses in my experiment and strategies for fixing them; I then offer potential paths for researchers to build upon my experiment, and close with final words regarding this thesis.

4.1 Design Flaws in Experiment

While I made every effort to ensure the integrity of this experiment, there were various complicating factors, some beyond my control and others within my control but unrealistic to address given the time and resources I had. The largest issue was the dataset I was working with. While the MSD contained roughly 23,000 electronic music songs according to my classifications, these songs did not come close to all of the electronic music that was available. From looking through the tracks, I did see many important artists, meaning that there was some credibility to the dataset. However, there were several other artists I was surprised to see missing, and the artists included contained only a limited number of popular songs. Some traditionally defined genres, like dubstep, were missing entirely from the dataset, and the most recent songs came from the year 2010, which meant that the past 5 years of rapid expansion in EM were not accounted for. Building a sufficient corpus of EM data is very difficult, arguably more so than for other genres, because songs may be remixed by multiple artists, further blurring the line between original content and modifications. For this reason, I considered my thesis to be a proof of concept. Although the data I used may not be ideal, I was able to show that the Dirichlet Process could be used with some amount of success to cluster songs based on their metadata.

With respect to how I implemented the Dirichlet Process and constructed the features, my methodology could have been more extensive with additional time and resources. Interpreting the sounds in each song and establishing common threads is a difficult task, and unlike Pandora, which used trained music theory experts to analyze each song, I relied on my own ears and anecdotal knowledge of EM. Given the lack of formal literature quantitatively analyzing EM, and the resources I had, this was my best realistic option, but it was also not ideal. The second notable weakness, which was more controllable, was determining what exactly constitutes an EM song. My criterion involved iterating through every song and selecting those whose artist carried a tag that fell inside a list of predetermined EM genres. However, this strategy is not always effective, since some artists have only a small selection of EM songs and have produced much more music involving rock or other non-EM genres. To prevent these songs from appearing in the dataset, I would need to load another dataset, from a group called Last.fm, which contains user-generated tags at the song level.
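The fix can be sketched as follows, assuming the Last.fm tags had been loaded into a dictionary mapping MSD track IDs to their tag lists; the track IDs and tags below are made up for illustration:

def is_em_song(track_id, lastfm_tags, target_genres):
    # keep a track only if its own song-level tags match the EM genre list,
    # rather than relying on the artist-level tags used in Appendix A.1
    tags = [t.lower() for t in lastfm_tags.get(track_id, [])]
    return any(genre in tag for tag in tags for genre in target_genres)

lastfm_tags = {'TRXXXXX1': ['acid house', 'electronic'],  # hypothetical IDs/tags
               'TRXXXXX2': ['classic rock']}
target_genres = ['house', 'techno', 'trance', 'ambient']
em_ids = [tid for tid in lastfm_tags if is_em_song(tid, lastfm_tags, target_genres)]
print em_ids  # only 'TRXXXXX1' survives the filter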

Another, more addressable weakness in my experiment was graphically analyzing the timbre categories. While the average chord changes were easy to interpret on the graphs for each cluster and had easy semantic interpretations, the timbre categories were never formally defined. That is, while I knew the Bayesian Information Criterion was lowest when there were 46 categories, I did not associate each timbre category with a sound.
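That selection step itself is a standard BIC sweep; a minimal sketch with scikit-learn's GaussianMixture follows (the random placeholder data here will not reproduce the 46-category minimum, which is a result of the thesis, not of this snippet):

import numpy as np
from sklearn.mixture import GaussianMixture

frames = np.random.rand(2000, 12)  # placeholder: sampled 12-d timbre frames

bics = {}
for k in range(10, 61, 4):
    gmm = GaussianMixture(n_components=k, covariance_type='full',
                          random_state=0).fit(frames)
    bics[k] = gmm.bic(frames)  # lower BIC = better fit/complexity trade-off
best_k = min(bics, key=bics.get)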

Mauch's study addressed this issue by randomly selecting songs with sounds that fell in each timbre category and asking users to listen to the sounds and classify what they heard. Implementing this system would be an additional way of ensuring that the clusters formed for each song were nontrivial: I could not only eyeball the measurements on each timbre graph, as I did in this thesis, but also use them to confirm the sounds I observed for each cluster. Finally, while my feature selection involved careful preprocessing, based on other studies, that normalized measurements between all songs, there are additional ways I could have improved the feature set. For example, one study looks at more advanced ways to isolate specific timbre segments in a song, identify repeating patterns, and compare songs to each other in terms of the similarity of their timbres [15]. More advanced methods like these would allow me to analyze more quantitatively how successfully the Dirichlet Process clusters songs into distinct categories.

4.2 Future Work

Future work in this area, quantitatively analyzing EM metadata to determine what constitutes different genres and novel artists, would involve tighter definitions and procedures, evaluations of whether the clustering was effective, and closer musical scrutiny. All of the weaknesses mentioned in the previous section, barring perhaps the set of songs available in the Million Song Dataset, can be addressed with extensions and modifications to the code base I created. The greater issue of building an effective corpus of music data for the MSD, and constantly updating it, might be addressed by soliciting such data from an organization like Spotify, but such an endeavor is very ambitious and beyond the scope of any individual or small research group without extensive funding and influence. Once these problems are resolved, and the dataset, the songs accessed from it, and the methods for comparing songs to each other are settled, the next steps would be to further analyze the results. How do the most unique artists for their time compare to the most popular artists? Is there considerable overlap? How long does it take for a style to grow in popularity, if it even does? And lastly, how can these findings be used to compose new genres of music and to envision who and what will become popular in the future? All of these questions may require supplementary information sources, with respect to the popularity of songs and artists, for example, and many of these additional pieces of information can be found on the website of the MSD.

4.3 Closing Remarks

While this thesis is an ambitious endeavor and can be improved in many respects, it does show that the methods implemented yield nontrivial results and could serve as a foundation for future quantitative analysis of electronic music. As data analytics grows even further, and groups such as Spotify amass greater amounts of information and deeper insights from that information, this relatively new field of study will hopefully grow with them. EM is a dynamic, energizing, and incredibly expressive type of music, and understanding it from a quantitative perspective pays respect to what has, up until now, mostly been analyzed from a curious outsider's perspective: qualitatively described, but not examined as thoroughly from a mathematical angle.


Appendix A

Code

A.1 Pulling Data from the Million Song Dataset

from __future__ import division
import os
import re
import sys
import time
import glob
import numpy as np
import hdf5_getters  # not on adroit

# prevents output from showing ellipses when printed
np.set_printoptions(threshold=np.nan)

'''This code pulls the metadata of every electronic music song found in the
Million Song Dataset and writes it out sorted by year.'''

basedir = '/scratch/network/mssilver/mssilver/msd_data_full/data/' + str(sys.argv[1])
ext = '.h5'

target_genres = ['house', 'techno', 'drum and bass', 'drum n bass', "drum'n'bass",
                 'drumnbass', "drum 'n' bass", 'jungle', 'breakbeat', 'trance',
                 'dubstep', 'trap', 'downtempo', 'industrial', 'synthpop', 'idm',
                 'idm - intelligent dance music', '8-bit', 'ambient',
                 'dance and electronica', 'electronic']

# relevant metadata for all EM songs found in the MSD
all_song_data = {}
pitch_segs_data = []
count = 0
start_time = time.time()

for root, dirs, files in os.walk(basedir):
    files = glob.glob(os.path.join(root, '*' + ext))
    for f in files:
        h5 = hdf5_getters.open_h5_file_read(f)
        # if year unknown, throw out sample
        if hdf5_getters.get_year(h5) == 0:
            h5.close()
            continue
        if any(tag in str(hdf5_getters.get_artist_mbtags(h5)) for tag in target_genres):
            print 'found electronic music song at {0} seconds'.format(time.time() - start_time)
            count += 1
            print('song count: {0}'.format(count + 1))
            h5_subdict = dict()
            h5_subdict['title'] = hdf5_getters.get_title(h5).item()
            h5_subdict['artist_name'] = hdf5_getters.get_artist_name(h5).item()
            h5_subdict['year'] = hdf5_getters.get_year(h5).item()
            h5_subdict['duration'] = hdf5_getters.get_duration(h5).item()
            h5_subdict['timbre'] = hdf5_getters.get_segments_timbre(h5).tolist()
            h5_subdict['pitches'] = hdf5_getters.get_segments_pitches(h5).tolist()
            track_id = hdf5_getters.get_track_id(h5).item()
            all_song_data[track_id] = h5_subdict
            print('Song {0} finished processing. Total time elapsed: {1} seconds'.format(count, str(time.time() - start_time)))
        h5.close()

all_song_data_sorted = dict(sorted(all_song_data.items(), key=lambda k: k[1]['year']))
sortedpitchdata = '/scratch/network/mssilver/mssilver/msd_data/raw_' + re.sub('/', '', sys.argv[1]) + '.txt'
with open(sortedpitchdata, 'w') as text_file:
    text_file.write(str(all_song_data_sorted))

A.2 Calculating Most Likely Chords and Timbre Categories

from __future__ import division
import os
import sys
import re
import time
import json
import glob
import hdf5_getters  # not on adroit
import sklearn.mixture
import msd_utils  # not on adroit
import math
import numpy as np
import collections
import ast

# prevents output from showing ellipses when printed
np.set_printoptions(threshold=np.nan)

# column-wise mean of a list of lists (used with zip below)
def mean(a):
    return sum(a) / len(a)

'''This code computes the frequency of chord changes and the timbre category
counts for each electronic song.'''

basedir = '/scratch/network/mssilver/mssilver/'
input_file = basedir + 'msd_data/raw_' + str(sys.argv[1]) + '.txt'
output_file = basedir + 'msd_data/preprocessed_' + str(sys.argv[1]) + '.txt'

json_contents = open(input_file, 'r').read()

all_song_data = []
time_start = time.time()
count = 0
# pattern reconstructed: assumes each song's dict literal starts with 'title'
for json_object_str in re.finditer(r"{'title'.*?}", json_contents):
    json_object_str = str(json_object_str.group(0))
    json_object = ast.literal_eval(json_object_str)
    json_object_new = {}

    json_object_new['title'] = json_object['title']
    json_object_new['artist_name'] = json_object['artist_name']
    json_object_new['year'] = json_object['year']
    json_object_new['duration'] = json_object['duration']

    segments_pitches_old = json_object['pitches']
    segments_timbre_old = json_object['timbre']
    segments_pitches_old_smoothed = []
    segments_timbre_old_smoothed = []
    chord_changes = [0 for i in range(0, 192)]
    smoothing_factor = 5
    for i in range(0, int(math.floor(len(segments_pitches_old) / smoothing_factor))):
        segments = segments_pitches_old[(smoothing_factor * i):(smoothing_factor * i + smoothing_factor)]
        # calculate mean frequency of each note over a block of 5 time segments
        segments_mean = map(mean, zip(*segments))
        segments_pitches_old_smoothed.append(segments_mean)
    most_likely_chords = [msd_utils.find_most_likely_chord(seg) for seg in segments_pitches_old_smoothed]
    print 'found most likely chords at {0} seconds'.format(time.time() - time_start)
    # calculate chord changes
    for i in range(0, len(most_likely_chords) - 1):
        c1 = most_likely_chords[i]
        c2 = most_likely_chords[i + 1]
        if (c1[1] == c2[1]):
            note_shift = 0
        elif (c1[1] < c2[1]):
            note_shift = c2[1] - c1[1]
        else:
            note_shift = 12 - c1[1] + c2[1]
        key_shift = 4 * (c1[0] - 1) + c2[0]
        # convert note_shift (0 through 11) and key_shift (1 to 16)
        # to one of 192 categories for a chord shift
        chord_shift = 12 * (key_shift - 1) + note_shift
        chord_changes[chord_shift] += 1
    json_object_new['chord_changes'] = [c / json_object['duration'] for c in chord_changes]
    print 'calculated chord changes at {0} seconds'.format(time.time() - time_start)

    for i in range(0, int(math.floor(len(segments_timbre_old) / smoothing_factor))):
        segments = segments_timbre_old[(smoothing_factor * i):(smoothing_factor * i + smoothing_factor)]
        # calculate mean of each timbre coefficient over a block of 5 time segments
        segments_mean = map(mean, zip(*segments))
        segments_timbre_old_smoothed.append(segments_mean)
    print 'found most likely timbre categories at {0} seconds'.format(time.time() - time_start)
    timbre_cats = [msd_utils.find_most_likely_timbre_category(seg) for seg in segments_timbre_old_smoothed]
    # note: only the first 30 of the 46 timbre categories are tallied here
    timbre_cat_counts = [timbre_cats.count(i) for i in xrange(0, 30)]
    json_object_new['timbre_cat_counts'] = [t / json_object['duration'] for t in timbre_cat_counts]
    all_song_data.append(json_object_new)
    count += 1

print 'preprocessing finished, writing results to file at time {0}'.format(time.time() - time_start)
with open(output_file, 'w') as f:
    f.write(str(all_song_data))

print 'file merging complete at time {0}'.format(time.time() - time_start)

A.3 Code to Compute Timbre Categories

from __future__ import division
import os
import sys
import re
import time
import json
import glob
import hdf5_getters  # not on adroit
import sklearn.mixture
import msd_utils  # not on adroit
import math
import numpy as np
import collections
from string import ascii_uppercase
import ast
import matplotlib.pyplot as plt
import operator
from collections import defaultdict
import random

timbre_all = []
N = 20  # number of samples to get from each year
year_counts = {1956: 2, 1965: 4, 1968: 3, 1969: 5, 1970: 23, 1971: 25,
               1972: 26, 1973: 37, 1974: 35, 1975: 29, 1976: 28, 1977: 64,
               1978: 77, 1979: 111, 1980: 131, 1981: 171, 1982: 199, 1983: 272,
               1984: 190, 1985: 189, 1986: 200, 1987: 224, 1988: 205, 1989: 272,
               1990: 358, 1991: 348, 1992: 538, 1993: 610, 1994: 658, 1995: 764,
               1996: 809, 1997: 930, 1998: 872, 1999: 983, 2000: 1031, 2001: 1230,
               2002: 1323, 2003: 1563, 2004: 1508, 2005: 1995, 2006: 1892,
               2007: 2175, 2008: 1950, 2009: 1782, 2010: 742}

time_start = time.time()
year_count = defaultdict(int)
orig_dir = '/scratch/network/mssilver/mssilver/'
orig_dir = ''
# pattern reconstructed: assumes each song's dict literal starts with 'title'
json_pattern = re.compile(r"{'title'.*?}", re.DOTALL)
N = 20  # number of songs to sample from each year
k = 20  # number of frames to select from each song
for l1 in ascii_uppercase:
    for l2 in ascii_uppercase:
        edm_textfile = orig_dir + 'msd_data/raw_' + l1 + l2 + '.txt'
        json_contents = open(edm_textfile, 'r').read()
        for json_object_str in re.findall(json_pattern, json_contents):
            json_object = ast.literal_eval(json_object_str)
            year = int(json_object['year'])
            prob = 1.0 if 1.0 * N / year_counts[year] > 1.0 else 1.0 * N / year_counts[year]
            if random.random() < prob:
                print 'getting timbre frames for song in directory {0}, {1} seconds after start of program'.format(edm_textfile, time.time() - time_start)
                duration = float(json_object['duration'])
                timbre = [[t / duration for t in l] for l in json_object['timbre']]
                try:
                    indices = random.sample(xrange(0, len(timbre)), k)
                except:
                    indices = xrange(0, len(timbre))
                timbre_frames = [timbre[i] for i in indices]
                appended_timbre = [timbre_all.append(l) for l in timbre_frames]
        print 'finished file {0}, {1} seconds after start of program'.format(edm_textfile, time.time() - time_start)

with(open('timbre_frames_all.txt', 'w')) as f:
    f.write(str(timbre_all))

A.4 Helper Methods for Calculations

import os
import re
import json
import glob
import hdf5_getters
import time
import numpy as np

''' some static data used in conjunction with the helper methods '''

# each 12-element vector corresponds to the 12 pitches, starting with C
# natural and going up to B natural

CHORD_TEMPLATE_MAJOR = [[1,0,0,0,1,0,0,1,0,0,0,0], [0,1,0,0,0,1,0,0,1,0,0,0],
                        [0,0,1,0,0,0,1,0,0,1,0,0], [0,0,0,1,0,0,0,1,0,0,1,0],
                        [0,0,0,0,1,0,0,0,1,0,0,1], [1,0,0,0,0,1,0,0,0,1,0,0],
                        [0,1,0,0,0,0,1,0,0,0,1,0], [0,0,1,0,0,0,0,1,0,0,0,1],
                        [1,0,0,1,0,0,0,0,1,0,0,0], [0,1,0,0,1,0,0,0,0,1,0,0],
                        [0,0,1,0,0,1,0,0,0,0,1,0], [0,0,0,1,0,0,1,0,0,0,0,1]]
CHORD_TEMPLATE_MINOR = [[1,0,0,1,0,0,0,1,0,0,0,0], [0,1,0,0,1,0,0,0,1,0,0,0],
                        [0,0,1,0,0,1,0,0,0,1,0,0], [0,0,0,1,0,0,1,0,0,0,1,0],
                        [0,0,0,0,1,0,0,1,0,0,0,1], [1,0,0,0,0,1,0,0,1,0,0,0],
                        [0,1,0,0,0,0,1,0,0,1,0,0], [0,0,1,0,0,0,0,1,0,0,1,0],
                        [0,0,0,1,0,0,0,0,1,0,0,1], [1,0,0,0,1,0,0,0,0,1,0,0],
                        [0,1,0,0,0,1,0,0,0,0,1,0], [0,0,1,0,0,0,1,0,0,0,0,1]]
CHORD_TEMPLATE_DOM7 = [[1,0,0,0,1,0,0,1,0,0,1,0], [0,1,0,0,0,1,0,0,1,0,0,1],
                       [1,0,1,0,0,0,1,0,0,1,0,0], [0,1,0,1,0,0,0,1,0,0,1,0],
                       [0,0,1,0,1,0,0,0,1,0,0,1], [1,0,0,1,0,1,0,0,0,1,0,0],
                       [0,1,0,0,1,0,1,0,0,0,1,0], [0,0,1,0,0,1,0,1,0,0,0,1],
                       [1,0,0,1,0,0,1,0,1,0,0,0], [0,1,0,0,1,0,0,1,0,1,0,0],
                       [0,0,1,0,0,1,0,0,1,0,1,0], [0,0,0,1,0,0,1,0,0,1,0,1]]
CHORD_TEMPLATE_MIN7 = [[1,0,0,1,0,0,0,1,0,0,1,0], [0,1,0,0,1,0,0,0,1,0,0,1],
                       [1,0,1,0,0,1,0,0,0,1,0,0], [0,1,0,1,0,0,1,0,0,0,1,0],
                       [0,0,1,0,1,0,0,1,0,0,0,1], [1,0,0,1,0,1,0,0,1,0,0,0],
                       [0,1,0,0,1,0,1,0,0,1,0,0], [0,0,1,0,0,1,0,1,0,0,1,0],
                       [0,0,0,1,0,0,1,0,1,0,0,1], [1,0,0,0,1,0,0,1,0,1,0,0],
                       [0,1,0,0,0,1,0,0,1,0,1,0], [0,0,1,0,0,0,1,0,0,1,0,1]]

CHORD_TEMPLATE_MAJOR_means = [np.mean(chord) for chord in CHORD_TEMPLATE_MAJOR]
CHORD_TEMPLATE_MINOR_means = [np.mean(chord) for chord in CHORD_TEMPLATE_MINOR]
CHORD_TEMPLATE_DOM7_means = [np.mean(chord) for chord in CHORD_TEMPLATE_DOM7]
CHORD_TEMPLATE_MIN7_means = [np.mean(chord) for chord in CHORD_TEMPLATE_MIN7]

CHORD_TEMPLATE_MAJOR_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_MAJOR]
CHORD_TEMPLATE_MINOR_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_MINOR]
CHORD_TEMPLATE_DOM7_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_DOM7]
CHORD_TEMPLATE_MIN7_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_MIN7]

# the 46 timbre category centroids (one 12-dimensional vector each)
TIMBRE_CLUSTERS = [
    [1.38679881e-01, 3.95702571e-02, 2.65410235e-02, 7.38301998e-03, -1.75014636e-02, -5.51147732e-02,
     8.71851698e-03, -1.17595855e-02, 1.07227900e-02, 8.75951680e-03, 5.40391877e-03, 6.17638908e-03],
    [3.14344510e+00, 1.17405599e-01, 4.08053561e+00, -1.77934450e+00, 2.93367968e+00, -1.35597928e+00,
     -1.55129489e+00, 7.75743158e-01, 6.42796685e-01, 1.40794256e-01, 3.37716831e-01, -3.27103815e-01],
    [3.56548165e-01, 2.73288705e+00, 1.94355982e+00, 1.06892477e+00, 9.89739475e-01, -8.97330631e-02,
     8.73234495e-01, -2.00747009e-03, 3.44488367e-01, 9.93117800e-02, -2.43471766e-01, -1.90521726e-01],
    [4.22442037e-01, 4.14115783e-01, 1.43926557e-01, -1.16143322e-01, -5.95186216e-02, -2.36927188e-01,
     -6.83151409e-02, 9.86816882e-02, 2.43219098e-02, 6.93558977e-02, 6.80121418e-03, 3.97485360e-02],
    [1.94727799e-01, -1.39027782e+00, -2.39875671e-01, -2.84583677e-01, 1.92334219e-01, -2.83421048e-01,
     2.15787541e-01, 1.14840341e-01, -2.15631833e-01, -4.09496877e-02, -6.90838017e-03, -7.24394810e-03],
    [1.96565167e-01, 4.98702717e-02, -3.43697282e-01, 2.54170701e-01, 1.12441266e-02, 1.54740401e-01,
     -4.70447408e-02, 8.10868802e-02, 3.03736697e-03, 1.43974944e-03, -2.75044913e-02, 1.48634678e-02],
    [2.21364497e-01, -2.96205105e-01, 1.57754028e-01, -5.57641279e-02, -9.25625566e-02, -6.15316168e-02,
     -1.38139882e-01, -5.54936599e-02, 1.66886836e-01, 6.46238260e-02, 1.24093863e-02, -2.09274345e-02],
    [2.12823455e-01, -9.32652720e-02, -4.39611467e-01, -2.02814479e-01, 4.98638770e-02, -1.26572488e-01,
     -1.11181799e-01, 3.25075635e-02, 2.01416694e-02, -5.69216463e-02, 2.61922912e-02, 8.30817468e-02],
    [1.62304042e-01, -7.34813956e-03, -2.02552550e-01, 1.80106705e-01, -5.72110826e-02, -9.17148244e-02,
     -6.20429191e-03, -6.08892354e-02, 1.02883628e-02, 3.84878478e-02, -8.72920419e-03, 2.37291230e-02],
    [1.69023095e-01, 6.81311168e-02, -3.71039856e-02, -2.13139780e-02, -4.18752028e-03, 1.36407740e-01,
     2.58515825e-02, -4.10328777e-04, 2.93149920e-02, -1.97874734e-02, 2.01177066e-02, 4.29260690e-03],
    [4.16829358e-01, -1.28384095e+00, 8.86081556e-01, 9.13717416e-02, -3.19420208e-01, -1.82003637e-01,
     -3.19865507e-02, -1.71517045e-02, 3.47472066e-02, -3.53047665e-02, 5.58354602e-02, -5.06222122e-02],
    [3.83948137e-01, 1.06020034e-01, 4.01191058e-01, 1.49470482e-01, -9.58422411e-02, -4.94473336e-02,
     2.27589858e-02, -5.67352733e-02, 3.84666644e-02, -2.15828055e-02, -1.67817151e-02, 1.15426241e-01],
    [9.07946444e-01, 3.26120397e+00, 2.98472002e+00, -1.42615404e-01, 1.29886103e+00, -4.53380431e-01,
     1.54008478e-01, -3.55297093e-02, -2.95809181e-01, 1.57037690e-01, -7.29692046e-02, 1.15180285e-01],
    [1.60870896e+00, -2.32038235e+00, -7.96211044e-01, 1.55058968e+00, -2.19377663e+00, 5.01030526e-01,
     -1.71767279e+00, -1.36642470e+00, -2.42837527e-01, -4.14275615e-01, -7.33148530e-01, -4.56676578e-01],
    [6.42870687e-01, 1.34486839e+00, 2.16026845e-01, -2.13180345e-01, 3.10866747e-01, -3.97754955e-01,
     -3.54439151e-01, -5.95938041e-04, 4.95054274e-03, 4.67013422e-02, -1.80823854e-02, 1.25808320e-01],
    [1.16780496e+00, 2.28141229e+00, -3.29418720e+00, -1.54239912e+00, 2.12372153e-01, 2.51116768e+00,
     1.84273560e+00, -4.06183916e-01, 1.19175125e+00, -9.24407446e-01, 6.85444429e-01, -6.38729005e-01],
    [2.39097414e-01, -1.13382447e-02, 3.06327342e-01, 4.68182987e-03, -1.03107607e-01, -3.17661969e-02,
     3.46533705e-02, 1.46440386e-02, 6.88291154e-02, 1.72580481e-02, -6.23970238e-03, -6.52822380e-03],
    [1.74850329e-01, -1.86077411e-01, 2.69285838e-01, 5.22452803e-02, -3.71708289e-02, -6.42874319e-02,
     -5.01920042e-03, -1.14565540e-02, -2.61300268e-03, -6.94872458e-03, 1.20157063e-02, 2.01341977e-02],
    [1.93220674e-01, 1.62738332e-01, 1.72794061e-02, 7.89933755e-02, 1.58494767e-01, 9.04541006e-04,
     -3.33177052e-02, -1.42411500e-01, -1.90471155e-02, -2.41622739e-02, -2.57382438e-02, 2.84895062e-02],
    [3.31179197e+00, -1.56765268e-01, 4.42446188e+00, 2.05496297e+00, 5.07031622e+00, -3.52663849e-02,
     -5.68337901e+00, -1.17825301e+00, 5.41756637e-01, -3.15541339e-02, -1.58404846e+00, 7.37887234e-01],
    [2.36033237e-01, -5.01380019e-01, -7.01568834e-02, -2.14474169e-01, 5.58739133e-01, -3.45340886e-01,
     2.36469930e-01, -2.51770230e-02, -4.41670143e-01, -1.73364633e-01, 9.92353986e-03, 1.01775476e-01],
    [3.13672832e+00, 1.55128891e+00, 4.60139512e+00, 9.82477544e-01, -3.87108002e-01, -1.34239667e+00,
     -3.00065797e+00, -4.41556909e-01, -7.77546208e-01, -6.59017029e-01, -1.42596356e-01, -9.78935498e-01],
    [8.50714148e-01, 2.28658856e-01, -3.65260753e+00, 2.70626948e+00, -1.90441544e-01, 5.66625676e+00,
     1.77531510e+00, 2.39978921e+00, 1.10965660e+00, 1.58484130e+00, -1.51579214e-02, 8.64324026e-01],
    [1.14302559e+00, 1.18602811e+00, -3.88130412e+00, 8.69833825e-01, -8.23003310e-01, -4.23867795e-01,
     8.56022598e-01, -1.08015106e+00, 1.74840192e-01, -1.35493558e-02, -1.17012561e+00, 1.68572940e-01],
    [3.54117814e+00, 6.12714769e-01, 7.67585243e+00, 2.50391333e+00, 1.81374399e+00, -1.46363231e+00,
     -1.74027236e+00, -5.72924078e-01, -1.20787368e+00, -4.13954661e-01, -4.62561948e-01, 6.78297871e-01],
    [8.31843044e-01, 4.41635485e-01, 7.00724425e-02, -4.72159900e-02, 3.08326493e-01, -4.47009822e-01,
     3.27806057e-01, 6.52370380e-01, 3.28490360e-01, 1.28628172e-01, -7.78065861e-02, 6.91343399e-02],
    [4.90082031e-01, -9.53180204e-01, 1.76970476e-01, 1.57256960e-01, -5.26196238e-02, -3.19264458e-01,
     3.91808304e-01, 2.19368239e-01, -2.06483291e-01, -6.25044005e-02, -1.05547224e-01, 3.18934196e-01],
    [1.49899454e+00, -4.30708817e-01, 2.43770498e+00, 7.03149621e-01, -2.28827845e+00, 2.70195855e+00,
     -4.71484280e+00, -1.18700075e+00, -1.77431396e+00, -2.23190236e+00, 8.20855264e-01, -2.35859902e-01],
    [1.20322544e-01, -3.66300816e-01, -1.25699953e-01, -1.21914056e-01, 6.93277338e-02, -1.31034684e-01,
     -1.54955924e-03, 2.48094288e-02, -3.09576314e-02, -1.66369415e-03, 1.48904987e-04, -1.42151992e-02],
    [6.52394765e-01, -6.81024464e-01, 6.36868117e-01, 3.04950208e-01, 2.62178992e-01, -3.20457080e-01,
     -1.98576098e-01, -3.02173163e-01, 2.04399765e-01, 4.44513847e-02, -9.50111498e-03, -1.14198739e-02],
    [2.06762180e-01, -2.08101829e-01, 2.61977630e-01, -1.71672300e-01, 5.61794250e-02, 2.13660185e-01,
     3.90259585e-02, 4.78176392e-02, 1.72812607e-02, 3.44052067e-02, 6.26899067e-03, 2.48544728e-02],
    [7.39717363e-01, 4.37786285e+00, 2.54995502e+00, 1.13151212e+00, -3.58509503e-01, 2.20806129e-01,
     -2.20500355e-01, -7.22409824e-02, -2.70534083e-01, 1.07942098e-03, 2.70174668e-01, 1.87279353e-01],
    [1.25593809e+00, 6.71054880e-02, 8.70352571e-01, -4.32607959e+00, 2.30652217e+00, 5.47476105e+00,
     -6.11052479e-01, 1.07955720e+00, -2.16225471e+00, -7.95770149e-01, -7.31804973e-01, 9.68935954e-01],
    [1.17233757e-01, -1.23897829e-01, -4.88625265e-01, 1.42036530e-01, -7.23286756e-02, -6.99808763e-02,
     -1.17525019e-02, 5.70221674e-02, -7.67796123e-03, 4.17505873e-02, -2.33375716e-02, 1.94121001e-02],
    [1.67511025e+00, -2.75436700e+00, 1.45345593e+00, 1.32408871e+00, -1.66172505e+00, 1.00560074e+00,
     -8.82308160e-01, -5.95708043e-01, -7.27283590e-01, -1.03975499e+00, -1.86653334e-02, 1.39449745e+00],
    [3.20587677e+00, -2.84451104e+00, 8.54849957e+00, -4.44001235e-01, 1.04202144e+00, 7.35333682e-01,
     -2.48763292e+00, 7.38931361e-01, -1.74185596e+00, -1.07581842e+00, 2.05759299e-01, -8.20483513e-01],
    [3.31279737e+00, -5.08655734e-01, 6.61530870e+00, 1.16518280e+00, 4.74499155e+00, -2.31536191e+00,
     -1.34016130e+00, -7.15381712e-01, 2.78890594e+00, 2.04189275e+00, -3.80003033e-01, 1.16034914e+00],
    [1.79522019e+00, -8.13534697e-02, 4.37167420e-01, 2.26517020e+00, 8.85377295e-01, 1.07481514e+00,
     -7.25322296e-01, -2.19309506e+00, -7.59468916e-01, -1.37191387e+00, 2.60097913e-01, 9.34596450e-01],
    [3.50400906e-01, 8.17891485e-01, -8.63487084e-01, -7.31760701e-01, 9.70320805e-02, -3.60023996e-01,
     -2.91753495e-01, -8.03073817e-02, 6.65930095e-02, 1.60093340e-01, -1.29158086e-01, -5.18806100e-02],
    [2.25922929e-01, 2.78461593e-01, 5.39661393e-02, -2.37662670e-02, -2.70343295e-02, -1.23485570e-01,
     2.31027499e-03, 5.87465112e-05, 1.86127188e-02, 2.83074747e-02, -1.87198676e-04, 1.24761782e-02],
    [4.53615634e-01, 3.18976020e+00, -8.35029351e-01, 7.84124578e+00, -4.43906795e-01, -1.78945492e+00,
     -1.14521031e+00, 1.00044304e+00, -4.04084981e-01, -4.86030348e-01, 1.05412721e-01, 5.63666445e-02],
    [3.93714086e-01, -3.07226477e-01, -4.87366619e-01, -4.57481697e-01, -2.91133171e-04, -2.39881719e-01,
     -2.15591352e-01, -1.21332941e-01, 1.42245002e-01, 5.02984582e-02, -8.05878851e-03, 1.95534173e-01],
    [1.86913010e-01, -1.61000977e-01, 5.95612425e-01, 1.87804293e-01, 2.22064227e-01, -1.09008289e-01,
     7.83845058e-02, 5.15228647e-02, -8.18113578e-02, -2.37860551e-02, 3.41013800e-03, 3.64680417e-02],
    [3.32919314e+00, -2.14341251e+00, 7.20913997e+00, 1.76143734e+00, 1.64091808e+00, -2.66887649e+00,
     -9.26748006e-01, -2.78599285e-01, -7.39434005e-01, -3.87363085e-01, 8.00557250e-01, 1.15628886e+00],
    [4.76496444e-01, -1.19334793e-01, 3.09037235e-01, -3.45545294e-01, 1.30114716e-01, 5.06895559e-01,
     2.12176840e-01, -4.14296750e-03, 4.52439064e-02, -1.62163990e-02, 6.93683152e-02, -5.77607592e-03],
    [3.00019324e-01, 5.43432074e-02, -7.72732930e-01, 1.47263806e+00, -2.79012581e-02, -2.47864869e-01,
     -2.10011388e-01, 2.78202425e-01, 6.16957205e-02, -1.66924986e-01, -1.80102286e-01, -3.78872162e-03]]

TIMBRE_MEANS = [np.mean(t) for t in TIMBRE_CLUSTERS]
TIMBRE_STDEVS = [np.std(t) for t in TIMBRE_CLUSTERS]

'''helper methods to process raw msd data'''

def normalize_pitches(h5):
    key = int(hdf5_getters.get_key(h5))
    segments_pitches = hdf5_getters.get_segments_pitches(h5)
    segments_pitches_new = [transpose_by_key(pitch_seg, key) for pitch_seg in segments_pitches]
    return segments_pitches_new

def transpose_by_key(pitch_seg, key):
    pitch_seg_new = []
    for i in range(0, 12):
        idx = (i + key) % 12
        pitch_seg_new.append(pitch_seg[idx])
    return pitch_seg_new

''' given a time segment with distributions of the 12 pitches, find the most
likely chord played '''
def find_most_likely_chord(pitch_vector):
    rho_max = 0.0
    # index each chord as (chord type, root): 1 = major, 2 = minor,
    # 3 = dominant 7th, 4 = minor 7th
    most_likely_chord = (1, 1)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_MAJOR,
            CHORD_TEMPLATE_MAJOR_means, CHORD_TEMPLATE_MAJOR_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean) * (pitch_vector[i] - np.mean(pitch_vector)) / ((stdev + 0.01) * (np.std(pitch_vector) + 0.01))
        if (abs(rho) > abs(rho_max)):
            rho_max = rho
            most_likely_chord = (1, idx)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_MINOR,
            CHORD_TEMPLATE_MINOR_means, CHORD_TEMPLATE_MINOR_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean) * (pitch_vector[i] - np.mean(pitch_vector)) / ((stdev + 0.01) * (np.std(pitch_vector) + 0.01))
        if (abs(rho) > abs(rho_max)):
            rho_max = rho
            most_likely_chord = (2, idx)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_DOM7,
            CHORD_TEMPLATE_DOM7_means, CHORD_TEMPLATE_DOM7_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean) * (pitch_vector[i] - np.mean(pitch_vector)) / ((stdev + 0.01) * (np.std(pitch_vector) + 0.01))
        if (abs(rho) > abs(rho_max)):
            rho_max = rho
            most_likely_chord = (3, idx)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_MIN7,
            CHORD_TEMPLATE_MIN7_means, CHORD_TEMPLATE_MIN7_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean) * (pitch_vector[i] - np.mean(pitch_vector)) / ((stdev + 0.01) * (np.std(pitch_vector) + 0.01))
        if (abs(rho) > abs(rho_max)):
            rho_max = rho
            most_likely_chord = (4, idx)
    return most_likely_chord

def find_most_likely_timbre_category(timbre_vector):
    most_likely_timbre_cat = 0
    rho_max = 0.0
    for idx, (seg, mean, stdev) in enumerate(zip(TIMBRE_CLUSTERS, TIMBRE_MEANS,
            TIMBRE_STDEVS)):
        rho = 0.0
        for i in range(0, 12):
            rho += (seg[i] - mean) * (timbre_vector[i] - np.mean(seg)) / ((stdev + 0.01) * (np.std(timbre_vector) + 0.01))
        if (abs(rho) > abs(rho_max)):
            rho_max = rho
            most_likely_timbre_cat = idx
    return most_likely_timbre_cat
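A quick usage check for these helpers (not part of the original listing): a pitch vector with most of its energy on C, E, and G should correlate best with the first major template, so find_most_likely_chord should return (1, 0).

if __name__ == '__main__':
    # energy concentrated on pitch classes 0 (C), 4 (E), and 7 (G)
    c_major_like = [0.9, 0.05, 0.1, 0.05, 0.8, 0.1, 0.05, 0.85, 0.1, 0.05, 0.1, 0.05]
    print find_most_likely_chord(c_major_like)  # expected: (1, 0)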


Bibliography

[1] Marck Bailey. Mark J. Butler publishes scholarly work on dance music. http://www.music.northwestern.edu/about/news/2012/mark-j-butler-publishes-scholarly-work-on-dance-music.html, March 2012.

[2] Kenneth Taylor. Ishkur's guide to EDM. http://techno.org/electronic-music-guide/.

[3] Deal further strengthens Spotify's music discovery expertise. http://the.echonest.com/pressreleases/spotify-acquires-echo-nest/, March 2014.

[4] Josh Constine. Inside the Spotify-Echo Nest skunkworks. http://techcrunch.com/2014/10/19/the-sonic-mad-scientists/, October 2014.

[5] The future of music genres is here. http://blog.echonest.com/post/73516217273/the-future-of-music-genres-is-here, January 2014.

[6] About the Music Genome Project. http://www.pandora.com/about/mgp.

[7] Joan Serrà, Álvaro Corral, Marián Boguñá, Martín Haro, and Josep Ll. Arcos. Measuring the evolution of contemporary western popular music. Scientific Reports, 2, July 2012.

[8] Matthias Mauch, Robert M. MacCallum, Mark Levy, and Armand M. Leroi. The evolution of popular music: USA 1960–2010. Royal Society Open Science, 2(5), 2015.

[9] Thierry Bertin-Mahieux, Daniel P. W. Ellis, Brian Whitman, and Paul Lamere. The Million Song Dataset. In Proceedings of the 12th International Conference on Music Information Retrieval (ISMIR 2011), 2011.

[10] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay. Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12:2825-2830, 2011.

[11] Lars Buitinck, Gilles Louppe, Mathieu Blondel, Fabian Pedregosa, Andreas Mueller, Olivier Grisel, Vlad Niculae, Peter Prettenhofer, Alexandre Gramfort, Jaques Grobler, Robert Layton, Jake VanderPlas, Arnaud Joly, Brian Holt, and Gaël Varoquaux. API design for machine learning software: experiences from the scikit-learn project. In ECML PKDD Workshop: Languages for Data Mining and Machine Learning, pages 108-122, 2013.

[12] Edwin Chen. Infinite mixture models with nonparametric Bayes and the Dirichlet process. http://blog.echen.me/2012/03/20/infinite-mixture-models-with-nonparametric-bayes-and-the-dirichlet-process/, March 2012.

[13] Graham Massey. Roland TR-808: The drum machine that changed music. http://www.bbc.com/news/entertainment-arts-26682781, March 2014.

[14] Jean-Michel Jarre and Agencja Artystyczna MTJ. Jean Michel Jarre. Disques Dreyfus, 1999.

[15] Francois Pachet, Jean-Julien Aucouturier, and Mark Sandler. The way it sounds: Timbre models for analysis and retrieval of music signals. IEEE Transactions on Multimedia, 7(6):1028-1035, December 2005.


Page 55: Silver,Matthew final thesis

33 Analysis

Each of the different values of α revealed different insights about the Dirichlet Process

applied to the Million Song Dataset mainly which artists and songs were unique

and how traditional groupings of EM genres could be thought of differently Not

surprisingly the distributions of the years of songs in most of the clusters were skewed

to the left because the distribution of all of the EM songs in the Million Song Dataset

is left-skewed (see Figure 22) However some of the distributions vary significantly

for individual clusters and these differences provide important insights into which

types of music styles were popular at certain points in time and how unique the

earliest artists and songs in the clusters were For example for α = 01 Cluster 28rsquos

musical style (with sounds characteristic of the Roland TR-808 and TR-909 a two

programmable drum machines and synthesizers that became extremely popular in

1980s and 90s dance tracks) [13] coincides with the when the instruments were first

manufactured in 1980 Not surprisingly this cluster contained mostly songs from

the 80s and 90s and declined slightly in the 2000s However there were a few songs

in that cluster that came out before 1980 While these songs did not clearly use the

Roland TR machines they may have contained similar sounds that predated the

machines and were truly novel

First looking at α = 005 we see that all of the clusters contain a significant

number of songs although cluster 3 and 8 are notably smaller Cluster 3 contains

a heavier left tail indicating a larger number of songs from the 70s 80s and 90s

Inside the cluster the genres of music varied significantly from a traditional music

lens That is the cluster contained some songs with nearly all traditional rock

instruments others with purely synths and others somewhere in between all which

would normally be classified as different EM genres However under the Dirichlet

Process these songs were lumped together with the common themes of dense melodic

46

melodies (as opposed to minimalistic repetitive or dissonant sounds) The most

prominent artists from the earlier songs are Ashra and John Hassell who composed

several melodic songs combining traditional instruments with synthesizers for a

modern feel The other small cluster number 8 contains a more normal year

distribution relative to the entire MSD distribution and also consists of denser beats

Another artist Cabaret Voltaire leads this cluster Cluster 9 sticks out significantly

because it contains virtually no songs before 1990 but increases rapidly in popularity

This cluster contains songs with hypnotically repetitive rhythm strong and ethereal

synths and an equally strong drum-like beat Given the emergence of trance in

the 1990s and the fact that house music in the 1980s contained more minimalistic

synths than house music in the 1990s this distribution of years makes sense Looking

at the earliest artists in this cluster one that accurately predates the later music

in the cluster is Jean-Michel Jarre A French composer pioneering in ambient and

electronic music [14] one of his songs Les Chants Magneacutetiques IV contains very

sharp and modulated synths along with a repetitive hi-hat cymbal rhythm and more

drawn-out and ethereal synths While the song sounds ambient at its normal speed

playing the song 15 times the normal speed resulted in a thumping fast-paced 16th

note rhythm that combined with the ethereal synths that contain certain chord

progressions sounded very similar to trance music In fact I found that stylistically

trance music was comparable to house and ambient music increased in speed Trance

music was a term not used extensively until the early 1990s but ambient and house

music were already mainstream by the 1980s so it would make sense that trance

evolved in this manner However this insight could serve as an argument that trance

is not an innovative genre in and of itself but is rather a clever combination of two

older genres Lastly we look at the timbre category and chord change distributions

for each cluster In theory these clusters should have significantly different peaks

of chord changes and timbre categories reflecting different pitch arrangements and

47

instruments in each cluster The type 0 chord change corresponds to major rarr

major with no note change type 60 minor rarr minor with no note change type

120 dominant 7th major rarr dominant 7th major with no note change and type 180

dominant 7th minor rarr dominant 7th minor with no note change It makes sense that

type 0 60 120 and 180 chord changes are frequently observed because it implies

that chords in the song occurring next to each other are remaining in the same key

for the majority of the song The timbre categories on the other hand are more

difficult to intuitively interpret Mauchrsquos study addresses this issue by sampling

songs and sounds that are the closest to each timbre category and then playing the

sounds and attaching user-based interpretations based on several listeners [8] While

this strategy worked in Mauchrsquos study given the time and resources at my disposal

this strategy was not practical in my study I ended up comparing my subjective

summaries of each cluster and comparing the charts to see whether certain peaks

in the timbre categories corresponded to specific tones Strangely for α = 005 the

timbre and chord change data is very similar for each cluster This problem does not

occur for when α = 01 or 02 where the graphs vary significantly and correspond

to some of the observed differences in the music In summary below are the most

influential artists I found in the clusters formed and the types of music they created

that were novel for their time

bull Jean-Michel Jarre ambient and house music complicated synthesizer arrange-

ments

bull Cabaret Voltaire orchestral electronic music

bull Paul Horn new age

bull Brian Eno ambient music

bull Manuel Goumlttsching (Ashra) synth-heavy ambient music

48

bull Killing Joke industrial metal

bull John Foxx minimalist and dark electronic music

bull Fad Gadget house and industrial music

While these conclusions were formed mainly from the data used in the MSD and the

resulting clusters I checked outside sources and biographies of these artists to see

whether they were groundbreaking contributors to electronic music Some research

revealed that these artists were indeed groundbreaking for their time so my findings

are consistent with existing literature The difference however between existing

accounts and mine is that from a quantitatively computed perspective I found some

new connections (like observing that one of Jarrersquos works when sped up sounded

very similar to trance music)

For larger values of α it is not only worth looking at interesting phenomena in

the clusters formed for that specific value but also comparing the clusters formed

to other values of α Since we are increasing the value of α more clusters will

be formed and the distinctions between each cluster will be more nuanced With

α = 01 the Dirichlet Process formed 16 clusters 2 of these clusters consisted of

only one song each and upon listening neither of these songs sounded particularly

unique so I threw those two clusters out and analyzed the remaining 14 Comparing

these clusters to the ones formed with α = 005 I found that some of the clusters

mapped over nicely while others were more difficult to interpret For example cluster

301 (cluster 3 when α = 01) contained a similar number of songs and a similar

distribution of the years the songs were released to cluster 9005 Both contain vitually

no songs before the 1990s and then steadily rise in popularity through the 2000s

Both clusters also contain similar types of music house beats and ethreal synths

reminiscent of ambient or trance music However when I looked at the earliest

49

artists in cluster 301 they were different from the earliest artists in cluster 9005

One particular artist Bill Nelson stood out for having a particularly novel song

ldquoBirds of Tinrdquo for the year it was released (1980) This song features a sharp and

twangy synth beat that when sped up sounded like minimalist acid house music

While the α = 005 group differentiated mostly on general moods and classes of

instruments (like rock vs non-electronic vs electronic) the α = 01 group picked

up more nuanced instrumentation and mood differences For example cluster 1601

contained songs that featured orchestral string instruments especially violin The

songs themselves varied significantly according to traditional genres from Brian Eno

arrangements with classical orchestra to a remix of a song by Linkin Park a nu-metal

band which contained violin interludes This clustering raises an interesting point

that music that sounds very different based on traditional genres could be grouped

together on certain instruments or sounds Another cluster 2801 features 90s sounds

characteristic of the Roland TR-909 drum machine (which would explain why the

clusterrsquos songs increase dratistically starting in the 1990s and steadily decline through

the 2000s) Yet another cluster 601 contains a particularly heavy left tail indicating

a style more popular in the 1980s and the characteristic sound high-hat cymbals

is also a specialized instrument This specialization does not match up particularly

strongly with the clusters when α = 005 That is a single cluster with α = 005

does not easily map to one or more clusters in the α = 01 run although many of

the clusters appear to share characteristics based on the qualitative descriptions in

the tables The timbrechord change charts for each cluster appeared to at least

somewhat corroborate the general characteristics I attached to each cluster For

example the last timbre category is significantly pronounced for clusters 5 and 18

and especially so for 18 Cluster 18 was vocal-free ethereal space-synth sounds

so it would make sense that cluster 5 which was mainly calm New World also

contained vocal-free ethereal and space-y sounds It was also interesting to note

50

that certain clusters like 28 contained one timbre category that completely dominated

all the others Many songs in this cluster were marked by strong and repetitive

beats reminiscent of the Roland TR synth drum machine which matches the graph

Likewise clusters 3 7 9 and 20 which appear to contain the same peak timbre

category were noted for containing strong and repetitive beats For this cluster I

added the following artists and their contributions to the general list of novel artists

bull Bill Nelson Minimalist house music

bull Vangelis Orchestral compositions with electronic notes

bull Rick Wakeman Rock compositions with spacy-sounding synths

bull Kraftwerk synth-based pop music

Finally we look at α = 02 With this parameter value the Dirichlet Process resulted

in 22 clusters formed 3 of these clusters contained only one song each and upon

listening to each of these songs I determined they were not particularly unique and

discarded them for a total of 19 remaining clusters Unlike the previous two values of

α where the clusters were relatively easy to subjectively differentiate this one was

quite difficult Slightly more than half of the clusters 10 out of 19 contained under

1000 songs and 3 contained under 100 Some of the clusters were easily mapped

to clusters in the other two α values like cluster 1702 which contains Roland TR

drum machine sounds and is comparable to cluster 2801 However many of the other

classifications seemed more dubious Not only did the songs within each cluster seem

to often vary significantly but the differences between many clusters appeared nearly

indistinguishable The chord change and timbre charts also support the difficulty

in distinguishing different clusters The y-axes for all of the years are quite small

implying that many of the timbre values averaged out because the songs were quite

different in each cluster Essentially the observations are quite noisy and do not have

51

features that stand out as saliently as cluster 2801 for example The only exceptions

to these numbers were clusters 30 and 34 but there were so few songs in each of

these clusters that they represent only a small amount of the dataset Therefore I

concluded that the Dirichlet Process with α = 02 performed an insufficient job of

adequately clustering the songs Overall the clusters formed when α = 01 were

the most meaningful in terms of picking up nuanced moods and instruments without

splitting hairs and resulting in clusters a minimally trained ear could not differentiate

From this analysis the most appropriate genre classifications of the electronic music

from the MSD are the clusters described in the table where α = 01 and the most

novel artists along with their contributions are summarized in the finidings where

α = 005 and α = 01

52

Chapter 4

Conclusion

In this chapter I first address weakness in my experiment and strategies to address

those weaknesses then I offer potential paths for researchers to build upon my ex-

periment and offer closing words regarding this thesis

4.1 Design Flaws in Experiment

While I made every effort to ensure the integrity of this experiment, there were various limiting factors, some beyond my control and others within my control but unrealistic to address given the time and resources I had. The largest issue was the dataset I was working with. While the MSD contained roughly 23,000 electronic music songs according to my classifications, these songs did not come close to all of the electronic music that was available. From looking through the tracks, I did see many important artists, meaning that there was some credibility to the dataset. However, there were several other artists I was surprised to see missing, and the artists included contained only a limited number of popular songs. Some traditionally defined genres, like dubstep, were missing entirely from the dataset, and the most recent songs came from the year 2010, which meant that the past 5 years of rapid expansion in EM were not accounted for. Building a sufficient corpus of EM data is very difficult, arguably more so than for other genres, because songs may be remixed by multiple artists, further blurring the line between original content and modifications. For this reason, I considered my thesis to be a proof of concept. Although the data I used may not be ideal, I was able to show that the Dirichlet Process could be used with some amount of success to cluster songs based on their metadata.

With respect to how I implemented the Dirichlet Process and constructed the features, my methodology could have been more extensive with additional time and resources. Interpreting the sounds in each song and establishing common threads is a difficult task, and unlike Pandora, which used trained music theory experts to analyze each song, I relied on my own ears and anecdotal knowledge of EM. Given the lack of formal literature quantitatively analyzing EM and the resources I had, this was my best realistic option, but it was also not ideal. The second notable weakness, which was more controllable, was determining what exactly constitutes an EM song. My criteria involved iterating through every song and selecting those whose artist contained a tag that fell inside a list of predetermined EM genres. However, this strategy is not always effective, since some artists have only a small selection of EM songs and have produced much more music involving rock or other non-EM genres. To prevent these songs from appearing in the dataset, I would need to load another dataset, from a group called Last.fm, which contains user-generated tags at the song level. Another, more addressable weakness in my experiment was graphically analyzing the timbre categories. While the average chord changes were easy to interpret on the graphs for each cluster and had easy semantic interpretations, the timbre categories were never formally defined. That is, while I knew the Bayesian Information Criterion was lowest when there were 46 categories, I did not associate each timbre category with a sound. Mauch's study addressed this issue by randomly selecting songs with sounds that fell in each timbre category and asking users to listen to the sounds and classify what they heard. Implementing this system would be an additional way of ensuring that the clusters formed for each song were nontrivial: I could not only eyeball the measurements on each graph for timbre, like I did in this thesis, but also use them to confirm the sounds I observed for each cluster. Finally, while my feature selection involved careful preprocessing, based on other studies, that normalized measurements between all songs, there are additional ways I could have improved the feature set. For example, one study looks at more advanced ways to isolate specific timbre segments in a song, identify repeating patterns, and compare songs to each other in terms of the similarity of their timbres [15]. More advanced methods like these would allow me to analyze more quantitatively how successful the Dirichlet Process is at clustering songs into distinct categories.
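To make the model-selection step just mentioned concrete, here is a minimal sketch of choosing the number of timbre categories by BIC, assuming scikit-learn's GaussianMixture and using random data as a stand-in for the duration-normalized timbre frames sampled in Appendix A.3:

import numpy as np
from sklearn.mixture import GaussianMixture

# stand-in for the 12-dimensional timbre frames from Appendix A.3
timbre_frames = np.random.RandomState(0).randn(2000, 12)

bics = {}
for n in range(10, 61, 5):  # candidate numbers of timbre categories
    gmm = GaussianMixture(n_components=n, covariance_type='full',
                          random_state=0).fit(timbre_frames)
    # lower BIC = better likelihood/complexity trade-off
    bics[n] = gmm.bic(timbre_frames)

best_n = min(bics, key=bics.get)
print('BIC selects {0} timbre categories'.format(best_n))

On the real timbre data, the analogous sweep reached its minimum at 46 categories; attaching a listener-verified sound to each of those categories, as in Mauch's study, remains the missing piece.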

4.2 Future Work

Future work in this area, quantitatively analyzing EM metadata to determine what constitutes different genres and novel artists, would involve tighter definitions and procedures, evaluations of whether clustering was effective, and closer musical scrutiny. All of the weaknesses mentioned in the previous section, barring perhaps the songs available in the Million Song Dataset, can be addressed with extensions and modifications to the code base I created. The greater issue of building an effective corpus of music data for the MSD and constantly updating it might be addressed by soliciting such data from an organization like Spotify, but such an endeavor is very ambitious and beyond the scope of any individual or small-group research project without extensive funding and influence. Once these problems with the dataset, the songs accessed from it, and the methods for comparing songs to each other are resolved, the next steps would be to further analyze the results. How do the most unique artists for their time compare to the most popular artists? Is there considerable overlap? How long does it take for a style to grow in popularity, if it even does? And lastly, how can these findings be used to compose new genres of music and envision who and what will become popular in the future? All of these questions may require supplementary information sources, with respect to the popularity of songs and artists for example, and many of these additional pieces of information can be found on the website of the MSD.

4.3 Closing Remarks

While this thesis is an ambitious endeavor and can be improved in many respects, it does show that the methods implemented yield nontrivial results and could serve as a foundation for future quantitative analysis of electronic music. As data analytics grows even more, and groups such as Spotify amass greater amounts of information and deeper insights on that information, this relatively new field of study will hopefully grow. EM is a dynamic, energizing, and incredibly expressive type of music, and understanding it from a quantitative perspective pays respect to what has up until now been analyzed mostly from a curious outsider's perspective: qualitatively described, but not examined as thoroughly from a mathematical angle.


Appendix A

Code

A.1 Pulling Data from the Million Song Dataset

from __future__ import division
import os
import re
import sys
import time
import glob
import numpy as np
import hdf5_getters  # not on adroit

# prevents output from showing ellipses when printed
np.set_printoptions(threshold=np.nan)

'''This code computes the frequency of chord changes in each electronic song
and runs the dirichlet process on it.'''

basedir = '/scratch/network/mssilver/mssilver/msd_data_full/data/' + str(sys.argv[1])
ext = '.h5'

target_genres = ['house', 'techno', 'drum and bass', 'drum n bass', "drum'n'bass",
                 'drumnbass', "drum 'n' bass", 'jungle', 'breakbeat', 'trance',
                 'dubstep', 'trap', 'downtempo', 'industrial', 'synthpop', 'idm',
                 'idm - intelligent dance music', '8-bit', 'ambient',
                 'dance and electronica', 'electronic']

# relevant metadata for all EM songs found in the MSD
all_song_data = {}
pitch_segs_data = []
count = 0
start_time = time.time()

for root, dirs, files in os.walk(basedir):
    files = glob.glob(os.path.join(root, '*' + ext))
    for f in files:
        h5 = hdf5_getters.open_h5_file_read(f)
        # if year unknown, throw out sample
        if hdf5_getters.get_year(h5) == 0:
            h5.close()
            continue
        if any(tag in str(hdf5_getters.get_artist_mbtags(h5)) for tag in target_genres):
            print 'found electronic music song at {0} seconds'.format(time.time() - start_time)
            count += 1
            print('song count: {0}'.format(count + 1))
            h5_subdict = dict()
            h5_subdict['title'] = hdf5_getters.get_title(h5).item()
            h5_subdict['artist_name'] = hdf5_getters.get_artist_name(h5).item()
            h5_subdict['year'] = hdf5_getters.get_year(h5).item()
            h5_subdict['duration'] = hdf5_getters.get_duration(h5).item()
            h5_subdict['timbre'] = hdf5_getters.get_segments_timbre(h5).tolist()
            h5_subdict['pitches'] = hdf5_getters.get_segments_pitches(h5).tolist()
            track_id = hdf5_getters.get_track_id(h5).item()
            all_song_data[track_id] = h5_subdict
            print('Song {0} finished processing. Total time elapsed: {1} seconds'.format(count, str(time.time() - start_time)))
        h5.close()

# sort songs chronologically and write one text file per shard of the dataset
all_song_data_sorted = dict(sorted(all_song_data.items(), key=lambda k: k[1]['year']))
# e.g. turns a shard argument like 'A/B' into 'AB' for the output filename
sortedpitchdata = '/scratch/network/mssilver/mssilver/msd_data/raw_' + re.sub('/', '', sys.argv[1]) + '.txt'
with open(sortedpitchdata, 'w') as text_file:
    text_file.write(str(all_song_data_sorted))

A.2 Calculating Most Likely Chords and Timbre Categories

from __future__ import division
import os
import sys
import re
import time
import json
import glob
import math
import ast
import collections
import numpy as np
import hdf5_getters  # not on adroit
import sklearn.mixture
import msd_utils  # not on adroit

# prevents output from showing ellipses when printed
np.set_printoptions(threshold=np.nan)

# column-wise mean of a list of lists
def mean(a):
    return sum(a) / len(a)

'''This code computes the frequency of chord changes in each electronic song
and runs the dirichlet process on it.'''

basedir = '/scratch/network/mssilver/mssilver/'
input_file = basedir + 'msd_data/raw_' + str(sys.argv[1]) + '.txt'
output_file = basedir + 'msd_data/preprocessed_' + str(sys.argv[1]) + '.txt'

json_contents = open(input_file, 'r').read()

all_song_data = []
time_start = time.time()
count = 0
for json_object_str in re.finditer(r"{'title'.*?}", json_contents):
    json_object_str = str(json_object_str.group(0))
    json_object = ast.literal_eval(json_object_str)
    json_object_new = {}

    json_object_new['title'] = json_object['title']
    json_object_new['artist_name'] = json_object['artist_name']
    json_object_new['year'] = json_object['year']
    json_object_new['duration'] = json_object['duration']

    segments_pitches_old = json_object['pitches']
    segments_timbre_old = json_object['timbre']
    segments_pitches_old_smoothed = []
    segments_timbre_old_smoothed = []
    chord_changes = [0 for i in range(0, 192)]
    smoothing_factor = 5
    for i in range(0, int(math.floor(len(segments_pitches_old) / smoothing_factor))):
        segments = segments_pitches_old[(smoothing_factor * i):(smoothing_factor * i + smoothing_factor)]
        # calculate mean frequency of each note over a block of 5 time segments
        segments_mean = map(mean, zip(*segments))
        segments_pitches_old_smoothed.append(segments_mean)
    most_likely_chords = [msd_utils.find_most_likely_chord(seg) for seg in segments_pitches_old_smoothed]
    print 'found most likely chords at {0} seconds'.format(time.time() - time_start)
    # calculate chord changes
    for i in range(0, len(most_likely_chords) - 1):
        c1 = most_likely_chords[i]
        c2 = most_likely_chords[i + 1]
        if c1[1] == c2[1]:
            note_shift = 0
        elif c1[1] < c2[1]:
            note_shift = c2[1] - c1[1]
        else:
            note_shift = 12 - c1[1] + c2[1]
        key_shift = 4 * (c1[0] - 1) + c2[0]
        # convert note_shift (0 through 11) and key_shift (1 to 16)
        # to one of 192 categories for a chord shift
        chord_shift = 12 * (key_shift - 1) + note_shift
        chord_changes[chord_shift] += 1
    # normalize the counts by song duration
    json_object_new['chord_changes'] = [c / json_object['duration'] for c in chord_changes]
    print 'calculated chord changes at {0} seconds'.format(time.time() - time_start)

    for i in range(0, int(math.floor(len(segments_timbre_old) / smoothing_factor))):
        segments = segments_timbre_old[(smoothing_factor * i):(smoothing_factor * i + smoothing_factor)]
        # calculate the mean of each timbre dimension over a block of 5 time segments
        segments_mean = map(mean, zip(*segments))
        segments_timbre_old_smoothed.append(segments_mean)
    print 'found most likely timbre categories at {0} seconds'.format(time.time() - time_start)
    timbre_cats = [msd_utils.find_most_likely_timbre_category(seg) for seg in segments_timbre_old_smoothed]
    # note: counts only categories 0-29, although A.4 defines 46 TIMBRE_CLUSTERS
    timbre_cat_counts = [timbre_cats.count(i) for i in xrange(0, 30)]
    json_object_new['timbre_cat_counts'] = [t / json_object['duration'] for t in timbre_cat_counts]
    all_song_data.append(json_object_new)
    count += 1

print 'preprocessing finished, writing results to file at time {0}'.format(time.time() - time_start)
with open(output_file, 'w') as f:
    f.write(str(all_song_data))

print 'file merging complete at time {0}'.format(time.time() - time_start)
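As a worked example of the encoding above (using the (key type, root note) pairs returned by find_most_likely_chord in A.4, where key type 1 is major and roots are numbered 0 through 11 starting from C), a change from C major to G major falls into bin 7 of the 192-bin histogram:

# worked example of the chord-shift encoding (illustrative only)
c1 = (1, 0)  # C major: key type 1 (major), root 0 (C)
c2 = (1, 7)  # G major: key type 1 (major), root 7 (G)

# root moves up 7 semitones (C -> G)
note_shift = c2[1] - c1[1] if c1[1] <= c2[1] else 12 - c1[1] + c2[1]
# major -> major is key_shift 1 of the 16 possible key-type transitions
key_shift = 4 * (c1[0] - 1) + c2[0]
chord_shift = 12 * (key_shift - 1) + note_shift
print(chord_shift)  # 7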

A.3 Code to Compute Timbre Categories

from __future__ import division
import os
import sys
import re
import time
import json
import glob
import math
import ast
import operator
import random
import collections
from collections import defaultdict
from string import ascii_uppercase
import numpy as np
import matplotlib.pyplot as plt
import hdf5_getters  # not on adroit
import sklearn.mixture
import msd_utils  # not on adroit

timbre_all = []
N = 20  # number of samples to get from each year
# number of EM songs in the dataset per release year
year_counts = {1956: 2, 1965: 4, 1968: 3, 1969: 5, 1970: 23, 1971: 25,
               1972: 26, 1973: 37, 1974: 35, 1975: 29, 1976: 28, 1977: 64,
               1978: 77, 1979: 111, 1980: 131, 1981: 171, 1982: 199, 1983: 272,
               1984: 190, 1985: 189, 1986: 200, 1987: 224, 1988: 205, 1989: 272,
               1990: 358, 1991: 348, 1992: 538, 1993: 610, 1994: 658, 1995: 764,
               1996: 809, 1997: 930, 1998: 872, 1999: 983, 2000: 1031, 2001: 1230,
               2002: 1323, 2003: 1563, 2004: 1508, 2005: 1995, 2006: 1892,
               2007: 2175, 2008: 1950, 2009: 1782, 2010: 742}

time_start = time.time()
year_count = defaultdict(int)
orig_dir = '/scratch/network/mssilver/mssilver/'
orig_dir = ''  # overridden to run from the current directory
json_pattern = re.compile(r"{'title'.*?}", re.DOTALL)
N = 20  # number of songs to sample from each year
k = 20  # number of frames to select from each song
for l1 in ascii_uppercase:
    for l2 in ascii_uppercase:
        edm_textfile = orig_dir + 'msd_data/raw_' + l1 + l2 + '.txt'
        json_contents = open(edm_textfile, 'r').read()
        for json_object_str in re.findall(json_pattern, json_contents):
            json_object = ast.literal_eval(json_object_str)
            year = int(json_object['year'])
            # keep each song with probability ~N / (songs in its year)
            prob = 1.0 if 1.0 * N / year_counts[year] > 1.0 else 1.0 * N / year_counts[year]
            if random.random() < prob:
                print 'getting timbre frames for song in directory {0}, {1} seconds after start of program'.format(edm_textfile, time.time() - time_start)
                duration = float(json_object['duration'])
                timbre = [[t / duration for t in l] for l in json_object['timbre']]
                try:
                    indices = random.sample(xrange(0, len(timbre)), k)
                except:
                    # song has fewer than k frames; keep them all
                    indices = xrange(0, len(timbre))
                timbre_frames = [timbre[i] for i in indices]
                appended_timbre = [timbre_all.append(l) for l in timbre_frames]
        print 'finished file {0}, {1} seconds after start of program'.format(edm_textfile, time.time() - time_start)

with open('timbre_frames_all.txt', 'w') as f:
    f.write(str(timbre_all))
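The appendix does not reproduce the intermediate step that turns the sampled frames written by this script into the TIMBRE_CLUSTERS constants hard-coded in A.4. A minimal sketch of that step, assuming scikit-learn's GaussianMixture (the thesis-era code would have gone through sklearn.mixture as imported above), is:

import ast
import numpy as np
from sklearn.mixture import GaussianMixture

# load the duration-normalized timbre frames written by the script above
with open('timbre_frames_all.txt') as f:
    timbre_frames = np.array(ast.literal_eval(f.read()))

# fit a mixture with the BIC-selected number of categories (46 here)
gmm = GaussianMixture(n_components=46, covariance_type='full',
                      random_state=0).fit(timbre_frames)

# each row of means_ is one 12-dimensional timbre-category centroid,
# the kind of vector hard-coded as TIMBRE_CLUSTERS in A.4
print(gmm.means_.shape)  # (46, 12)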

A.4 Helper Methods for Calculations

import os
import re
import json
import glob
import time
import hdf5_getters
import numpy as np

''' some static data used in conjunction with the helper methods '''

# each 12-element vector corresponds to the 12 pitches, starting with C
# natural and going up to B natural; the rows of each template are the 12
# transpositions of one chord type

CHORD_TEMPLATE_MAJOR = [[1,0,0,0,1,0,0,1,0,0,0,0], [0,1,0,0,0,1,0,0,1,0,0,0],
                        [0,0,1,0,0,0,1,0,0,1,0,0], [0,0,0,1,0,0,0,1,0,0,1,0],
                        [0,0,0,0,1,0,0,0,1,0,0,1], [1,0,0,0,0,1,0,0,0,1,0,0],
                        [0,1,0,0,0,0,1,0,0,0,1,0], [0,0,1,0,0,0,0,1,0,0,0,1],
                        [1,0,0,1,0,0,0,0,1,0,0,0], [0,1,0,0,1,0,0,0,0,1,0,0],
                        [0,0,1,0,0,1,0,0,0,0,1,0], [0,0,0,1,0,0,1,0,0,0,0,1]]
CHORD_TEMPLATE_MINOR = [[1,0,0,1,0,0,0,1,0,0,0,0], [0,1,0,0,1,0,0,0,1,0,0,0],
                        [0,0,1,0,0,1,0,0,0,1,0,0], [0,0,0,1,0,0,1,0,0,0,1,0],
                        [0,0,0,0,1,0,0,1,0,0,0,1], [1,0,0,0,0,1,0,0,1,0,0,0],
                        [0,1,0,0,0,0,1,0,0,1,0,0], [0,0,1,0,0,0,0,1,0,0,1,0],
                        [0,0,0,1,0,0,0,0,1,0,0,1], [1,0,0,0,1,0,0,0,0,1,0,0],
                        [0,1,0,0,0,1,0,0,0,0,1,0], [0,0,1,0,0,0,1,0,0,0,0,1]]
CHORD_TEMPLATE_DOM7 = [[1,0,0,0,1,0,0,1,0,0,1,0], [0,1,0,0,0,1,0,0,1,0,0,1],
                       [1,0,1,0,0,0,1,0,0,1,0,0], [0,1,0,1,0,0,0,1,0,0,1,0],
                       [0,0,1,0,1,0,0,0,1,0,0,1], [1,0,0,1,0,1,0,0,0,1,0,0],
                       [0,1,0,0,1,0,1,0,0,0,1,0], [0,0,1,0,0,1,0,1,0,0,0,1],
                       [1,0,0,1,0,0,1,0,1,0,0,0], [0,1,0,0,1,0,0,1,0,1,0,0],
                       [0,0,1,0,0,1,0,0,1,0,1,0], [0,0,0,1,0,0,1,0,0,1,0,1]]
CHORD_TEMPLATE_MIN7 = [[1,0,0,1,0,0,0,1,0,0,1,0], [0,1,0,0,1,0,0,0,1,0,0,1],
                       [1,0,1,0,0,1,0,0,0,1,0,0], [0,1,0,1,0,0,1,0,0,0,1,0],
                       [0,0,1,0,1,0,0,1,0,0,0,1], [1,0,0,1,0,1,0,0,1,0,0,0],
                       [0,1,0,0,1,0,1,0,0,1,0,0], [0,0,1,0,0,1,0,1,0,0,1,0],
                       [0,0,0,1,0,0,1,0,1,0,0,1], [1,0,0,0,1,0,0,1,0,1,0,0],
                       [0,1,0,0,0,1,0,0,1,0,1,0], [0,0,1,0,0,0,1,0,0,1,0,1]]

CHORD_TEMPLATE_MAJOR_means = [np.mean(chord) for chord in CHORD_TEMPLATE_MAJOR]
CHORD_TEMPLATE_MINOR_means = [np.mean(chord) for chord in CHORD_TEMPLATE_MINOR]
CHORD_TEMPLATE_DOM7_means = [np.mean(chord) for chord in CHORD_TEMPLATE_DOM7]
CHORD_TEMPLATE_MIN7_means = [np.mean(chord) for chord in CHORD_TEMPLATE_MIN7]

CHORD_TEMPLATE_MAJOR_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_MAJOR]
CHORD_TEMPLATE_MINOR_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_MINOR]
CHORD_TEMPLATE_DOM7_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_DOM7]
CHORD_TEMPLATE_MIN7_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_MIN7]

# 46 rows: one 12-dimensional centroid per timbre category
TIMBRE_CLUSTERS = [
    [1.38679881e-01, 3.95702571e-02, 2.65410235e-02, 7.38301998e-03, -1.75014636e-02, -5.51147732e-02,
     8.71851698e-03, -1.17595855e-02, 1.07227900e-02, 8.75951680e-03, 5.40391877e-03, 6.17638908e-03],
    [3.14344510e+00, 1.17405599e-01, 4.08053561e+00, -1.77934450e+00, 2.93367968e+00, -1.35597928e+00,
     -1.55129489e+00, 7.75743158e-01, 6.42796685e-01, 1.40794256e-01, 3.37716831e-01, -3.27103815e-01],
    [3.56548165e-01, 2.73288705e+00, 1.94355982e+00, 1.06892477e+00, 9.89739475e-01, -8.97330631e-02,
     8.73234495e-01, -2.00747009e-03, 3.44488367e-01, 9.93117800e-02, -2.43471766e-01, -1.90521726e-01],
    [4.22442037e-01, 4.14115783e-01, 1.43926557e-01, -1.16143322e-01, -5.95186216e-02, -2.36927188e-01,
     -6.83151409e-02, 9.86816882e-02, 2.43219098e-02, 6.93558977e-02, 6.80121418e-03, 3.97485360e-02],
    [1.94727799e-01, -1.39027782e+00, -2.39875671e-01, -2.84583677e-01, 1.92334219e-01, -2.83421048e-01,
     2.15787541e-01, 1.14840341e-01, -2.15631833e-01, -4.09496877e-02, -6.90838017e-03, -7.24394810e-03],
    [1.96565167e-01, 4.98702717e-02, -3.43697282e-01, 2.54170701e-01, 1.12441266e-02, 1.54740401e-01,
     -4.70447408e-02, 8.10868802e-02, 3.03736697e-03, 1.43974944e-03, -2.75044913e-02, 1.48634678e-02],
    [2.21364497e-01, -2.96205105e-01, 1.57754028e-01, -5.57641279e-02, -9.25625566e-02, -6.15316168e-02,
     -1.38139882e-01, -5.54936599e-02, 1.66886836e-01, 6.46238260e-02, 1.24093863e-02, -2.09274345e-02],
    [2.12823455e-01, -9.32652720e-02, -4.39611467e-01, -2.02814479e-01, 4.98638770e-02, -1.26572488e-01,
     -1.11181799e-01, 3.25075635e-02, 2.01416694e-02, -5.69216463e-02, 2.61922912e-02, 8.30817468e-02],
    [1.62304042e-01, -7.34813956e-03, -2.02552550e-01, 1.80106705e-01, -5.72110826e-02, -9.17148244e-02,
     -6.20429191e-03, -6.08892354e-02, 1.02883628e-02, 3.84878478e-02, -8.72920419e-03, 2.37291230e-02],
    [1.69023095e-01, 6.81311168e-02, -3.71039856e-02, -2.13139780e-02, -4.18752028e-03, 1.36407740e-01,
     2.58515825e-02, -4.10328777e-04, 2.93149920e-02, -1.97874734e-02, 2.01177066e-02, 4.29260690e-03],
    [4.16829358e-01, -1.28384095e+00, 8.86081556e-01, 9.13717416e-02, -3.19420208e-01, -1.82003637e-01,
     -3.19865507e-02, -1.71517045e-02, 3.47472066e-02, -3.53047665e-02, 5.58354602e-02, -5.06222122e-02],
    [3.83948137e-01, 1.06020034e-01, 4.01191058e-01, 1.49470482e-01, -9.58422411e-02, -4.94473336e-02,
     2.27589858e-02, -5.67352733e-02, 3.84666644e-02, -2.15828055e-02, -1.67817151e-02, 1.15426241e-01],
    [9.07946444e-01, 3.26120397e+00, 2.98472002e+00, -1.42615404e-01, 1.29886103e+00, -4.53380431e-01,
     1.54008478e-01, -3.55297093e-02, -2.95809181e-01, 1.57037690e-01, -7.29692046e-02, 1.15180285e-01],
    [1.60870896e+00, -2.32038235e+00, -7.96211044e-01, 1.55058968e+00, -2.19377663e+00, 5.01030526e-01,
     -1.71767279e+00, -1.36642470e+00, -2.42837527e-01, -4.14275615e-01, -7.33148530e-01, -4.56676578e-01],
    [6.42870687e-01, 1.34486839e+00, 2.16026845e-01, -2.13180345e-01, 3.10866747e-01, -3.97754955e-01,
     -3.54439151e-01, -5.95938041e-04, 4.95054274e-03, 4.67013422e-02, -1.80823854e-02, 1.25808320e-01],
    [1.16780496e+00, 2.28141229e+00, -3.29418720e+00, -1.54239912e+00, 2.12372153e-01, 2.51116768e+00,
     1.84273560e+00, -4.06183916e-01, 1.19175125e+00, -9.24407446e-01, 6.85444429e-01, -6.38729005e-01],
    [2.39097414e-01, -1.13382447e-02, 3.06327342e-01, 4.68182987e-03, -1.03107607e-01, -3.17661969e-02,
     3.46533705e-02, 1.46440386e-02, 6.88291154e-02, 1.72580481e-02, -6.23970238e-03, -6.52822380e-03],
    [1.74850329e-01, -1.86077411e-01, 2.69285838e-01, 5.22452803e-02, -3.71708289e-02, -6.42874319e-02,
     -5.01920042e-03, -1.14565540e-02, -2.61300268e-03, -6.94872458e-03, 1.20157063e-02, 2.01341977e-02],
    [1.93220674e-01, 1.62738332e-01, 1.72794061e-02, 7.89933755e-02, 1.58494767e-01, 9.04541006e-04,
     -3.33177052e-02, -1.42411500e-01, -1.90471155e-02, -2.41622739e-02, -2.57382438e-02, 2.84895062e-02],
    [3.31179197e+00, -1.56765268e-01, 4.42446188e+00, 2.05496297e+00, 5.07031622e+00, -3.52663849e-02,
     -5.68337901e+00, -1.17825301e+00, 5.41756637e-01, -3.15541339e-02, -1.58404846e+00, 7.37887234e-01],
    [2.36033237e-01, -5.01380019e-01, -7.01568834e-02, -2.14474169e-01, 5.58739133e-01, -3.45340886e-01,
     2.36469930e-01, -2.51770230e-02, -4.41670143e-01, -1.73364633e-01, 9.92353986e-03, 1.01775476e-01],
    [3.13672832e+00, 1.55128891e+00, 4.60139512e+00, 9.82477544e-01, -3.87108002e-01, -1.34239667e+00,
     -3.00065797e+00, -4.41556909e-01, -7.77546208e-01, -6.59017029e-01, -1.42596356e-01, -9.78935498e-01],
    [8.50714148e-01, 2.28658856e-01, -3.65260753e+00, 2.70626948e+00, -1.90441544e-01, 5.66625676e+00,
     1.77531510e+00, 2.39978921e+00, 1.10965660e+00, 1.58484130e+00, -1.51579214e-02, 8.64324026e-01],
    [1.14302559e+00, 1.18602811e+00, -3.88130412e+00, 8.69833825e-01, -8.23003310e-01, -4.23867795e-01,
     8.56022598e-01, -1.08015106e+00, 1.74840192e-01, -1.35493558e-02, -1.17012561e+00, 1.68572940e-01],
    [3.54117814e+00, 6.12714769e-01, 7.67585243e+00, 2.50391333e+00, 1.81374399e+00, -1.46363231e+00,
     -1.74027236e+00, -5.72924078e-01, -1.20787368e+00, -4.13954661e-01, -4.62561948e-01, 6.78297871e-01],
    [8.31843044e-01, 4.41635485e-01, 7.00724425e-02, -4.72159900e-02, 3.08326493e-01, -4.47009822e-01,
     3.27806057e-01, 6.52370380e-01, 3.28490360e-01, 1.28628172e-01, -7.78065861e-02, 6.91343399e-02],
    [4.90082031e-01, -9.53180204e-01, 1.76970476e-01, 1.57256960e-01, -5.26196238e-02, -3.19264458e-01,
     3.91808304e-01, 2.19368239e-01, -2.06483291e-01, -6.25044005e-02, -1.05547224e-01, 3.18934196e-01],
    [1.49899454e+00, -4.30708817e-01, 2.43770498e+00, 7.03149621e-01, -2.28827845e+00, 2.70195855e+00,
     -4.71484280e+00, -1.18700075e+00, -1.77431396e+00, -2.23190236e+00, 8.20855264e-01, -2.35859902e-01],
    [1.20322544e-01, -3.66300816e-01, -1.25699953e-01, -1.21914056e-01, 6.93277338e-02, -1.31034684e-01,
     -1.54955924e-03, 2.48094288e-02, -3.09576314e-02, -1.66369415e-03, 1.48904987e-04, -1.42151992e-02],
    [6.52394765e-01, -6.81024464e-01, 6.36868117e-01, 3.04950208e-01, 2.62178992e-01, -3.20457080e-01,
     -1.98576098e-01, -3.02173163e-01, 2.04399765e-01, 4.44513847e-02, -9.50111498e-03, -1.14198739e-02],
    [2.06762180e-01, -2.08101829e-01, 2.61977630e-01, -1.71672300e-01, 5.61794250e-02, 2.13660185e-01,
     3.90259585e-02, 4.78176392e-02, 1.72812607e-02, 3.44052067e-02, 6.26899067e-03, 2.48544728e-02],
    [7.39717363e-01, 4.37786285e+00, 2.54995502e+00, 1.13151212e+00, -3.58509503e-01, 2.20806129e-01,
     -2.20500355e-01, -7.22409824e-02, -2.70534083e-01, 1.07942098e-03, 2.70174668e-01, 1.87279353e-01],
    [1.25593809e+00, 6.71054880e-02, 8.70352571e-01, -4.32607959e+00, 2.30652217e+00, 5.47476105e+00,
     -6.11052479e-01, 1.07955720e+00, -2.16225471e+00, -7.95770149e-01, -7.31804973e-01, 9.68935954e-01],
    [1.17233757e-01, -1.23897829e-01, -4.88625265e-01, 1.42036530e-01, -7.23286756e-03, -6.99808763e-02,
     -1.17525019e-02, 5.70221674e-02, -7.67796123e-03, 4.17505873e-02, -2.33375716e-02, 1.94121001e-02],
    [1.67511025e+00, -2.75436700e+00, 1.45345593e+00, 1.32408871e+00, -1.66172505e+00, 1.00560074e+00,
     -8.82308160e-01, -5.95708043e-01, -7.27283590e-01, -1.03975499e+00, -1.86653334e-02, 1.39449745e+00],
    [3.20587677e+00, -2.84451104e+00, 8.54849957e+00, -4.44001235e-01, 1.04202144e+00, 7.35333682e-01,
     -2.48763292e+00, 7.38931361e-01, -1.74185596e+00, -1.07581842e+00, 2.05759299e-01, -8.20483513e-01],
    [3.31279737e+00, -5.08655734e-01, 6.61530870e+00, 1.16518280e+00, 4.74499155e+00, -2.31536191e+00,
     -1.34016130e+00, -7.15381712e-01, 2.78890594e+00, 2.04189275e+00, -3.80003033e-01, 1.16034914e+00],
    [1.79522019e+00, -8.13534697e-02, 4.37167420e-01, 2.26517020e+00, 8.85377295e-01, 1.07481514e+00,
     -7.25322296e-01, -2.19309506e+00, -7.59468916e-01, -1.37191387e+00, 2.60097913e-01, 9.34596450e-01],
    [3.50400906e-01, 8.17891485e-01, -8.63487084e-01, -7.31760701e-01, 9.70320805e-02, -3.60023996e-01,
     -2.91753495e-01, -8.03073817e-02, 6.65930095e-02, 1.60093340e-01, -1.29158086e-01, -5.18806100e-02],
    [2.25922929e-01, 2.78461593e-01, 5.39661393e-02, -2.37662670e-02, -2.70343295e-02, -1.23485570e-01,
     2.31027499e-03, 5.87465112e-05, 1.86127188e-02, 2.83074747e-02, -1.87198676e-04, 1.24761782e-02],
    [4.53615634e-01, 3.18976020e+00, -8.35029351e-01, 7.84124578e+00, -4.43906795e-01, -1.78945492e+00,
     -1.14521031e+00, 1.00044304e+00, -4.04084981e-01, -4.86030348e-01, 1.05412721e-01, 5.63666445e-02],
    [3.93714086e-01, -3.07226477e-01, -4.87366619e-01, -4.57481697e-01, -2.91133171e-04, -2.39881719e-01,
     -2.15591352e-01, -1.21332941e-01, 1.42245002e-01, 5.02984582e-02, -8.05878851e-03, 1.95534173e-01],
    [1.86913010e-01, -1.61000977e-01, 5.95612425e-01, 1.87804293e-01, 2.22064227e-01, -1.09008289e-01,
     7.83845058e-02, 5.15228647e-02, -8.18113578e-02, -2.37860551e-02, 3.41013800e-03, 3.64680417e-02],
    [3.32919314e+00, -2.14341251e+00, 7.20913997e+00, 1.76143734e+00, 1.64091808e+00, -2.66887649e+00,
     -9.26748006e-01, -2.78599285e-01, -7.39434005e-01, -3.87363085e-01, 8.00557250e-01, 1.15628886e+00],
    [4.76496444e-01, -1.19334793e-01, 3.09037235e-01, -3.45545294e-01, 1.30114716e-01, 5.06895559e-01,
     2.12176840e-01, -4.14296750e-03, 4.52439064e-02, -1.62163990e-02, 6.93683152e-02, -5.77607592e-03],
    [3.00019324e-01, 5.43432074e-02, -7.72732930e-01, 1.47263806e+00, -2.79012581e-02, -2.47864869e-01,
     -2.10011388e-01, 2.78202425e-01, 6.16957205e-02, -1.66924986e-01, -1.80102286e-01, -3.78872162e-03]]

TIMBRE_MEANS = [np.mean(t) for t in TIMBRE_CLUSTERS]
TIMBRE_STDEVS = [np.std(t) for t in TIMBRE_CLUSTERS]

''' helper methods to process raw msd data '''

def normalize_pitches(h5):
    # transpose each pitch segment so that every song is in the same key
    key = int(hdf5_getters.get_key(h5))
    segments_pitches = hdf5_getters.get_segments_pitches(h5)
    segments_pitches_new = [transpose_by_key(pitch_seg, key) for pitch_seg in segments_pitches]
    return segments_pitches_new

def transpose_by_key(pitch_seg, key):
    pitch_seg_new = []
    for i in range(0, 12):
        idx = (i + key) % 12
        pitch_seg_new.append(pitch_seg[idx])
    return pitch_seg_new

''' given a time segment with distributions of the 12 pitches, find the most
likely chord played '''
def find_most_likely_chord(pitch_vector):
    rho_max = 0.0
    # chords are indexed as (key type, root note): key type 1 = major,
    # 2 = minor, 3 = dominant 7th, 4 = minor 7th; roots run 0 (C) to 11 (B)
    most_likely_chord = (1, 1)
    template_sets = [
        (1, CHORD_TEMPLATE_MAJOR, CHORD_TEMPLATE_MAJOR_means, CHORD_TEMPLATE_MAJOR_stdevs),
        (2, CHORD_TEMPLATE_MINOR, CHORD_TEMPLATE_MINOR_means, CHORD_TEMPLATE_MINOR_stdevs),
        (3, CHORD_TEMPLATE_DOM7, CHORD_TEMPLATE_DOM7_means, CHORD_TEMPLATE_DOM7_stdevs),
        (4, CHORD_TEMPLATE_MIN7, CHORD_TEMPLATE_MIN7_means, CHORD_TEMPLATE_MIN7_stdevs)]
    for key_type, templates, means, stdevs in template_sets:
        for idx, (chord, mean, stdev) in enumerate(zip(templates, means, stdevs)):
            # regularized correlation between the chord template and the
            # observed pitch distribution
            rho = 0.0
            for i in range(0, 12):
                rho += (chord[i] - mean) * (pitch_vector[i] - np.mean(pitch_vector)) / ((stdev + 0.01) * (np.std(pitch_vector) + 0.01))
            if abs(rho) > abs(rho_max):
                rho_max = rho
                most_likely_chord = (key_type, idx)
    return most_likely_chord

def find_most_likely_timbre_category(timbre_vector):
    most_likely_timbre_cat = 0
    rho_max = 0.0
    for idx, (seg, mean, stdev) in enumerate(zip(TIMBRE_CLUSTERS, TIMBRE_MEANS, TIMBRE_STDEVS)):
        rho = 0.0
        for i in range(0, 12):
            # note: the cluster's own mean, np.mean(seg), is what is
            # subtracted from timbre_vector[i] here
            rho += (seg[i] - mean) * (timbre_vector[i] - np.mean(seg)) / ((stdev + 0.01) * (np.std(timbre_vector) + 0.01))
        if abs(rho) > abs(rho_max):
            rho_max = rho
            most_likely_timbre_cat = idx
    return most_likely_timbre_cat
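As a quick sanity check of the template-correlation approach, the snippet below (an illustration, assuming the module above is saved as msd_utils.py) feeds a synthetic C major chroma vector to find_most_likely_chord; since the vector matches the first major template exactly, the function returns (1, 0), i.e. key type 1 (major) rooted at pitch class 0 (C):

import msd_utils

# chroma vector with energy only on C, E, and G
c_major = [1.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0]
print(msd_utils.find_most_likely_chord(c_major))  # (1, 0)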

Bibliography

[1] Marck Bailey. Mark J. Butler publishes scholarly work on dance music. http://www.music.northwestern.edu/about/news/2012/mark-j-butler-publishes-scholarly-work-on-dance-music.html, March 2012.

[2] Kenneth Taylor. Ishkur's guide to EDM. http://techno.org/electronic-music-guide/.

[3] Deal further strengthens Spotify's music discovery expertise. http://the.echonest.com/pressreleases/spotify-acquires-echo-nest/, March 2014.

[4] Josh Constine. Inside the Spotify - Echo Nest skunkworks. http://techcrunch.com/2014/10/19/the-sonic-mad-scientists/, October 2014.

[5] The future of music genres is here. http://blog.echonest.com/post/73516217273/the-future-of-music-genres-is-here, January 2014.

[6] About the Music Genome Project. http://www.pandora.com/about/mgp.

[7] Joan Serrà, Álvaro Corral, Marián Boguñá, Martín Haro, and Josep Ll. Arcos. Measuring the evolution of contemporary western popular music. Sci. Rep., 2, July 2012.

[8] Matthias Mauch, Robert M. MacCallum, Mark Levy, and Armand M. Leroi. The evolution of popular music: USA 1960–2010. Royal Society Open Science, 2(5), 2015.

[9] Thierry Bertin-Mahieux, Daniel P.W. Ellis, Brian Whitman, and Paul Lamere. The Million Song Dataset. In Proceedings of the 12th International Conference on Music Information Retrieval (ISMIR 2011), 2011.

[10] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay. Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12:2825–2830, 2011.

[11] Lars Buitinck, Gilles Louppe, Mathieu Blondel, Fabian Pedregosa, Andreas Mueller, Olivier Grisel, Vlad Niculae, Peter Prettenhofer, Alexandre Gramfort, Jaques Grobler, Robert Layton, Jake VanderPlas, Arnaud Joly, Brian Holt, and Gaël Varoquaux. API design for machine learning software: experiences from the scikit-learn project. In ECML PKDD Workshop: Languages for Data Mining and Machine Learning, pages 108–122, 2013.

[12] Edwin Chen. Infinite mixture models with nonparametric Bayes and the Dirichlet process. http://blog.echen.me/2012/03/20/infinite-mixture-models-with-nonparametric-bayes-and-the-dirichlet-process/, March 2012.

[13] Graham Massey. Roland TR-808: The drum machine that changed music. http://www.bbc.com/news/entertainment-arts-26682781, March 2014.

[14] Jean-Michel Jarre and Agencja Artystyczna MTJ. Jean Michel Jarre. Disques Dreyfus, 1999.

[15] Francois Pachet, Jean-Julien Aucouturier, and Mark Sandler. The way it sounds: Timbre models for analysis and retrieval of music signals. IEEE Transactions on Multimedia, 7(6):1028–35, December 2005.


Likewise clusters 3 7 9 and 20 which appear to contain the same peak timbre

category were noted for containing strong and repetitive beats For this cluster I

added the following artists and their contributions to the general list of novel artists

bull Bill Nelson Minimalist house music

bull Vangelis Orchestral compositions with electronic notes

bull Rick Wakeman Rock compositions with spacy-sounding synths

bull Kraftwerk synth-based pop music

Finally we look at α = 02 With this parameter value the Dirichlet Process resulted

in 22 clusters formed 3 of these clusters contained only one song each and upon

listening to each of these songs I determined they were not particularly unique and

discarded them for a total of 19 remaining clusters Unlike the previous two values of

α where the clusters were relatively easy to subjectively differentiate this one was

quite difficult Slightly more than half of the clusters 10 out of 19 contained under

1000 songs and 3 contained under 100 Some of the clusters were easily mapped

to clusters in the other two α values like cluster 1702 which contains Roland TR

drum machine sounds and is comparable to cluster 2801 However many of the other

classifications seemed more dubious Not only did the songs within each cluster seem

to often vary significantly but the differences between many clusters appeared nearly

indistinguishable The chord change and timbre charts also support the difficulty

in distinguishing different clusters The y-axes for all of the years are quite small

implying that many of the timbre values averaged out because the songs were quite

different in each cluster Essentially the observations are quite noisy and do not have

51

features that stand out as saliently as cluster 2801 for example The only exceptions

to these numbers were clusters 30 and 34 but there were so few songs in each of

these clusters that they represent only a small amount of the dataset Therefore I

concluded that the Dirichlet Process with α = 02 performed an insufficient job of

adequately clustering the songs Overall the clusters formed when α = 01 were

the most meaningful in terms of picking up nuanced moods and instruments without

splitting hairs and resulting in clusters a minimally trained ear could not differentiate

From this analysis the most appropriate genre classifications of the electronic music

from the MSD are the clusters described in the table where α = 01 and the most

novel artists along with their contributions are summarized in the finidings where

α = 005 and α = 01

52

Chapter 4

Conclusion

In this chapter I first address weakness in my experiment and strategies to address

those weaknesses then I offer potential paths for researchers to build upon my ex-

periment and offer closing words regarding this thesis

41 Design Flaws in Experiment

While I made every effort possible to ensure the integrity of this experiment there

were various factors some beyond my control and others within my control but

unrealistic given the time and resources I had The largest issue was the dataset I

was working with While the MSD contained roughly 23000 electronic music songs

according to my classifications these songs did not come close to all of the electronic

music that was available From looking through the tracks I did see many important

artists meaning that there was some credibility to the dataset However there were

several other artists I was surprised to see missing and the artists included contained

only a limited number of popular songs Some traditionally defined genres like

dubstep were missing entirely from the dataset and the most recent songs came

from the year 2010 which meant that the past 5 years of rapid expansions in EM

were not accounted for Building a sufficient corpus of EM data is very difficult

53

arguably more than for other genres because songs may be remixed by multiple

artists further blurring the line between original content and modifications For this

reason I considered my thesis to be a proof of concept Although the data I used

may not be ideal I was able to show that the Dirichlet Process could be used with

some amount of success to cluster songs based on their metadata

With respect to how I implemented the Dirichlet Process and constructed the

features my methodology could have been more extensive with additional time and

resources Interpreting the sounds in each song and establishing common threads is a

difficult task and unlike Pandora which used trained music theory experts to analyze

each song I relied on my own ears and anecdotal knowledge of EM Given the lack of

formal literature quantitatively analyzing EM and the resources I had this was my

best realistic option but was also not ideal The second notable weakness which was

more controllable was determining what exactly constitutes an EM song My criteria

involved iterating through every song and selecting those whose artist contained a

tag that fell inside a list of predetermined EM genres However this strategy is not

always effective since some artists contain only a small selection of EM songs and

have produced much more music involving rock or other non-EM genres To prevent

these songs from appearing in the dataset I would need to load another dataset

from a group called Lastfm which contains user-generated tags at the song level

Another more addressable weakness in my experiment was graphically analyzing the

timbre categories While the average chord changes were easy to interpret on the

graphs for each cluster and had easy semantic interpretations the timbre categories

were never formally defined That is while I knew the Bayes Information Criterion

was lowest when there were 46 categories I did not associate each timbre category

with a sound Mauchrsquos study addressed this issue by randomly selecting songs with

sounds that fell in each timbre category and asked users to listen to the sounds and

54

classify what they heard Implementing this system would be an additional way of

ensuring that the clusters formed for each song were nontrivial I could not only

eyeball the measurements on each graph for timbre like I did in this thesis but also

use them to confirm the sounds I observed for each cluster Finally while my feature

selection contained careful preprocessing based on other studies that normalized

measurements between all songs there are additional ways I could have improved the

feature set For example one study looks at more advanced ways to isolate specific

timbre segments in a song identify repeating patterns and comparing songs to each

other in terms of the similarity of their timbres [15] More advanced methods like

these would allow me to more quantitatively analyze how successful the Dirichlet

Process is on effectively clustering songs into distinct categories

42 Future Work

Future work in this area quantitatively analyzing EM metadata to determine what

constitutes different genres and novel artists would involve tighter definitions proce-

dures evaluations of whether clustering was effective and music scrutiny All of the

weaknesses mentioned in the previous section barring perhaps the songs available in

the Million Song Dataset can be addressed with extensions and modifications to the

code base I created Addressing the greater issue of building an effective corpus of

music data for the MSD and constantly updating it might be addressed by soliciting

such data from an organization like Spotify but such an endeavor is very ambitious

and beyond the scope of any individual or small group research project without ex-

tensive funding and influence Once these problems are resolved and the dataset

songs accessed from the dataset and methods for comparing songs to each other are

accomplished the next steps would be to further analyze the results How do the

most unique artists for their time compare to the most popular artists Is there con-

55

siderable overlap How long does it take for a style to grow in popularity if it even

does And lastly how can these findings be used to compose new genres of music and

envision who and what will become popular in the future All of these questions may

require supplementary information sources with respect to the popularity of songs

and artists for example and many of these additional pieces of information can be

found on the website of the MSD

43 Closing Remarks

While this thesis is an ambitious endeavor and can be improved in many respects it

does show that the methods implemented yield nontrivial results and could serve as

a foundation for future quantitative analysis of electronic music As data analytics

grows even more and groups such as Spotify amass greater amounts of information

and deeper insights on that information this relatively new field of study will hope-

fully grow EM is a dynamic energizing and incredibly expressive type of music

and understanding it from a quantitative perspective pays respect to what has up

until now been mostly analyzed from a curious outsiderrsquos perspective qualitatively

described but not examined as thoroughly from a mathematical angle

56

Appendix A

Code

A1 Pulling Data from the Million Song Dataset

1 from __future__ import division2 import os3 import sys4 import time5 import glob6 import hdf5_getters not on adroit7

8 prevents output from showing ellipses when printed9 npset_printoptions(threshold=npnan)

10

11 rsquorsquorsquoThis code computes the frequency of chord changes in each electronic songand runs the dirichlet process on it rsquorsquorsquo

12

13 basedir = rsquoscratchnetworkmssilvermssilvermsd_data_fulldatarsquo +str(sysargv[1])

14 ext = rsquoh5rsquo15

16 target_genres = [rsquohousersquorsquotechnorsquorsquodrum and bassrsquorsquodrum n bassrsquorsquodrumrsquonrsquobassrsquo

17 rsquodrumnbassrsquorsquodrum rsquonrsquo bassrsquorsquojunglersquorsquobreakbeatrsquorsquotrancersquorsquodubsteprsquorsquotraprsquorsquodowntemporsquo

18 rsquoindustrialrsquorsquosynthpoprsquorsquoidmrsquorsquoidm - intelligent dance musicrsquorsquo8-bitrsquorsquoambientrsquo

19 rsquodance and electronicarsquorsquoelectronicrsquo]20

21 relevant metadata for all EM songs found in the MSD22 all_song_data = 23 pitch_segs_data = []24 count = 025 start_time = timetime()26

27 for root dirs files in oswalk(basedir)28 files = globglob(ospathjoin(rootrsquorsquo+ext))29 for f in files

57

30 h5 = hdf5_gettersopen_h5_file_read(f)31 if year unknown throw out sample32 if hdf5_gettersget_year(h5) == 033 h5close()34 continue35 if any(tag in str(hdf5_gettersget_artist_mbtags(h5)) for tag in

target_genres)36 print rsquofound electronic music song at 0 secondsrsquoformat(time

time()-start_time)37 count += 138 print (rsquosong count 0rsquoformat(count+1))39 h5_subdict = dict()40 h5_subdict[rsquotitlersquo] = hdf5_gettersget_title(h5)item()41 h5_subdict[rsquoartist_namersquo] = hdf5_gettersget_artist_name(h5)

item()42 h5_subdict[rsquoyearrsquo] = hdf5_gettersget_year(h5)item()43 h5_subdict[rsquodurationrsquo] = hdf5_gettersget_duration(h5)item()44 h5_subdict[rsquotimbrersquo] = hdf5_gettersget_segments_timbre(h5)

tolist()45 h5_subdict[rsquopitchesrsquo] = hdf5_gettersget_segments_pitches(h5)

tolist()46 track_id = hdf5_gettersget_track_id(h5)item()47 all_song_data[track_id] = h5_subdict48 print(rsquoSong 0 finished processing Total time elapsed 1

secondsrsquoformat(countstr(timetime() - start_time)))49 h5close()50

51 all_song_data_sorted = dict(sorted(all_song_dataitems() key=lambda k k[1][rsquoyearrsquo]))

52 sortedpitchdata = rsquoscratchnetworkmssilvermssilvermsd_dataraw_rsquo +resub(rsquorsquorsquorsquosysargv[1]) + rsquotxtrsquo

53 with open(sortedpitchdata rsquowrsquo) as text_file54 text_filewrite(str(all_song_data_sorted))

A2 Calculating Most Likely Chords and Timbre

Categories

1 from __future__ import division2 import os3 import sys4 import re5 import time6 import json7 import glob8 import hdf5_getters not on adroit9 import sklearnmixture

10 import msd_utils not on adroit11 import math12 import numpy as np13 import collections14 import ast15

16 prevents output from showing ellipses when printed17 npset_printoptions(threshold=npnan)

58

18

19 column-wise mean of list of lists20 def mean(a)21 return sum(a) len(a)22

23 rsquorsquorsquoThis code computes the frequency of chord changes in each electronic songand runs the dirichlet process on it rsquorsquorsquo

24

25 basedir = rsquoscratchnetworkmssilvermssilverrsquo26 input_file = basedir + rsquomsd_dataraw_rsquo + str(sysargv[1]) + rsquotxtrsquo27 output_file = basedir + rsquomsd_datapreprocessed_rsquo + str(sysargv[1]) + rsquotxtrsquo28

29 json_contents = open(input_filersquorrsquo)read()30

31 all_song_data = []32 time_start = timetime()33 count = 034 for json_object_str in refinditer(rsquorsquotitlersquojson_contents)35 json_object_str = str(json_object_strgroup(0))36 json_object = astliteral_eval(json_object_str)37 json_object_new = 38

39 json_object_new[rsquotitlersquo] = json_object[rsquotitlersquo]40 json_object_new[rsquoartist_namersquo] = json_object[rsquoartist_namersquo]41 json_object_new[rsquoyearrsquo] = json_object[rsquoyearrsquo]42 json_object_new[rsquodurationrsquo] = json_object[rsquodurationrsquo]43

44 segments_pitches_old = json_object[rsquopitchesrsquo]45 segments_timbre_old = json_object[rsquotimbrersquo]46 segments_pitches_old_smoothed = []47 segments_timbre_old_smoothed = []48 chord_changes = [0 for i in range(0192)]49 smoothing_factor = 550 for i in range(0int(mathfloor(len(segments_pitches_old))

smoothing_factor))51 segments = segments_pitches_old[(smoothing_factori)(

smoothing_factori+smoothing_factor)]52 calculate mean frequency of each note over a block of 5 time

segments53 segments_mean = map(mean zip(segments))54 segments_pitches_old_smoothedappend(segments_mean)55 most_likely_chords = [msd_utilsfind_most_likely_chord(seg) for seg in

segments_pitches_old_smoothed]56 print rsquofound most likely chords at 0 secondsrsquoformat(timetime()-

time_start)57 calculate chord changes58 for i in range(0len(most_likely_chords)-1)59 c1 = most_likely_chords[i]60 c2 = most_likely_chords[i+1]61 if (c1[1] == c2[1])62 note_shift = 063 elif (c1[1] lt c2[1])64 note_shift = c2[1] - c1[1]65 else66 note_shift = 12 - c1[1] + c2[1]67 key_shift = 4(c1[0]-1) + c2[0]68 convert note_shift (0 through 11) and key_shift (1 to 16)69 to one of 196 categories for a chord shift70 chord_shift = 12(key_shift - 1) + note_shift71 chord_changes[chord_shift] += 1

59

72 json_object_new[rsquochord_changesrsquo] = [cjson_object[rsquodurationrsquo] for c inchord_changes]

73 print rsquocalculated chord changes at 0 secondsrsquoformat(timetime()-time_start)

74

75 for i in range(0int(mathfloor(len(segments_timbre_old))smoothing_factor))

76 segments = segments_timbre_old[(smoothing_factori)(smoothing_factori+smoothing_factor)]

77 calculate mean frequency of each note over a block of 5 timesegments

78 segments_mean = map(mean zip(segments))79 segments_timbre_old_smoothedappend(segments_mean)80 print rsquofound most likely timbre categories at 0 secondsrsquoformat(time

time()-time_start)81 timbre_cats = [msd_utilsfind_most_likely_timbre_category(seg) for seg

in segments_timbre_old_smoothed]82 timbre_cat_counts = [timbre_catscount(i) for i in xrange(030)]83 json_object_new[rsquotimbre_cat_countsrsquo] = [tjson_object[rsquodurationrsquo] for t

in timbre_cat_counts]84 all_song_dataappend(json_object_new)85 count += 186

87 print rsquopreprocessing finished writing results to file at time 0rsquoformat(timetime()-time_start)

88 with open(output_filersquowrsquo) as f89 fwrite(str(all_song_data))90

91 print rsquofile merging complete at time 0rsquoformat(timetime()-time_start)

A3 Code to Compute Timbre Categories

from __future__ import division
import os
import sys
import re
import time
import json
import glob
import hdf5_getters  # not on adroit
import sklearn.mixture
import msd_utils  # not on adroit
import math
import numpy as np
import collections
from string import ascii_uppercase
import ast
import matplotlib.pyplot as plt
import operator
from collections import defaultdict
import random

timbre_all = []
N = 20  # number of samples to get from each year
year_counts = {1956: 2, 1965: 4, 1968: 3, 1969: 5, 1970: 23, 1971: 25,
               1972: 26, 1973: 37, 1974: 35, 1975: 29, 1976: 28, 1977: 64, 1978: 77,
               1979: 111, 1980: 131, 1981: 171, 1982: 199, 1983: 272, 1984: 190, 1985: 189,
               1986: 200, 1987: 224, 1988: 205, 1989: 272, 1990: 358, 1991: 348,
               1992: 538, 1993: 610, 1994: 658, 1995: 764, 1996: 809, 1997: 930, 1998: 872,
               1999: 983, 2000: 1031, 2001: 1230, 2002: 1323, 2003: 1563, 2004: 1508,
               2005: 1995, 2006: 1892, 2007: 2175, 2008: 1950, 2009: 1782, 2010: 742}

time_start = time.time()
year_count = defaultdict(int)
orig_dir = '/scratch/network/mssilver/mssilver/'
orig_dir = ''
json_pattern = re.compile(r"\{'title'.*?\}", re.DOTALL)
N = 20  # number of songs to sample from each year
k = 20  # number of frames to select from each song
for l1 in ascii_uppercase:
    for l2 in ascii_uppercase:
        edm_textfile = orig_dir + 'msd_data/raw_' + l1 + l2 + '.txt'
        json_contents = open(edm_textfile, 'r').read()
        for json_object_str in re.findall(json_pattern, json_contents):
            json_object = ast.literal_eval(json_object_str)
            year = int(json_object['year'])
            prob = 1.0 if 1.0*N/year_counts[year] > 1.0 else 1.0*N/year_counts[year]
            if random.random() < prob:
                print 'getting timbre frames for song in directory {0}, {1} seconds after start of program'.format(edm_textfile, time.time()-time_start)
                duration = float(json_object['duration'])
                timbre = [[t/duration for t in l] for l in json_object['timbre']]
                try:
                    indices = random.sample(xrange(0, len(timbre)), k)
                except:
                    indices = xrange(0, len(timbre))
                timbre_frames = [timbre[i] for i in indices]
                appended_timbre = [timbre_all.append(l) for l in timbre_frames]
        print 'finished file {0}, {1} seconds after start of program'.format(edm_textfile, time.time()-time_start)

with(open('timbre_frames_all.txt', 'w')) as f:
    f.write(str(timbre_all))

A.4 Helper Methods for Calculations

import os
import re
import json
import glob
import hdf5_getters
import time
import numpy as np

''' some static data used in conjunction with the helper methods '''

# each 12-element vector corresponds to the 12 pitches, starting with C
# natural and going up to B natural

CHORD_TEMPLATE_MAJOR = [[1,0,0,0,1,0,0,1,0,0,0,0], [0,1,0,0,0,1,0,0,1,0,0,0],
                        [0,0,1,0,0,0,1,0,0,1,0,0], [0,0,0,1,0,0,0,1,0,0,1,0],
                        [0,0,0,0,1,0,0,0,1,0,0,1], [1,0,0,0,0,1,0,0,0,1,0,0],
                        [0,1,0,0,0,0,1,0,0,0,1,0], [0,0,1,0,0,0,0,1,0,0,0,1],
                        [1,0,0,1,0,0,0,0,1,0,0,0], [0,1,0,0,1,0,0,0,0,1,0,0],
                        [0,0,1,0,0,1,0,0,0,0,1,0], [0,0,0,1,0,0,1,0,0,0,0,1]]
CHORD_TEMPLATE_MINOR = [[1,0,0,1,0,0,0,1,0,0,0,0], [0,1,0,0,1,0,0,0,1,0,0,0],
                        [0,0,1,0,0,1,0,0,0,1,0,0], [0,0,0,1,0,0,1,0,0,0,1,0],
                        [0,0,0,0,1,0,0,1,0,0,0,1], [1,0,0,0,0,1,0,0,1,0,0,0],
                        [0,1,0,0,0,0,1,0,0,1,0,0], [0,0,1,0,0,0,0,1,0,0,1,0],
                        [0,0,0,1,0,0,0,0,1,0,0,1], [1,0,0,0,1,0,0,0,0,1,0,0],
                        [0,1,0,0,0,1,0,0,0,0,1,0], [0,0,1,0,0,0,1,0,0,0,0,1]]
CHORD_TEMPLATE_DOM7 = [[1,0,0,0,1,0,0,1,0,0,1,0], [0,1,0,0,0,1,0,0,1,0,0,1],
                       [1,0,1,0,0,0,1,0,0,1,0,0], [0,1,0,1,0,0,0,1,0,0,1,0],
                       [0,0,1,0,1,0,0,0,1,0,0,1], [1,0,0,1,0,1,0,0,0,1,0,0],
                       [0,1,0,0,1,0,1,0,0,0,1,0], [0,0,1,0,0,1,0,1,0,0,0,1],
                       [1,0,0,1,0,0,1,0,1,0,0,0], [0,1,0,0,1,0,0,1,0,1,0,0],
                       [0,0,1,0,0,1,0,0,1,0,1,0], [0,0,0,1,0,0,1,0,0,1,0,1]]
CHORD_TEMPLATE_MIN7 = [[1,0,0,1,0,0,0,1,0,0,1,0], [0,1,0,0,1,0,0,0,1,0,0,1],
                       [1,0,1,0,0,1,0,0,0,1,0,0], [0,1,0,1,0,0,1,0,0,0,1,0],
                       [0,0,1,0,1,0,0,1,0,0,0,1], [1,0,0,1,0,1,0,0,1,0,0,0],
                       [0,1,0,0,1,0,1,0,0,1,0,0], [0,0,1,0,0,1,0,1,0,0,1,0],
                       [0,0,0,1,0,0,1,0,1,0,0,1], [1,0,0,0,1,0,0,1,0,1,0,0],
                       [0,1,0,0,0,1,0,0,1,0,1,0], [0,0,1,0,0,0,1,0,0,1,0,1]]

CHORD_TEMPLATE_MAJOR_means = [np.mean(chord) for chord in CHORD_TEMPLATE_MAJOR]
CHORD_TEMPLATE_MINOR_means = [np.mean(chord) for chord in CHORD_TEMPLATE_MINOR]
CHORD_TEMPLATE_DOM7_means = [np.mean(chord) for chord in CHORD_TEMPLATE_DOM7]
CHORD_TEMPLATE_MIN7_means = [np.mean(chord) for chord in CHORD_TEMPLATE_MIN7]

CHORD_TEMPLATE_MAJOR_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_MAJOR]
CHORD_TEMPLATE_MINOR_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_MINOR]
CHORD_TEMPLATE_DOM7_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_DOM7]
CHORD_TEMPLATE_MIN7_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_MIN7]

TIMBRE_CLUSTERS = [
    [1.38679881e-01, 3.95702571e-02, 2.65410235e-02, 7.38301998e-03, -1.75014636e-02, -5.51147732e-02,
     8.71851698e-03, -1.17595855e-02, 1.07227900e-02, 8.75951680e-03, 5.40391877e-03, 6.17638908e-03],
    [3.14344510e+00, 1.17405599e-01, 4.08053561e+00, -1.77934450e+00, 2.93367968e+00, -1.35597928e+00,
     -1.55129489e+00, 7.75743158e-01, 6.42796685e-01, 1.40794256e-01, 3.37716831e-01, -3.27103815e-01],
    [3.56548165e-01, 2.73288705e+00, 1.94355982e+00, 1.06892477e+00, 9.89739475e-01, -8.97330631e-02,
     8.73234495e-01, -2.00747009e-03, 3.44488367e-01, 9.93117800e-02, -2.43471766e-01, -1.90521726e-01],
    [4.22442037e-01, 4.14115783e-01, 1.43926557e-01, -1.16143322e-01, -5.95186216e-02, -2.36927188e-01,
     -6.83151409e-02, 9.86816882e-02, 2.43219098e-02, 6.93558977e-02, 6.80121418e-03, 3.97485360e-02],
    [1.94727799e-01, -1.39027782e+00, -2.39875671e-01, -2.84583677e-01, 1.92334219e-01, -2.83421048e-01,
     2.15787541e-01, 1.14840341e-01, -2.15631833e-01, -4.09496877e-02, -6.90838017e-03, -7.24394810e-03],
    [1.96565167e-01, 4.98702717e-02, -3.43697282e-01, 2.54170701e-01, 1.12441266e-02, 1.54740401e-01,
     -4.70447408e-02, 8.10868802e-02, 3.03736697e-03, 1.43974944e-03, -2.75044913e-02, 1.48634678e-02],
    [2.21364497e-01, -2.96205105e-01, 1.57754028e-01, -5.57641279e-02, -9.25625566e-02, -6.15316168e-02,
     -1.38139882e-01, -5.54936599e-02, 1.66886836e-01, 6.46238260e-02, 1.24093863e-02, -2.09274345e-02],
    [2.12823455e-01, -9.32652720e-02, -4.39611467e-01, -2.02814479e-01, 4.98638770e-02, -1.26572488e-01,
     -1.11181799e-01, 3.25075635e-02, 2.01416694e-02, -5.69216463e-02, 2.61922912e-02, 8.30817468e-02],
    [1.62304042e-01, -7.34813956e-03, -2.02552550e-01, 1.80106705e-01, -5.72110826e-02, -9.17148244e-02,
     -6.20429191e-03, -6.08892354e-02, 1.02883628e-02, 3.84878478e-02, -8.72920419e-03, 2.37291230e-02],
    [1.69023095e-01, 6.81311168e-02, -3.71039856e-02, -2.13139780e-02, -4.18752028e-03, 1.36407740e-01,
     2.58515825e-02, -4.10328777e-04, 2.93149920e-02, -1.97874734e-02, 2.01177066e-02, 4.29260690e-03],
    [4.16829358e-01, -1.28384095e+00, 8.86081556e-01, 9.13717416e-02, -3.19420208e-01, -1.82003637e-01,
     -3.19865507e-02, -1.71517045e-02, 3.47472066e-02, -3.53047665e-02, 5.58354602e-02, -5.06222122e-02],
    [3.83948137e-01, 1.06020034e-01, 4.01191058e-01, 1.49470482e-01, -9.58422411e-02, -4.94473336e-02,
     2.27589858e-02, -5.67352733e-02, 3.84666644e-02, -2.15828055e-02, -1.67817151e-02, 1.15426241e-01],
    [9.07946444e-01, 3.26120397e+00, 2.98472002e+00, -1.42615404e-01, 1.29886103e+00, -4.53380431e-01,
     1.54008478e-01, -3.55297093e-02, -2.95809181e-01, 1.57037690e-01, -7.29692046e-02, 1.15180285e-01],
    [1.60870896e+00, -2.32038235e+00, -7.96211044e-01, 1.55058968e+00, -2.19377663e+00, 5.01030526e-01,
     -1.71767279e+00, -1.36642470e+00, -2.42837527e-01, -4.14275615e-01, -7.33148530e-01, -4.56676578e-01],
    [6.42870687e-01, 1.34486839e+00, 2.16026845e-01, -2.13180345e-01, 3.10866747e-01, -3.97754955e-01,
     -3.54439151e-01, -5.95938041e-04, 4.95054274e-03, 4.67013422e-02, -1.80823854e-02, 1.25808320e-01],
    [1.16780496e+00, 2.28141229e+00, -3.29418720e+00, -1.54239912e+00, 2.12372153e-01, 2.51116768e+00,
     1.84273560e+00, -4.06183916e-01, 1.19175125e+00, -9.24407446e-01, 6.85444429e-01, -6.38729005e-01],
    [2.39097414e-01, -1.13382447e-02, 3.06327342e-01, 4.68182987e-03, -1.03107607e-01, -3.17661969e-02,
     3.46533705e-02, 1.46440386e-02, 6.88291154e-02, 1.72580481e-02, -6.23970238e-03, -6.52822380e-03],
    [1.74850329e-01, -1.86077411e-01, 2.69285838e-01, 5.22452803e-02, -3.71708289e-02, -6.42874319e-02,
     -5.01920042e-03, -1.14565540e-02, -2.61300268e-03, -6.94872458e-03, 1.20157063e-02, 2.01341977e-02],
    [1.93220674e-01, 1.62738332e-01, 1.72794061e-02, 7.89933755e-02, 1.58494767e-01, 9.04541006e-04,
     -3.33177052e-02, -1.42411500e-01, -1.90471155e-02, -2.41622739e-02, -2.57382438e-02, 2.84895062e-02],
    [3.31179197e+00, -1.56765268e-01, 4.42446188e+00, 2.05496297e+00, 5.07031622e+00, -3.52663849e-02,
     -5.68337901e+00, -1.17825301e+00, 5.41756637e-01, -3.15541339e-02, -1.58404846e+00, 7.37887234e-01],
    [2.36033237e-01, -5.01380019e-01, -7.01568834e-02, -2.14474169e-01, 5.58739133e-01, -3.45340886e-01,
     2.36469930e-01, -2.51770230e-02, -4.41670143e-01, -1.73364633e-01, 9.92353986e-03, 1.01775476e-01],
    [3.13672832e+00, 1.55128891e+00, 4.60139512e+00, 9.82477544e-01, -3.87108002e-01, -1.34239667e+00,
     -3.00065797e+00, -4.41556909e-01, -7.77546208e-01, -6.59017029e-01, -1.42596356e-01, -9.78935498e-01],
    [8.50714148e-01, 2.28658856e-01, -3.65260753e+00, 2.70626948e+00, -1.90441544e-01, 5.66625676e+00,
     1.77531510e+00, 2.39978921e+00, 1.10965660e+00, 1.58484130e+00, -1.51579214e-02, 8.64324026e-01],
    [1.14302559e+00, 1.18602811e+00, -3.88130412e+00, 8.69833825e-01, -8.23003310e-01, -4.23867795e-01,
     8.56022598e-01, -1.08015106e+00, 1.74840192e-01, -1.35493558e-02, -1.17012561e+00, 1.68572940e-01],
    [3.54117814e+00, 6.12714769e-01, 7.67585243e+00, 2.50391333e+00, 1.81374399e+00, -1.46363231e+00,
     -1.74027236e+00, -5.72924078e-01, -1.20787368e+00, -4.13954661e-01, -4.62561948e-01, 6.78297871e-01],
    [8.31843044e-01, 4.41635485e-01, 7.00724425e-02, -4.72159900e-02, 3.08326493e-01, -4.47009822e-01,
     3.27806057e-01, 6.52370380e-01, 3.28490360e-01, 1.28628172e-01, -7.78065861e-02, 6.91343399e-02],
    [4.90082031e-01, -9.53180204e-01, 1.76970476e-01, 1.57256960e-01, -5.26196238e-02, -3.19264458e-01,
     3.91808304e-01, 2.19368239e-01, -2.06483291e-01, -6.25044005e-02, -1.05547224e-01, 3.18934196e-01],
    [1.49899454e+00, -4.30708817e-01, 2.43770498e+00, 7.03149621e-01, -2.28827845e+00, 2.70195855e+00,
     -4.71484280e+00, -1.18700075e+00, -1.77431396e+00, -2.23190236e+00, 8.20855264e-01, -2.35859902e-01],
    [1.20322544e-01, -3.66300816e-01, -1.25699953e-01, -1.21914056e-01, 6.93277338e-02, -1.31034684e-01,
     -1.54955924e-03, 2.48094288e-02, -3.09576314e-02, -1.66369415e-03, 1.48904987e-04, -1.42151992e-02],
    [6.52394765e-01, -6.81024464e-01, 6.36868117e-01, 3.04950208e-01, 2.62178992e-01, -3.20457080e-01,
     -1.98576098e-01, -3.02173163e-01, 2.04399765e-01, 4.44513847e-02, -9.50111498e-03, -1.14198739e-02],
    [2.06762180e-01, -2.08101829e-01, 2.61977630e-01, -1.71672300e-01, 5.61794250e-02, 2.13660185e-01,
     3.90259585e-02, 4.78176392e-02, 1.72812607e-02, 3.44052067e-02, 6.26899067e-03, 2.48544728e-02],
    [7.39717363e-01, 4.37786285e+00, 2.54995502e+00, 1.13151212e+00, -3.58509503e-01, 2.20806129e-01,
     -2.20500355e-01, -7.22409824e-02, -2.70534083e-01, 1.07942098e-03, 2.70174668e-01, 1.87279353e-01],
    [1.25593809e+00, 6.71054880e-02, 8.70352571e-01, -4.32607959e+00, 2.30652217e+00, 5.47476105e+00,
     -6.11052479e-01, 1.07955720e+00, -2.16225471e+00, -7.95770149e-01, -7.31804973e-01, 9.68935954e-01],
    [1.17233757e-01, -1.23897829e-01, -4.88625265e-01, 1.42036530e-01, -7.23286756e-02, -6.99808763e-02,
     -1.17525019e-02, 5.70221674e-03, -7.67796123e-03, 4.17505873e-02, -2.33375716e-02, 1.94121001e-02],
    [1.67511025e+00, -2.75436700e+00, 1.45345593e+00, 1.32408871e+00, -1.66172505e+00, 1.00560074e+00,
     -8.82308160e-01, -5.95708043e-01, -7.27283590e-01, -1.03975499e+00, -1.86653334e-02, 1.39449745e+00],
    [3.20587677e+00, -2.84451104e+00, 8.54849957e+00, -4.44001235e-01, 1.04202144e+00, 7.35333682e-01,
     -2.48763292e+00, 7.38931361e-01, -1.74185596e+00, -1.07581842e+00, 2.05759299e-01, -8.20483513e-01],
    [3.31279737e+00, -5.08655734e-01, 6.61530870e+00, 1.16518280e+00, 4.74499155e+00, -2.31536191e+00,
     -1.34016130e+00, -7.15381712e-01, 2.78890594e+00, 2.04189275e+00, -3.80003033e-01, 1.16034914e+00],
    [1.79522019e+00, -8.13534697e-02, 4.37167420e-01, 2.26517020e+00, 8.85377295e-01, 1.07481514e+00,
     -7.25322296e-01, -2.19309506e+00, -7.59468916e-01, -1.37191387e+00, 2.60097913e-01, 9.34596450e-01],
    [3.50400906e-01, 8.17891485e-01, -8.63487084e-01, -7.31760701e-01, 9.70320805e-02, -3.60023996e-01,
     -2.91753495e-01, -8.03073817e-02, 6.65930095e-02, 1.60093340e-01, -1.29158086e-01, -5.18806100e-02],
    [2.25922929e-01, 2.78461593e-01, 5.39661393e-02, -2.37662670e-02, -2.70343295e-02, -1.23485570e-01,
     2.31027499e-03, 5.87465112e-05, 1.86127188e-02, 2.83074747e-02, -1.87198676e-04, 1.24761782e-02],
    [4.53615634e-01, 3.18976020e+00, -8.35029351e-01, 7.84124578e+00, -4.43906795e-01, -1.78945492e+00,
     -1.14521031e+00, 1.00044304e+00, -4.04084981e-01, -4.86030348e-01, 1.05412721e-01, 5.63666445e-02],
    [3.93714086e-01, -3.07226477e-01, -4.87366619e-01, -4.57481697e-01, -2.91133171e-04, -2.39881719e-01,
     -2.15591352e-01, -1.21332941e-01, 1.42245002e-01, 5.02984582e-02, -8.05878851e-03, 1.95534173e-01],
    [1.86913010e-01, -1.61000977e-01, 5.95612425e-01, 1.87804293e-01, 2.22064227e-01, -1.09008289e-01,
     7.83845058e-02, 5.15228647e-02, -8.18113578e-02, -2.37860551e-02, 3.41013800e-03, 3.64680417e-02],
    [3.32919314e+00, -2.14341251e+00, 7.20913997e+00, 1.76143734e+00, 1.64091808e+00, -2.66887649e+00,
     -9.26748006e-01, -2.78599285e-01, -7.39434005e-01, -3.87363085e-01, 8.00557250e-01, 1.15628886e+00],
    [4.76496444e-01, -1.19334793e-01, 3.09037235e-01, -3.45545294e-01, 1.30114716e-01, 5.06895559e-01,
     2.12176840e-01, -4.14296750e-03, 4.52439064e-02, -1.62163990e-02, 6.93683152e-03, -5.77607592e-03],
    [3.00019324e-01, 5.43432074e-02, -7.72732930e-01, 1.47263806e+00, -2.79012581e-02, -2.47864869e-01,
     -2.10011388e-01, 2.78202425e-01, 6.16957205e-02, -1.66924986e-01, -1.80102286e-01, -3.78872162e-03]]

TIMBRE_MEANS = [np.mean(t) for t in TIMBRE_CLUSTERS]
TIMBRE_STDEVS = [np.std(t) for t in TIMBRE_CLUSTERS]

''' helper methods to process raw msd data '''

def normalize_pitches(h5):
    key = int(hdf5_getters.get_key(h5))
    segments_pitches = hdf5_getters.get_segments_pitches(h5)
    segments_pitches_new = [transpose_by_key(pitch_seg, key) for pitch_seg in segments_pitches]
    return segments_pitches_new

def transpose_by_key(pitch_seg, key):
    pitch_seg_new = []
    for i in range(0, 12):
        idx = (i + key) % 12
        pitch_seg_new.append(pitch_seg[idx])
    return pitch_seg_new

''' given a time segment with distributions of the 12 pitches, find the most likely chord played '''
def find_most_likely_chord(pitch_vector):
    rho_max = 0.0
    # index each chord as a (chord type, transposition) pair
    most_likely_chord = (1, 1)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_MAJOR, CHORD_TEMPLATE_MAJOR_means, CHORD_TEMPLATE_MAJOR_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean)*(pitch_vector[i] - np.mean(pitch_vector))/((stdev+0.01)*(np.std(pitch_vector)+0.01))
        if (abs(rho) > abs(rho_max)):
            rho_max = rho
            most_likely_chord = (1, idx)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_MINOR, CHORD_TEMPLATE_MINOR_means, CHORD_TEMPLATE_MINOR_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean)*(pitch_vector[i] - np.mean(pitch_vector))/((stdev+0.01)*(np.std(pitch_vector)+0.01))
        if (abs(rho) > abs(rho_max)):
            rho_max = rho
            most_likely_chord = (2, idx)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_DOM7, CHORD_TEMPLATE_DOM7_means, CHORD_TEMPLATE_DOM7_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean)*(pitch_vector[i] - np.mean(pitch_vector))/((stdev+0.01)*(np.std(pitch_vector)+0.01))
        if (abs(rho) > abs(rho_max)):
            rho_max = rho
            most_likely_chord = (3, idx)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_MIN7, CHORD_TEMPLATE_MIN7_means, CHORD_TEMPLATE_MIN7_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean)*(pitch_vector[i] - np.mean(pitch_vector))/((stdev+0.01)*(np.std(pitch_vector)+0.01))
        if (abs(rho) > abs(rho_max)):
            rho_max = rho
            most_likely_chord = (4, idx)
    return most_likely_chord

def find_most_likely_timbre_category(timbre_vector):
    most_likely_timbre_cat = 0
    rho_max = 0.0
    for idx, (seg, mean, stdev) in enumerate(zip(TIMBRE_CLUSTERS, TIMBRE_MEANS, TIMBRE_STDEVS)):
        rho = 0.0
        for i in range(0, 12):
            # center the observed vector by its own mean, mirroring
            # find_most_likely_chord above
            rho += (seg[i] - mean)*(timbre_vector[i] - np.mean(timbre_vector))/((stdev+0.01)*(np.std(timbre_vector)+0.01))
        if (abs(rho) > abs(rho_max)):
            rho_max = rho
            most_likely_timbre_cat = idx
    return most_likely_timbre_cat


instruments in each cluster. The type 0 chord change corresponds to major → major with no note change, type 60 to minor → minor with no note change, type 120 to dominant 7th major → dominant 7th major with no note change, and type 180 to dominant 7th minor → dominant 7th minor with no note change. It makes sense that types 0, 60, 120, and 180 are frequently observed, because it implies that chords occurring next to each other remain in the same key for the majority of the song.
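These indices fall directly out of the encoding used in Appendix A.2, where chord types are numbered 1 (major) through 4 (minor 7th) and a chord is a (type, root) pair; a minimal sketch of the arithmetic:

    # Chord-shift index as computed in Appendix A.2; types are 1 = major,
    # 2 = minor, 3 = dominant 7th, 4 = minor 7th, and roots are 0..11.
    def chord_shift(c1, c2):
        note_shift = (c2[1] - c1[1]) % 12       # semitone offset between roots
        key_shift = 4*(c1[0] - 1) + c2[0]       # 1..16 type-to-type categories
        return 12*(key_shift - 1) + note_shift  # 0..191

    assert chord_shift((1, 0), (1, 0)) == 0     # major -> major, same root
    assert chord_shift((2, 0), (2, 0)) == 60    # minor -> minor, same root
    assert chord_shift((3, 0), (3, 0)) == 120   # dom. 7th -> dom. 7th, same root
    assert chord_shift((4, 0), (4, 0)) == 180   # min. 7th -> min. 7th, same root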

The timbre categories, on the other hand, are more difficult to interpret intuitively. Mauch's study addresses this issue by sampling the songs and sounds closest to each timbre category, then playing the sounds and attaching user-based interpretations gathered from several listeners [8]. While this strategy worked in Mauch's study, it was not practical in mine given the time and resources at my disposal. I ended up writing subjective summaries of each cluster and comparing the charts to see whether certain peaks in the timbre categories corresponded to specific tones. Strangely, for α = 0.05 the timbre and chord change data are very similar for each cluster. This problem does not occur when α = 0.1 or 0.2, where the graphs vary significantly and correspond to some of the observed differences in the music. In summary, below are the most influential artists I found in the clusters formed and the types of music they created that were novel for their time:

• Jean-Michel Jarre: ambient and house music; complicated synthesizer arrangements

• Cabaret Voltaire: orchestral electronic music

• Paul Horn: new age

• Brian Eno: ambient music

• Manuel Göttsching (Ashra): synth-heavy ambient music

• Killing Joke: industrial metal

• John Foxx: minimalist and dark electronic music

• Fad Gadget: house and industrial music

While these conclusions were formed mainly from the data used in the MSD and the resulting clusters, I checked outside sources and biographies of these artists to see whether they were groundbreaking contributors to electronic music. Some research revealed that these artists were indeed groundbreaking for their time, so my findings are consistent with existing literature. The difference, however, between existing accounts and mine is that, from a quantitatively computed perspective, I found some new connections (like observing that one of Jarre's works, when sped up, sounded very similar to trance music).

For larger values of α, it is worth not only looking at interesting phenomena in the clusters formed for that specific value, but also comparing those clusters to the ones formed at other values of α. Since we are increasing the value of α, more clusters will be formed, and the distinctions between each cluster will be more nuanced.
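This is the standard Chinese restaurant process behavior of the Dirichlet Process: the expected number of occupied clusters after n observations grows linearly in the concentration parameter α but only logarithmically in n,

\[ \mathbb{E}[K_n] = \sum_{i=1}^{n} \frac{\alpha}{\alpha + i - 1} \approx \alpha \log\!\left(1 + \frac{n}{\alpha}\right), \]

so stepping α from 0.05 to 0.1 to 0.2 roughly doubles the expected cluster count at each step.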

With α = 0.1, the Dirichlet Process formed 16 clusters. Two of these clusters consisted of only one song each, and upon listening, neither of these songs sounded particularly unique, so I threw those two clusters out and analyzed the remaining 14.
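Mechanically, discarding such degenerate clusters is a small operation on the label vector; a toy sketch (the label values here are illustrative):

    import numpy as np

    labels = np.array([0, 0, 1, 2, 2, 2, 3])  # hypothetical cluster label per song
    sizes = np.bincount(labels)               # songs per cluster
    singletons = np.where(sizes == 1)[0]      # here: clusters 1 and 3
    keep = ~np.isin(labels, singletons)       # mask that drops singleton clusters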

Comparing these clusters to the ones formed with α = 0.05, I found that some of the clusters mapped over nicely, while others were more difficult to interpret. For example, cluster 3₀.₁ (cluster 3 when α = 0.1) contained a similar number of songs and a similar distribution of release years to cluster 9₀.₀₅. Both contain virtually no songs before the 1990s and then steadily rise in popularity through the 2000s. Both clusters also contain similar types of music: house beats and ethereal synths reminiscent of ambient or trance music. However, when I looked at the earliest

artists in cluster 3₀.₁, they were different from the earliest artists in cluster 9₀.₀₅. One particular artist, Bill Nelson, stood out for having a particularly novel song, "Birds of Tin," for the year it was released (1980). This song features a sharp and twangy synth beat that, when sped up, sounded like minimalist acid house music.

While the α = 0.05 group differentiated mostly on general moods and classes of instruments (like rock vs. non-electronic vs. electronic), the α = 0.1 group picked up more nuanced instrumentation and mood differences. For example, cluster 16₀.₁ contained songs that featured orchestral string instruments, especially violin. The songs themselves varied significantly according to traditional genres, from Brian Eno arrangements with classical orchestra to a remix of a song by Linkin Park, a nu-metal band, which contained violin interludes. This clustering raises an interesting point: music that sounds very different in terms of traditional genres could be grouped together on certain instruments or sounds. Another cluster, 28₀.₁, features 90s sounds characteristic of the Roland TR-909 drum machine (which would explain why the cluster's songs increase drastically starting in the 1990s and steadily decline through the 2000s). Yet another cluster, 6₀.₁, has a particularly heavy left tail, indicating a style more popular in the 1980s, and its characteristic sound, hi-hat cymbals, is also a specialized instrument. This specialization does not match up particularly strongly with the clusters when α = 0.05. That is, a single cluster with α = 0.05 does not easily map to one or more clusters in the α = 0.1 run, although many of the clusters appear to share characteristics based on the qualitative descriptions in the tables. The timbre/chord change charts for each cluster appeared to at least somewhat corroborate the general characteristics I attached to each cluster. For example, the last timbre category is significantly pronounced for clusters 5 and 18, and especially so for 18. Cluster 18 was vocal-free, ethereal space-synth sounds, so it would make sense that cluster 5, which was mainly calm new age, also contained vocal-free, ethereal, and space-y sounds. It was also interesting to note that certain clusters, like 28, contained one timbre category that completely dominated all the others. Many songs in this cluster were marked by strong and repetitive beats reminiscent of the Roland TR synth drum machine, which matches the graph. Likewise, clusters 3, 7, 9, and 20, which appear to contain the same peak timbre category, were noted for containing strong and repetitive beats. For this run, I added the following artists and their contributions to the general list of novel artists:

• Bill Nelson: minimalist house music

• Vangelis: orchestral compositions with electronic notes

• Rick Wakeman: rock compositions with spacy-sounding synths

• Kraftwerk: synth-based pop music

Finally, we look at α = 0.2. With this parameter value, the Dirichlet Process resulted in 22 clusters formed. Three of these clusters contained only one song each, and upon listening to each of these songs, I determined they were not particularly unique and discarded them, for a total of 19 remaining clusters. Unlike the previous two values of α, where the clusters were relatively easy to subjectively differentiate, this one was quite difficult. Slightly more than half of the clusters, 10 out of 19, contained under 1,000 songs, and 3 contained under 100. Some of the clusters were easily mapped to clusters at the other two α values, like cluster 17₀.₂, which contains Roland TR drum machine sounds and is comparable to cluster 28₀.₁. However, many of the other classifications seemed more dubious. Not only did the songs within each cluster often vary significantly, but the differences between many clusters appeared nearly indistinguishable. The chord change and timbre charts also reflect the difficulty in distinguishing different clusters. The y-axes for all of the years are quite small, implying that many of the timbre values averaged out because the songs were quite different in each cluster. Essentially, the observations are quite noisy and do not have features that stand out as saliently as those of cluster 28₀.₁, for example. The only exceptions to these numbers were clusters 30 and 34, but there were so few songs in each of these clusters that they represent only a small amount of the dataset. Therefore, I concluded that the Dirichlet Process with α = 0.2 did an insufficient job of adequately clustering the songs. Overall, the clusters formed when α = 0.1 were the most meaningful in terms of picking up nuanced moods and instruments without splitting hairs and producing clusters a minimally trained ear could not differentiate. From this analysis, the most appropriate genre classifications of the electronic music in the MSD are the clusters described in the table where α = 0.1, and the most novel artists, along with their contributions, are summarized in the findings where α = 0.05 and α = 0.1.
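For reference, the clustering step itself can be reproduced with scikit-learn [10]. The sketch below uses the current BayesianGaussianMixture API rather than the older DPGMM interface available when this code was written, and assumes X is the per-song matrix of normalized chord-change and timbre-category features built in Appendix A.2 (the file name is illustrative):

    import numpy as np
    from sklearn.mixture import BayesianGaussianMixture

    X = np.loadtxt('song_features.txt')  # hypothetical per-song feature matrix

    # Truncated Dirichlet Process mixture: n_components is only an upper bound,
    # and weight_concentration_prior plays the role of alpha above.
    dpgmm = BayesianGaussianMixture(n_components=50,
                                    weight_concentration_prior_type='dirichlet_process',
                                    weight_concentration_prior=0.1,
                                    covariance_type='diag',
                                    max_iter=500,
                                    random_state=0).fit(X)
    labels = dpgmm.predict(X)        # cluster assignment per song
    n_used = len(np.unique(labels))  # clusters the process actually occupied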


Chapter 4

Conclusion

In this chapter, I first address weaknesses in my experiment and strategies to address those weaknesses; I then offer potential paths for researchers to build upon my experiment, and close with final words regarding this thesis.

4.1 Design Flaws in Experiment

While I made every effort possible to ensure the integrity of this experiment, there were various complicating factors, some beyond my control and others within my control but unrealistic to address given the time and resources I had. The largest issue was the dataset I was working with. While the MSD contained roughly 23,000 electronic music songs according to my classifications, these songs did not come close to covering all of the electronic music available. From looking through the tracks, I did see many important artists, meaning that there was some credibility to the dataset. However, there were several other artists I was surprised to see missing, and the artists included were represented by only a limited number of popular songs. Some traditionally defined genres, like dubstep, were missing entirely from the dataset, and the most recent songs came from the year 2010, which meant that the past 5 years of rapid expansion in EM were not accounted for. Building a sufficient corpus of EM data is very difficult, arguably more so than for other genres, because songs may be remixed by multiple artists, further blurring the line between original content and modifications. For this reason, I considered my thesis to be a proof of concept. Although the data I used may not be ideal, I was able to show that the Dirichlet Process could be used with some amount of success to cluster songs based on their metadata.

With respect to how I implemented the Dirichlet Process and constructed the features, my methodology could have been more extensive given additional time and resources. Interpreting the sounds in each song and establishing common threads is a difficult task, and unlike Pandora, which used trained music theory experts to analyze each song, I relied on my own ears and anecdotal knowledge of EM. Given the lack of formal literature quantitatively analyzing EM and the resources I had, this was my best realistic option, but it was also not ideal. The second notable weakness, which was more controllable, was determining what exactly constitutes an EM song. My criteria involved iterating through every song and selecting those whose artist carried a tag that fell inside a list of predetermined EM genres. However, this strategy is not always effective, since some artists have produced only a small selection of EM songs alongside much more rock or other non-EM music. To prevent these songs from appearing in the dataset, I would need to load another dataset from a group called Last.fm, which contains user-generated tags at the song level.
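A sketch of what that song-level filter could look like, assuming the Last.fm companion dataset to the MSD has been downloaded and each track's tags sit in a per-track JSON file (the file layout and field names here are illustrative, not the exact schema):

    import json

    # EM genre tags, as in the artist-level filter of Appendix A.1
    TARGET_GENRES = set(['house', 'techno', 'trance', 'dubstep', 'ambient',
                         'breakbeat', 'idm', 'synthpop', 'industrial'])

    def is_em_song(tag_json_path):
        with open(tag_json_path) as f:
            track = json.load(f)
        # assumed layout: {"tags": [["techno", 100], ["dance", 64], ...]}
        tags = set(tag.lower() for tag, count in track.get('tags', []))
        return bool(tags & TARGET_GENRES)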

Another, more addressable weakness in my experiment was graphically analyzing the timbre categories. While the average chord changes were easy to interpret on the graphs for each cluster and had clear semantic interpretations, the timbre categories were never formally defined. That is, while I knew the Bayes Information Criterion was lowest when there were 46 categories, I did not associate each timbre category with a sound.
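The 46-category figure comes from a BIC sweep over candidate Gaussian mixture sizes; a minimal sketch of that selection step with scikit-learn [10], reading the frames exported by Appendix A.3 (the candidate range shown is illustrative):

    import ast
    import numpy as np
    from sklearn.mixture import GaussianMixture

    # frames were written by Appendix A.3 as a Python list literal
    frames = np.asarray(ast.literal_eval(open('timbre_frames_all.txt').read()))

    best_k, best_bic = None, float('inf')
    for k in range(10, 61, 2):          # candidate numbers of timbre categories
        gmm = GaussianMixture(n_components=k, random_state=0).fit(frames)
        bic = gmm.bic(frames)           # lower BIC = better fit/complexity balance
        if bic < best_bic:
            best_k, best_bic = k, bic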

Mauch's study addressed this issue by randomly selecting songs with sounds that fell in each timbre category and asking users to listen to the sounds and classify what they heard. Implementing this system would be an additional way of ensuring that the clusters formed for each song were nontrivial: I could not only eyeball the measurements on each graph for timbre, like I did in this thesis, but also use them to confirm the sounds I observed for each cluster.
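Selecting the playback material for such a listening test is straightforward once the categories are fit; a sketch, assuming the mapping from sampled frames back to their source tracks was retained (it is not retained by the Appendix A.3 code as written):

    import ast
    import numpy as np
    import msd_utils  # helper module from Appendix A.4

    frames = np.asarray(ast.literal_eval(open('timbre_frames_all.txt').read()))
    centers = np.asarray(msd_utils.TIMBRE_CLUSTERS)

    # for each timbre category, the five sampled frames nearest its center,
    # i.e. the sounds one would play to listeners to put a name on the category
    for k, center in enumerate(centers):
        dists = np.linalg.norm(frames - center, axis=1)
        nearest = np.argsort(dists)[:5]  # indices of representative frames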

Finally, while my feature selection involved careful preprocessing, based on other studies, that normalized measurements across all songs, there are additional ways I could have improved the feature set. For example, one study looks at more advanced ways to isolate specific timbre segments in a song, identify repeating patterns, and compare songs to each other in terms of the similarity of their timbres [15]. More advanced methods like these would allow me to analyze more quantitatively how successfully the Dirichlet Process clusters songs into distinct categories.
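To give a sense of what such methods involve, the timbre models of [15] fit a small Gaussian mixture to each song's frames and score songs against each other's models; a rough sketch under those assumptions (the mixture size and sampling count are illustrative):

    import numpy as np
    from sklearn.mixture import GaussianMixture

    def timbre_model(frames, k=8):
        # per-song mixture over its 12-dimensional timbre frames
        return GaussianMixture(n_components=k, covariance_type='diag',
                               random_state=0).fit(frames)

    def timbre_similarity(model_a, model_b, n=500):
        # symmetrized cross-likelihood: how well each model explains samples
        # drawn from the other (one common stand-in for KL divergence)
        xa, _ = model_a.sample(n)
        xb, _ = model_b.sample(n)
        return model_a.score(xb) + model_b.score(xa)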

4.2 Future Work

Future work in this area, quantitatively analyzing EM metadata to determine what constitutes different genres and novel artists, would involve tighter definitions and procedures, evaluations of whether clustering was effective, and closer musical scrutiny. All of the weaknesses mentioned in the previous section, barring perhaps the songs available in the Million Song Dataset, can be addressed with extensions and modifications to the code base I created. The greater issue of building an effective corpus of music data for the MSD, and constantly updating it, might be addressed by soliciting such data from an organization like Spotify, but such an endeavor is very ambitious and beyond the scope of any individual or small-group research project without extensive funding and influence. Once these problems are resolved, and the dataset, access to its songs, and methods for comparing songs to one another are all in place, the next steps would be to further analyze the results. How do the most unique artists for their time compare to the most popular artists? Is there considerable overlap? How long does it take for a style to grow in popularity, if it even does? And lastly, how can these findings be used to compose new genres of music and envision who and what will become popular in the future? All of these questions may require supplementary information sources, with respect to the popularity of songs and artists for example, and many of these additional pieces of information can be found on the website of the MSD.

4.3 Closing Remarks

While this thesis is an ambitious endeavor and can be improved in many respects, it does show that the methods implemented yield nontrivial results and could serve as a foundation for future quantitative analysis of electronic music. As data analytics grows even further, and groups such as Spotify amass greater amounts of information and deeper insights into that information, this relatively new field of study will hopefully grow. EM is a dynamic, energizing, and incredibly expressive type of music, and understanding it from a quantitative perspective pays respect to what has up until now been analyzed mostly from a curious outsider's perspective: qualitatively described, but not examined as thoroughly from a mathematical angle.


Bibliography

[1] Marck Bailey. Mark J. Butler publishes scholarly work on dance music. http://www.music.northwestern.edu/about/news/2012/mark-j-butler-publishes-scholarly-work-on-dance-music.html, March 2012.

[2] Kenneth Taylor. Ishkur's guide to EDM. http://techno.org/electronic-music-guide.

[3] Deal further strengthens Spotify's music discovery expertise. http://the.echonest.com/pressreleases/spotify-acquires-echo-nest, March 2014.

[4] Josh Constine. Inside the Spotify–Echo Nest skunkworks. http://techcrunch.com/2014/10/19/the-sonic-mad-scientists, October 2014.

[5] The future of music genres is here. http://blog.echonest.com/post/73516217273/the-future-of-music-genres-is-here, January 2014.

[6] About the Music Genome Project. http://www.pandora.com/about/mgp.

[7] Joan Serrà, Álvaro Corral, Marián Boguñá, Martín Haro, and Josep Ll. Arcos. Measuring the evolution of contemporary western popular music. Sci. Rep., 2, July 2012.

[8] Matthias Mauch, Robert M. MacCallum, Mark Levy, and Armand M. Leroi. The evolution of popular music: USA 1960–2010. Royal Society Open Science, 2(5), 2015.

[9] Thierry Bertin-Mahieux, Daniel P.W. Ellis, Brian Whitman, and Paul Lamere. The Million Song Dataset. In Proceedings of the 12th International Conference on Music Information Retrieval (ISMIR 2011), 2011.

[10] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay. Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12:2825–2830, 2011.

[11] Lars Buitinck, Gilles Louppe, Mathieu Blondel, Fabian Pedregosa, Andreas Mueller, Olivier Grisel, Vlad Niculae, Peter Prettenhofer, Alexandre Gramfort, Jaques Grobler, Robert Layton, Jake VanderPlas, Arnaud Joly, Brian Holt, and Gaël Varoquaux. API design for machine learning software: experiences from the scikit-learn project. In ECML PKDD Workshop: Languages for Data Mining and Machine Learning, pages 108–122, 2013.

[12] Edwin Chen. Infinite mixture models with nonparametric Bayes and the Dirichlet Process. http://blog.echen.me/2012/03/20/infinite-mixture-models-with-nonparametric-bayes-and-the-dirichlet-process, March 2012.

[13] Graham Massey. Roland TR-808: the drum machine that changed music. http://www.bbc.com/news/entertainment-arts-26682781, March 2014.

[14] Jean-Michel Jarre and Agencja Artystyczna MTJ. Jean Michel Jarre. Disques Dreyfus, 1999.

[15] François Pachet, Jean-Julien Aucouturier, and Mark Sandler. The way it sounds: timbre models for analysis and retrieval of music signals. IEEE Transactions on Multimedia, 7(6):1028–1035, December 2005.

  • Abstract
  • Acknowledgements
  • Contents
  • List of Tables
  • List of Figures
  • 1 Introduction
    • 11 Background Information
    • 12 Literature Review
    • 13 The Dataset
      • 2 Mathematical Modeling
        • 21 Determining Novelty of Songs
        • 22 Feature Selection
        • 23 Collecting Data and Preprocessing Selected Features
          • 231 Collecting the Data
          • 232 Pitch Preprocessing
          • 233 Timbre Preprocessing
              • 3 Results
                • 31 Methodology
                • 32 Findings
                  • 321 =005
                  • 322 =01
                  • 323 =02
                    • 33 Analysis
                      • 4 Conclusion
                        • 41 Design Flaws in Experiment
                        • 42 Future Work
                        • 43 Closing Remarks
                          • A Code
                            • A1 Pulling Data from the Million Song Dataset
                            • A2 Calculating Most Likely Chords and Timbre Categories
                            • A3 Code to Compute Timbre Categories
                            • A4 Helper Methods for Calculations
                              • Bibliography
Page 58: Silver,Matthew final thesis

• Killing Joke: industrial metal
• John Foxx: minimalist and dark electronic music
• Fad Gadget: house and industrial music

While these conclusions were formed mainly from the data used in the MSD and the resulting clusters, I checked outside sources and biographies of these artists to see whether they were groundbreaking contributors to electronic music. Some research revealed that these artists were indeed groundbreaking for their time, so my findings are consistent with existing literature. The difference, however, between existing accounts and mine is that, from a quantitatively computed perspective, I found some new connections (like observing that one of Jarre's works, when sped up, sounded very similar to trance music).

For larger values of α, it is not only worth looking at interesting phenomena in the clusters formed for that specific value, but also comparing the clusters formed to other values of α. Since we are increasing the value of α, more clusters will be formed and the distinctions between each cluster will be more nuanced. With α = 0.1, the Dirichlet Process formed 16 clusters. Two of these clusters consisted of only one song each, and upon listening, neither of these songs sounded particularly unique, so I threw those two clusters out and analyzed the remaining 14. Comparing these clusters to the ones formed with α = 0.05, I found that some of the clusters mapped over nicely, while others were more difficult to interpret. For example, cluster 3_{0.1} (cluster 3 when α = 0.1) contained a similar number of songs, and a similar distribution of the years the songs were released, to cluster 9_{0.05}. Both contain virtually no songs before the 1990s and then steadily rise in popularity through the 2000s. Both clusters also contain similar types of music: house beats and ethereal synths reminiscent of ambient or trance music. However, when I looked at the earliest artists in cluster 3_{0.1}, they were different from the earliest artists in cluster 9_{0.05}. One particular artist, Bill Nelson, stood out for having a particularly novel song, "Birds of Tin", for the year it was released (1980). This song features a sharp and twangy synth beat that, when sped up, sounded like minimalist acid house music.

While the α = 0.05 group differentiated mostly on general moods and classes of instruments (like rock vs. non-electronic vs. electronic), the α = 0.1 group picked up more nuanced instrumentation and mood differences. For example, cluster 16_{0.1} contained songs that featured orchestral string instruments, especially violin. The songs themselves varied significantly according to traditional genres, from Brian Eno arrangements with classical orchestra to a remix of a song by Linkin Park, a nu-metal band, which contained violin interludes. This clustering raises an interesting point: music that sounds very different based on traditional genres could be grouped together on certain instruments or sounds. Another cluster, 28_{0.1}, features 90s sounds characteristic of the Roland TR-909 drum machine (which would explain why the cluster's songs increase drastically starting in the 1990s and steadily decline through the 2000s). Yet another cluster, 6_{0.1}, contains a particularly heavy left tail, indicating a style more popular in the 1980s, and its characteristic sound, hi-hat cymbals, is also a specialized instrument. This specialization does not match up particularly strongly with the clusters when α = 0.05. That is, a single cluster with α = 0.05 does not easily map to one or more clusters in the α = 0.1 run, although many of the clusters appear to share characteristics based on the qualitative descriptions in the tables.

The timbre/chord change charts for each cluster appeared to at least somewhat corroborate the general characteristics I attached to each cluster. For example, the last timbre category is significantly pronounced for clusters 5 and 18, and especially so for 18. Cluster 18 was vocal-free, ethereal, space-synth sounds, so it would make sense that cluster 5, which was mainly calm New World, also contained vocal-free, ethereal, and space-y sounds. It was also interesting to note that certain clusters, like 28, contained one timbre category that completely dominated all the others. Many songs in this cluster were marked by strong and repetitive beats reminiscent of the Roland TR synth drum machine, which matches the graph. Likewise, clusters 3, 7, 9, and 20, which appear to contain the same peak timbre category, were noted for containing strong and repetitive beats. For this clustering, I added the following artists and their contributions to the general list of novel artists:

• Bill Nelson: minimalist house music
• Vangelis: orchestral compositions with electronic notes
• Rick Wakeman: rock compositions with spacy-sounding synths
• Kraftwerk: synth-based pop music

Finally, we look at α = 0.2. With this parameter value, the Dirichlet Process resulted in 22 clusters; 3 of these clusters contained only one song each, and upon listening to each of these songs, I determined they were not particularly unique and discarded them, for a total of 19 remaining clusters. Unlike the previous two values of α, where the clusters were relatively easy to subjectively differentiate, this one was quite difficult. Slightly more than half of the clusters, 10 out of 19, contained under 1,000 songs, and 3 contained under 100. Some of the clusters were easily mapped to clusters in the other two α values, like cluster 17_{0.2}, which contains Roland TR drum machine sounds and is comparable to cluster 28_{0.1}. However, many of the other classifications seemed more dubious. Not only did the songs within each cluster often vary significantly, but the differences between many clusters appeared nearly indistinguishable. The chord change and timbre charts also support the difficulty of distinguishing different clusters. The y-axes for all of the years are quite small, implying that many of the timbre values averaged out because the songs were quite different in each cluster. Essentially, the observations are quite noisy and do not have features that stand out as saliently as those of cluster 28_{0.1}, for example. The only exceptions to these numbers were clusters 30 and 34, but there were so few songs in each of these clusters that they represent only a small amount of the dataset. Therefore, I concluded that the Dirichlet Process with α = 0.2 did an insufficient job of adequately clustering the songs. Overall, the clusters formed when α = 0.1 were the most meaningful in terms of picking up nuanced moods and instruments without splitting hairs and resulting in clusters a minimally trained ear could not differentiate. From this analysis, the most appropriate genre classifications of the electronic music from the MSD are the clusters described in the table where α = 0.1, and the most novel artists, along with their contributions, are summarized in the findings where α = 0.05 and α = 0.1.
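For reference, each of the clusterings above comes from fitting a Dirichlet Process Gaussian Mixture Model at a particular value of α. The fitting script itself is not reproduced in Appendix A, so the following is only a minimal sketch using scikit-learn's current variational Dirichlet Process implementation, in which weight_concentration_prior plays the role of α; the feature matrix here is a synthetic stand-in for the real per-song chord-change and timbre features.

import numpy as np
from sklearn.datasets import make_blobs
from sklearn.mixture import BayesianGaussianMixture

# synthetic stand-in for the per-song feature vectors
X, _ = make_blobs(n_samples=1000, centers=20, n_features=10, random_state=0)

for alpha in (0.05, 0.1, 0.2):
    # truncated DP mixture: n_components is only an upper bound on clusters
    dpgmm = BayesianGaussianMixture(n_components=50,
                                    weight_concentration_prior_type='dirichlet_process',
                                    weight_concentration_prior=alpha,
                                    covariance_type='diag',
                                    random_state=0).fit(X)
    print('alpha = %.2f -> %d occupied clusters'
          % (alpha, np.unique(dpgmm.predict(X)).size))

Larger values of the concentration parameter place more prior weight on opening additional mixture components, which is consistent with the growth from 16 clusters at α = 0.1 to 22 at α = 0.2 observed above.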


Chapter 4

Conclusion

In this chapter, I first address weaknesses in my experiment and strategies to address those weaknesses; then I offer potential paths for researchers to build upon my experiment, and offer closing words regarding this thesis.

4.1 Design Flaws in Experiment

While I made every effort possible to ensure the integrity of this experiment, there were various factors, some beyond my control and others within my control but unrealistic to address given the time and resources I had. The largest issue was the dataset I was working with. While the MSD contained roughly 23,000 electronic music songs according to my classifications, these songs did not come close to all of the electronic music that was available. From looking through the tracks, I did see many important artists, meaning that there was some credibility to the dataset. However, there were several other artists I was surprised to see missing, and the artists included contained only a limited number of popular songs. Some traditionally defined genres, like dubstep, were missing entirely from the dataset, and the most recent songs came from the year 2010, which meant that the past 5 years of rapid expansion in EM were not accounted for. Building a sufficient corpus of EM data is very difficult, arguably more so than for other genres, because songs may be remixed by multiple artists, further blurring the line between original content and modifications. For this reason, I considered my thesis to be a proof of concept: although the data I used may not be ideal, I was able to show that the Dirichlet Process could be used with some amount of success to cluster songs based on their metadata.

With respect to how I implemented the Dirichlet Process and constructed the features, my methodology could have been more extensive with additional time and resources. Interpreting the sounds in each song and establishing common threads is a difficult task, and unlike Pandora, which used trained music theory experts to analyze each song, I relied on my own ears and anecdotal knowledge of EM. Given the lack of formal literature quantitatively analyzing EM and the resources I had, this was my best realistic option, but it was also not ideal. The second notable weakness, which was more controllable, was determining what exactly constitutes an EM song. My criteria involved iterating through every song and selecting those whose artist contained a tag that fell inside a list of predetermined EM genres. However, this strategy is not always effective, since some artists contain only a small selection of EM songs and have produced much more music involving rock or other non-EM genres. To prevent these songs from appearing in the dataset, I would need to load another dataset, from a group called Last.fm, which contains user-generated tags at the song level.
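A song-level filter over such tags might look like the sketch below. The genre list reuses Appendix A.1, but the per-track tag dictionary and track IDs are purely illustrative, since the Last.fm data was not actually integrated into this experiment.

# hypothetical song-level filter; track_tags_by_id stands in for tags
# loaded from the Last.fm dataset and is not real data
EM_GENRES = set(['house', 'techno', 'jungle', 'breakbeat', 'trance', 'dubstep',
                 'downtempo', 'industrial', 'synthpop', 'idm', 'ambient',
                 'electronic'])

def is_em_song(track_tags):
    # keep a track only if one of its OWN tags is an EM genre, rather
    # than relying on the artist-level tags used in Appendix A.1
    return any(tag.lower() in EM_GENRES for tag in track_tags)

track_tags_by_id = {'TR0000001': ['electronic', 'idm'],
                    'TR0000002': ['classic rock']}
em_tracks = [tid for tid, tags in track_tags_by_id.items() if is_em_song(tags)]
print(em_tracks)  # only the track actually tagged as EM survives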

Another, more addressable, weakness in my experiment was graphically analyzing the timbre categories. While the average chord changes were easy to interpret on the graphs for each cluster and had easy semantic interpretations, the timbre categories were never formally defined. That is, while I knew the Bayes Information Criterion was lowest when there were 46 categories, I did not associate each timbre category with a sound. Mauch's study addressed this issue by randomly selecting songs with sounds that fell in each timbre category and asking users to listen to the sounds and classify what they heard. Implementing this system would be an additional way of ensuring that the clusters formed for each song were nontrivial: I could not only eyeball the measurements on each graph for timbre, like I did in this thesis, but also use them to confirm the sounds I observed for each cluster. Finally, while my feature selection contained careful preprocessing, based on other studies, that normalized measurements between all songs, there are additional ways I could have improved the feature set. For example, one study looks at more advanced ways to isolate specific timbre segments in a song, identify repeating patterns, and compare songs to each other in terms of the similarity of their timbres [15]. More advanced methods like these would allow me to more quantitatively analyze how successful the Dirichlet Process is at effectively clustering songs into distinct categories.
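One simple way to make that success quantitative, which this thesis did not attempt, would be an internal validity measure such as the silhouette score of the final cluster assignments. A minimal sketch with scikit-learn, again on synthetic stand-in features, follows.

from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score
from sklearn.mixture import BayesianGaussianMixture

# synthetic stand-in for the per-song feature vectors
X, _ = make_blobs(n_samples=600, centers=5, n_features=10, random_state=1)

dpgmm = BayesianGaussianMixture(n_components=30,
                                weight_concentration_prior_type='dirichlet_process',
                                weight_concentration_prior=0.1,
                                covariance_type='diag',
                                random_state=1).fit(X)
labels = dpgmm.predict(X)
# scores near 1 indicate tight, well-separated clusters; near 0, heavy overlap
print(silhouette_score(X, labels))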

4.2 Future Work

Future work in this area, quantitatively analyzing EM metadata to determine what constitutes different genres and novel artists, would involve tighter definitions, procedures, evaluations of whether clustering was effective, and musical scrutiny. All of the weaknesses mentioned in the previous section, barring perhaps the songs available in the Million Song Dataset, can be addressed with extensions and modifications to the code base I created. The greater issue, building an effective corpus of music data for the MSD and constantly updating it, might be addressed by soliciting such data from an organization like Spotify, but such an endeavor is very ambitious and beyond the scope of any individual or small-group research project without extensive funding and influence. Once these problems are resolved, and the dataset, the songs accessed from the dataset, and the methods for comparing songs to each other are settled, the next steps would be to further analyze the results. How do the most unique artists for their time compare to the most popular artists? Is there considerable overlap? How long does it take for a style to grow in popularity, if it even does? And lastly, how can these findings be used to compose new genres of music and envision who and what will become popular in the future? All of these questions may require supplementary information sources, with respect to the popularity of songs and artists for example, and many of these additional pieces of information can be found on the website of the MSD.

4.3 Closing Remarks

While this thesis is an ambitious endeavor and can be improved in many respects, it does show that the methods implemented yield nontrivial results and could serve as a foundation for future quantitative analysis of electronic music. As data analytics grows even more, and groups such as Spotify amass greater amounts of information and deeper insights on that information, this relatively new field of study will hopefully grow. EM is a dynamic, energizing, and incredibly expressive type of music, and understanding it from a quantitative perspective pays respect to what has, up until now, been mostly analyzed from a curious outsider's perspective: qualitatively described but not examined as thoroughly from a mathematical angle.


Appendix A

Code

A.1 Pulling Data from the Million Song Dataset

from __future__ import division
import os
import re
import sys
import time
import glob
import numpy as np
import hdf5_getters  # not on adroit

# prevents output from showing ellipses when printed
np.set_printoptions(threshold=np.nan)

'''This code pulls the metadata of every electronic music song from the
Million Song Dataset and writes it to disk, sorted by year.'''

basedir = '/scratch/network/mssilver/mssilver/msd_data_full/data/' + str(sys.argv[1])
ext = '.h5'

target_genres = ['house', 'techno', 'drum and bass', 'drum n bass', "drum'n'bass",
                 'drumnbass', "drum 'n' bass", 'jungle', 'breakbeat', 'trance',
                 'dubstep', 'trap', 'downtempo', 'industrial', 'synthpop', 'idm',
                 'idm - intelligent dance music', '8-bit', 'ambient',
                 'dance and electronica', 'electronic']

# relevant metadata for all EM songs found in the MSD
all_song_data = {}
pitch_segs_data = []
count = 0
start_time = time.time()

for root, dirs, files in os.walk(basedir):
    files = glob.glob(os.path.join(root, '*' + ext))
    for f in files:
        h5 = hdf5_getters.open_h5_file_read(f)
        # if year unknown, throw out sample
        if hdf5_getters.get_year(h5) == 0:
            h5.close()
            continue
        if any(tag in str(hdf5_getters.get_artist_mbtags(h5)) for tag in target_genres):
            print 'found electronic music song at {0} seconds'.format(time.time() - start_time)
            count += 1
            print('song count: {0}'.format(count + 1))
            h5_subdict = dict()
            h5_subdict['title'] = hdf5_getters.get_title(h5).item()
            h5_subdict['artist_name'] = hdf5_getters.get_artist_name(h5).item()
            h5_subdict['year'] = hdf5_getters.get_year(h5).item()
            h5_subdict['duration'] = hdf5_getters.get_duration(h5).item()
            h5_subdict['timbre'] = hdf5_getters.get_segments_timbre(h5).tolist()
            h5_subdict['pitches'] = hdf5_getters.get_segments_pitches(h5).tolist()
            track_id = hdf5_getters.get_track_id(h5).item()
            all_song_data[track_id] = h5_subdict
            print('Song {0} finished processing. Total time elapsed: {1} seconds'.format(count, str(time.time() - start_time)))
        h5.close()

all_song_data_sorted = dict(sorted(all_song_data.items(), key=lambda k: k[1]['year']))
sortedpitchdata = '/scratch/network/mssilver/mssilver/msd_data/raw_' + re.sub('/', '', sys.argv[1]) + '.txt'
with open(sortedpitchdata, 'w') as text_file:
    text_file.write(str(all_song_data_sorted))

A.2 Calculating Most Likely Chords and Timbre Categories

from __future__ import division
import os
import sys
import re
import time
import json
import glob
import hdf5_getters  # not on adroit
import sklearn.mixture
import msd_utils  # not on adroit
import math
import numpy as np
import collections
import ast

# prevents output from showing ellipses when printed
np.set_printoptions(threshold=np.nan)

# column-wise mean of a list of lists
def mean(a):
    return sum(a) / len(a)

'''This code computes the frequency of chord changes and the timbre
category counts in each electronic song.'''

basedir = '/scratch/network/mssilver/mssilver/'
input_file = basedir + 'msd_data/raw_' + str(sys.argv[1]) + '.txt'
output_file = basedir + 'msd_data/preprocessed_' + str(sys.argv[1]) + '.txt'

json_contents = open(input_file, 'r').read()

all_song_data = []
time_start = time.time()
count = 0
# match each song's sub-dictionary; it starts at 'title' and contains no nested braces
for json_object_str in re.finditer(r"{'title'[^}]*}", json_contents):
    json_object_str = str(json_object_str.group(0))
    json_object = ast.literal_eval(json_object_str)
    json_object_new = {}

    json_object_new['title'] = json_object['title']
    json_object_new['artist_name'] = json_object['artist_name']
    json_object_new['year'] = json_object['year']
    json_object_new['duration'] = json_object['duration']

    segments_pitches_old = json_object['pitches']
    segments_timbre_old = json_object['timbre']
    segments_pitches_old_smoothed = []
    segments_timbre_old_smoothed = []
    chord_changes = [0 for i in range(0, 192)]
    smoothing_factor = 5
    for i in range(0, int(math.floor(len(segments_pitches_old) / smoothing_factor))):
        segments = segments_pitches_old[(smoothing_factor*i):(smoothing_factor*i + smoothing_factor)]
        # calculate mean frequency of each note over a block of 5 time segments
        segments_mean = map(mean, zip(*segments))
        segments_pitches_old_smoothed.append(segments_mean)
    most_likely_chords = [msd_utils.find_most_likely_chord(seg) for seg in segments_pitches_old_smoothed]
    print 'found most likely chords at {0} seconds'.format(time.time() - time_start)
    # calculate chord changes
    for i in range(0, len(most_likely_chords) - 1):
        c1 = most_likely_chords[i]
        c2 = most_likely_chords[i+1]
        if (c1[1] == c2[1]):
            note_shift = 0
        elif (c1[1] < c2[1]):
            note_shift = c2[1] - c1[1]
        else:
            note_shift = 12 - c1[1] + c2[1]
        key_shift = 4*(c1[0] - 1) + c2[0]
        # convert note_shift (0 through 11) and key_shift (1 to 16)
        # to one of the 192 categories for a chord shift
        chord_shift = 12*(key_shift - 1) + note_shift
        chord_changes[chord_shift] += 1
    json_object_new['chord_changes'] = [c / json_object['duration'] for c in chord_changes]
    print 'calculated chord changes at {0} seconds'.format(time.time() - time_start)

    for i in range(0, int(math.floor(len(segments_timbre_old) / smoothing_factor))):
        segments = segments_timbre_old[(smoothing_factor*i):(smoothing_factor*i + smoothing_factor)]
        # calculate the mean of each timbre component over a block of 5 time segments
        segments_mean = map(mean, zip(*segments))
        segments_timbre_old_smoothed.append(segments_mean)
    print 'found most likely timbre categories at {0} seconds'.format(time.time() - time_start)
    timbre_cats = [msd_utils.find_most_likely_timbre_category(seg) for seg in segments_timbre_old_smoothed]
    timbre_cat_counts = [timbre_cats.count(i) for i in xrange(0, 30)]
    json_object_new['timbre_cat_counts'] = [t / json_object['duration'] for t in timbre_cat_counts]
    all_song_data.append(json_object_new)
    count += 1

print 'preprocessing finished, writing results to file at time {0}'.format(time.time() - time_start)
with open(output_file, 'w') as f:
    f.write(str(all_song_data))

print 'file merging complete at time {0}'.format(time.time() - time_start)
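To make the chord-shift indexing concrete, here is a small worked example that mirrors the arithmetic above; the chord tuples are in the (family, root) form returned by msd_utils.find_most_likely_chord in Appendix A.4.

# a C major triad (family 1, root 0) moving to a G dominant 7th (family 3, root 7)
c1, c2 = (1, 0), (3, 7)
note_shift = c2[1] - c1[1]           # 7 semitones up (the c1[1] < c2[1] branch)
key_shift = 4*(c1[0] - 1) + c2[0]    # 3: major family to dominant-7th family
chord_shift = 12*(key_shift - 1) + note_shift
print(chord_shift)                   # 31, one of the 192 = 12 x 16 bins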

A.3 Code to Compute Timbre Categories

from __future__ import division
import os
import sys
import re
import time
import json
import glob
import hdf5_getters  # not on adroit
import sklearn.mixture
import msd_utils  # not on adroit
import math
import numpy as np
import collections
from string import ascii_uppercase
import ast
import matplotlib.pyplot as plt
import operator
from collections import defaultdict
import random

timbre_all = []
N = 20  # number of samples to get from each year
year_counts = dict({1956: 2, 1965: 4, 1968: 3, 1969: 5, 1970: 23, 1971: 25,
                    1972: 26, 1973: 37, 1974: 35, 1975: 29, 1976: 28, 1977: 64,
                    1978: 77, 1979: 111, 1980: 131, 1981: 171, 1982: 199,
                    1983: 272, 1984: 190, 1985: 189, 1986: 200, 1987: 224,
                    1988: 205, 1989: 272, 1990: 358, 1991: 348, 1992: 538,
                    1993: 610, 1994: 658, 1995: 764, 1996: 809, 1997: 930,
                    1998: 872, 1999: 983, 2000: 1031, 2001: 1230, 2002: 1323,
                    2003: 1563, 2004: 1508, 2005: 1995, 2006: 1892, 2007: 2175,
                    2008: 1950, 2009: 1782, 2010: 742})

time_start = time.time()
year_count = defaultdict(int)
orig_dir = '/scratch/network/mssilver/mssilver/'
orig_dir = ''
# match each song's sub-dictionary in the raw text files
json_pattern = re.compile(r"{'title'[^}]*}", re.DOTALL)
N = 20  # number of songs to sample from each year
k = 20  # number of frames to select from each song
for l1 in ascii_uppercase:
    for l2 in ascii_uppercase:
        edm_textfile = orig_dir + 'msd_data/raw_' + l1 + l2 + '.txt'
        json_contents = open(edm_textfile, 'r').read()
        for json_object_str in re.findall(json_pattern, json_contents):
            json_object = ast.literal_eval(json_object_str)
            year = int(json_object['year'])
            # keep a song with probability N / (number of songs in its year)
            prob = 1.0 if 1.0*N/year_counts[year] > 1.0 else 1.0*N/year_counts[year]
            if random.random() < prob:
                print 'getting timbre frames for song in directory {0}, {1} seconds after start of program'.format(edm_textfile, time.time() - time_start)
                duration = float(json_object['duration'])
                timbre = [[t / duration for t in l] for l in json_object['timbre']]
                try:
                    indices = random.sample(xrange(0, len(timbre)), k)
                except:
                    # fewer than k frames available: take them all
                    indices = xrange(0, len(timbre))
                timbre_frames = [timbre[i] for i in indices]
                appended_timbre = [timbre_all.append(l) for l in timbre_frames]
        print 'finished file {0}, {1} seconds after start of program'.format(edm_textfile, time.time() - time_start)

with(open('timbre_frames_all.txt', 'w')) as f:
    f.write(str(timbre_all))
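The script above only writes the sampled timbre frames to disk; the mixture fit that chose the number of timbre categories is not reproduced in this appendix. A sketch of that missing step, consistent with the Bayes Information Criterion selection described in Chapter 4 (the range of component counts scanned here is an assumption), might look like:

import ast
import numpy as np
from sklearn.mixture import GaussianMixture

# load the duration-normalized timbre frames written by the script above
with open('timbre_frames_all.txt') as f:
    X = np.array(ast.literal_eval(f.read()))

bics = {}
for n in range(30, 61, 2):  # scan a band of candidate category counts
    gmm = GaussianMixture(n_components=n, random_state=0).fit(X)
    bics[n] = gmm.bic(X)
print(min(bics, key=bics.get))  # the thesis reports the minimum at 46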

A.4 Helper Methods for Calculations

import os
import re
import json
import glob
import hdf5_getters
import time
import numpy as np

''' some static data used in conjunction with the helper methods '''

# each 12-element vector corresponds to the 12 pitches, starting with C
# natural and going up to B natural

CHORD_TEMPLATE_MAJOR = [[1,0,0,0,1,0,0,1,0,0,0,0], [0,1,0,0,0,1,0,0,1,0,0,0],
                        [0,0,1,0,0,0,1,0,0,1,0,0], [0,0,0,1,0,0,0,1,0,0,1,0],
                        [0,0,0,0,1,0,0,0,1,0,0,1], [1,0,0,0,0,1,0,0,0,1,0,0],
                        [0,1,0,0,0,0,1,0,0,0,1,0], [0,0,1,0,0,0,0,1,0,0,0,1],
                        [1,0,0,1,0,0,0,0,1,0,0,0], [0,1,0,0,1,0,0,0,0,1,0,0],
                        [0,0,1,0,0,1,0,0,0,0,1,0], [0,0,0,1,0,0,1,0,0,0,0,1]]
CHORD_TEMPLATE_MINOR = [[1,0,0,1,0,0,0,1,0,0,0,0], [0,1,0,0,1,0,0,0,1,0,0,0],
                        [0,0,1,0,0,1,0,0,0,1,0,0], [0,0,0,1,0,0,1,0,0,0,1,0],
                        [0,0,0,0,1,0,0,1,0,0,0,1], [1,0,0,0,0,1,0,0,1,0,0,0],
                        [0,1,0,0,0,0,1,0,0,1,0,0], [0,0,1,0,0,0,0,1,0,0,1,0],
                        [0,0,0,1,0,0,0,0,1,0,0,1], [1,0,0,0,1,0,0,0,0,1,0,0],
                        [0,1,0,0,0,1,0,0,0,0,1,0], [0,0,1,0,0,0,1,0,0,0,0,1]]
CHORD_TEMPLATE_DOM7 = [[1,0,0,0,1,0,0,1,0,0,1,0], [0,1,0,0,0,1,0,0,1,0,0,1],
                       [1,0,1,0,0,0,1,0,0,1,0,0], [0,1,0,1,0,0,0,1,0,0,1,0],
                       [0,0,1,0,1,0,0,0,1,0,0,1], [1,0,0,1,0,1,0,0,0,1,0,0],
                       [0,1,0,0,1,0,1,0,0,0,1,0], [0,0,1,0,0,1,0,1,0,0,0,1],
                       [1,0,0,1,0,0,1,0,1,0,0,0], [0,1,0,0,1,0,0,1,0,1,0,0],
                       [0,0,1,0,0,1,0,0,1,0,1,0], [0,0,0,1,0,0,1,0,0,1,0,1]]
CHORD_TEMPLATE_MIN7 = [[1,0,0,1,0,0,0,1,0,0,1,0], [0,1,0,0,1,0,0,0,1,0,0,1],
                       [1,0,1,0,0,1,0,0,0,1,0,0], [0,1,0,1,0,0,1,0,0,0,1,0],
                       [0,0,1,0,1,0,0,1,0,0,0,1], [1,0,0,1,0,1,0,0,1,0,0,0],
                       [0,1,0,0,1,0,1,0,0,1,0,0], [0,0,1,0,0,1,0,1,0,0,1,0],
                       [0,0,0,1,0,0,1,0,1,0,0,1], [1,0,0,0,1,0,0,1,0,1,0,0],
                       [0,1,0,0,0,1,0,0,1,0,1,0], [0,0,1,0,0,0,1,0,0,1,0,1]]

CHORD_TEMPLATE_MAJOR_means = [np.mean(chord) for chord in CHORD_TEMPLATE_MAJOR]
CHORD_TEMPLATE_MINOR_means = [np.mean(chord) for chord in CHORD_TEMPLATE_MINOR]
CHORD_TEMPLATE_DOM7_means = [np.mean(chord) for chord in CHORD_TEMPLATE_DOM7]
CHORD_TEMPLATE_MIN7_means = [np.mean(chord) for chord in CHORD_TEMPLATE_MIN7]

CHORD_TEMPLATE_MAJOR_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_MAJOR]
CHORD_TEMPLATE_MINOR_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_MINOR]
CHORD_TEMPLATE_DOM7_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_DOM7]
CHORD_TEMPLATE_MIN7_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_MIN7]

# the 46 timbre cluster centroids selected by BIC
TIMBRE_CLUSTERS = [
 [1.38679881e-01, 3.95702571e-02, 2.65410235e-02, 7.38301998e-03, -1.75014636e-02, -5.51147732e-02,
  8.71851698e-03, -1.17595855e-02, 1.07227900e-02, 8.75951680e-03, 5.40391877e-03, 6.17638908e-03],
 [3.14344510e+00, 1.17405599e-01, 4.08053561e+00, -1.77934450e+00, 2.93367968e+00, -1.35597928e+00,
  -1.55129489e+00, 7.75743158e-01, 6.42796685e-01, 1.40794256e-01, 3.37716831e-01, -3.27103815e-01],
 [3.56548165e-01, 2.73288705e+00, 1.94355982e+00, 1.06892477e+00, 9.89739475e-01, -8.97330631e-02,
  8.73234495e-01, -2.00747009e-03, 3.44488367e-01, 9.93117800e-02, -2.43471766e-01, -1.90521726e-01],
 [4.22442037e-01, 4.14115783e-01, 1.43926557e-01, -1.16143322e-01, -5.95186216e-02, -2.36927188e-01,
  -6.83151409e-02, 9.86816882e-02, 2.43219098e-02, 6.93558977e-02, 6.80121418e-03, 3.97485360e-02],
 [1.94727799e-01, -1.39027782e+00, -2.39875671e-01, -2.84583677e-01, 1.92334219e-01, -2.83421048e-01,
  2.15787541e-01, 1.14840341e-01, -2.15631833e-01, -4.09496877e-02, -6.90838017e-03, -7.24394810e-03],
 [1.96565167e-01, 4.98702717e-02, -3.43697282e-01, 2.54170701e-01, 1.12441266e-02, 1.54740401e-01,
  -4.70447408e-02, 8.10868802e-02, 3.03736697e-03, 1.43974944e-03, -2.75044913e-02, 1.48634678e-02],
 [2.21364497e-01, -2.96205105e-01, 1.57754028e-01, -5.57641279e-02, -9.25625566e-02, -6.15316168e-02,
  -1.38139882e-01, -5.54936599e-02, 1.66886836e-01, 6.46238260e-02, 1.24093863e-02, -2.09274345e-02],
 [2.12823455e-01, -9.32652720e-02, -4.39611467e-01, -2.02814479e-01, 4.98638770e-02, -1.26572488e-01,
  -1.11181799e-01, 3.25075635e-02, 2.01416694e-02, -5.69216463e-02, 2.61922912e-02, 8.30817468e-02],
 [1.62304042e-01, -7.34813956e-03, -2.02552550e-01, 1.80106705e-01, -5.72110826e-02, -9.17148244e-02,
  -6.20429191e-03, -6.08892354e-02, 1.02883628e-02, 3.84878478e-02, -8.72920419e-03, 2.37291230e-02],
 [1.69023095e-01, 6.81311168e-02, -3.71039856e-02, -2.13139780e-02, -4.18752028e-03, 1.36407740e-01,
  2.58515825e-02, -4.10328777e-04, 2.93149920e-02, -1.97874734e-02, 2.01177066e-02, 4.29260690e-03],
 [4.16829358e-01, -1.28384095e+00, 8.86081556e-01, 9.13717416e-02, -3.19420208e-01, -1.82003637e-01,
  -3.19865507e-02, -1.71517045e-02, 3.47472066e-02, -3.53047665e-02, 5.58354602e-02, -5.06222122e-02],
 [3.83948137e-01, 1.06020034e-01, 4.01191058e-01, 1.49470482e-01, -9.58422411e-02, -4.94473336e-02,
  2.27589858e-02, -5.67352733e-02, 3.84666644e-02, -2.15828055e-02, -1.67817151e-02, 1.15426241e-01],
 [9.07946444e-01, 3.26120397e+00, 2.98472002e+00, -1.42615404e-01, 1.29886103e+00, -4.53380431e-01,
  1.54008478e-01, -3.55297093e-02, -2.95809181e-01, 1.57037690e-01, -7.29692046e-02, 1.15180285e-01],
 [1.60870896e+00, -2.32038235e+00, -7.96211044e-01, 1.55058968e+00, -2.19377663e+00, 5.01030526e-01,
  -1.71767279e+00, -1.36642470e+00, -2.42837527e-01, -4.14275615e-01, -7.33148530e-01, -4.56676578e-01],
 [6.42870687e-01, 1.34486839e+00, 2.16026845e-01, -2.13180345e-01, 3.10866747e-01, -3.97754955e-01,
  -3.54439151e-01, -5.95938041e-04, 4.95054274e-03, 4.67013422e-02, -1.80823854e-02, 1.25808320e-01],
 [1.16780496e+00, 2.28141229e+00, -3.29418720e+00, -1.54239912e+00, 2.12372153e-01, 2.51116768e+00,
  1.84273560e+00, -4.06183916e-01, 1.19175125e+00, -9.24407446e-01, 6.85444429e-01, -6.38729005e-01],
 [2.39097414e-01, -1.13382447e-02, 3.06327342e-01, 4.68182987e-03, -1.03107607e-01, -3.17661969e-02,
  3.46533705e-02, 1.46440386e-02, 6.88291154e-02, 1.72580481e-02, -6.23970238e-03, -6.52822380e-03],
 [1.74850329e-01, -1.86077411e-01, 2.69285838e-01, 5.22452803e-02, -3.71708289e-02, -6.42874319e-02,
  -5.01920042e-03, -1.14565540e-02, -2.61300268e-03, -6.94872458e-03, 1.20157063e-02, 2.01341977e-02],
 [1.93220674e-01, 1.62738332e-01, 1.72794061e-02, 7.89933755e-02, 1.58494767e-01, 9.04541006e-04,
  -3.33177052e-02, -1.42411500e-01, -1.90471155e-02, -2.41622739e-02, -2.57382438e-02, 2.84895062e-02],
 [3.31179197e+00, -1.56765268e-01, 4.42446188e+00, 2.05496297e+00, 5.07031622e+00, -3.52663849e-02,
  -5.68337901e+00, -1.17825301e+00, 5.41756637e-01, -3.15541339e-02, -1.58404846e+00, 7.37887234e-01],
 [2.36033237e-01, -5.01380019e-01, -7.01568834e-02, -2.14474169e-01, 5.58739133e-01, -3.45340886e-01,
  2.36469930e-01, -2.51770230e-02, -4.41670143e-01, -1.73364633e-01, 9.92353986e-03, 1.01775476e-01],
 [3.13672832e+00, 1.55128891e+00, 4.60139512e+00, 9.82477544e-01, -3.87108002e-01, -1.34239667e+00,
  -3.00065797e+00, -4.41556909e-01, -7.77546208e-01, -6.59017029e-01, -1.42596356e-01, -9.78935498e-01],
 [8.50714148e-01, 2.28658856e-01, -3.65260753e+00, 2.70626948e+00, -1.90441544e-01, 5.66625676e+00,
  1.77531510e+00, 2.39978921e+00, 1.10965660e+00, 1.58484130e+00, -1.51579214e-02, 8.64324026e-01],
 [1.14302559e+00, 1.18602811e+00, -3.88130412e+00, 8.69833825e-01, -8.23003310e-01, -4.23867795e-01,
  8.56022598e-01, -1.08015106e+00, 1.74840192e-01, -1.35493558e-02, -1.17012561e+00, 1.68572940e-01],
 [3.54117814e+00, 6.12714769e-01, 7.67585243e+00, 2.50391333e+00, 1.81374399e+00, -1.46363231e+00,
  -1.74027236e+00, -5.72924078e-01, -1.20787368e+00, -4.13954661e-01, -4.62561948e-01, 6.78297871e-01],
 [8.31843044e-01, 4.41635485e-01, 7.00724425e-02, -4.72159900e-02, 3.08326493e-01, -4.47009822e-01,
  3.27806057e-01, 6.52370380e-01, 3.28490360e-01, 1.28628172e-01, -7.78065861e-02, 6.91343399e-02],
 [4.90082031e-01, -9.53180204e-01, 1.76970476e-01, 1.57256960e-01, -5.26196238e-02, -3.19264458e-01,
  3.91808304e-01, 2.19368239e-01, -2.06483291e-01, -6.25044005e-02, -1.05547224e-01, 3.18934196e-01],
 [1.49899454e+00, -4.30708817e-01, 2.43770498e+00, 7.03149621e-01, -2.28827845e+00, 2.70195855e+00,
  -4.71484280e+00, -1.18700075e+00, -1.77431396e+00, -2.23190236e+00, 8.20855264e-01, -2.35859902e-01],
 [1.20322544e-01, -3.66300816e-01, -1.25699953e-01, -1.21914056e-01, 6.93277338e-02, -1.31034684e-01,
  -1.54955924e-03, 2.48094288e-02, -3.09576314e-02, -1.66369415e-03, 1.48904987e-04, -1.42151992e-02],
 [6.52394765e-01, -6.81024464e-01, 6.36868117e-01, 3.04950208e-01, 2.62178992e-01, -3.20457080e-01,
  -1.98576098e-01, -3.02173163e-01, 2.04399765e-01, 4.44513847e-02, -9.50111498e-02, -1.14198739e-02],
 [2.06762180e-01, -2.08101829e-01, 2.61977630e-01, -1.71672300e-01, 5.61794250e-02, 2.13660185e-01,
  3.90259585e-02, 4.78176392e-02, 1.72812607e-02, 3.44052067e-02, 6.26899067e-03, 2.48544728e-02],
 [7.39717363e-01, 4.37786285e+00, 2.54995502e+00, 1.13151212e+00, -3.58509503e-01, 2.20806129e-01,
  -2.20500355e-01, -7.22409824e-02, -2.70534083e-01, 1.07942098e-03, 2.70174668e-01, 1.87279353e-01],
 [1.25593809e+00, 6.71054880e-02, 8.70352571e-01, -4.32607959e+00, 2.30652217e+00, 5.47476105e+00,
  -6.11052479e-01, 1.07955720e+00, -2.16225471e+00, -7.95770149e-01, -7.31804973e-01, 9.68935954e-01],
 [1.17233757e-01, -1.23897829e-01, -4.88625265e-01, 1.42036530e-01, -7.23286756e-02, -6.99808763e-02,
  -1.17525019e-02, 5.70221674e-02, -7.67796123e-03, 4.17505873e-02, -2.33375716e-02, 1.94121001e-02],
 [1.67511025e+00, -2.75436700e+00, 1.45345593e+00, 1.32408871e+00, -1.66172505e+00, 1.00560074e+00,
  -8.82308160e-01, -5.95708043e-01, -7.27283590e-01, -1.03975499e+00, -1.86653334e-02, 1.39449745e+00],
 [3.20587677e+00, -2.84451104e+00, 8.54849957e+00, -4.44001235e-01, 1.04202144e+00, 7.35333682e-01,
  -2.48763292e+00, 7.38931361e-01, -1.74185596e+00, -1.07581842e+00, 2.05759299e-01, -8.20483513e-01],
 [3.31279737e+00, -5.08655734e-01, 6.61530870e+00, 1.16518280e+00, 4.74499155e+00, -2.31536191e+00,
  -1.34016130e+00, -7.15381712e-01, 2.78890594e+00, 2.04189275e+00, -3.80003033e-01, 1.16034914e+00],
 [1.79522019e+00, -8.13534697e-02, 4.37167420e-01, 2.26517020e+00, 8.85377295e-01, 1.07481514e+00,
  -7.25322296e-01, -2.19309506e+00, -7.59468916e-01, -1.37191387e+00, 2.60097913e-01, 9.34596450e-01],
 [3.50400906e-01, 8.17891485e-01, -8.63487084e-01, -7.31760701e-01, 9.70320805e-02, -3.60023996e-01,
  -2.91753495e-01, -8.03073817e-02, 6.65930095e-02, 1.60093340e-01, -1.29158086e-01, -5.18806100e-02],
 [2.25922929e-01, 2.78461593e-01, 5.39661393e-02, -2.37662670e-02, -2.70343295e-02, -1.23485570e-01,
  2.31027499e-03, 5.87465112e-05, 1.86127188e-02, 2.83074747e-02, -1.87198676e-04, 1.24761782e-02],
 [4.53615634e-01, 3.18976020e+00, -8.35029351e-01, 7.84124578e+00, -4.43906795e-01, -1.78945492e+00,
  -1.14521031e+00, 1.00044304e+00, -4.04084981e-01, -4.86030348e-01, 1.05412721e-01, 5.63666445e-02],
 [3.93714086e-01, -3.07226477e-01, -4.87366619e-01, -4.57481697e-01, -2.91133171e-04, -2.39881719e-01,
  -2.15591352e-01, -1.21332941e-01, 1.42245002e-01, 5.02984582e-02, -8.05878851e-03, 1.95534173e-01],
 [1.86913010e-01, -1.61000977e-01, 5.95612425e-01, 1.87804293e-01, 2.22064227e-01, -1.09008289e-01,
  7.83845058e-02, 5.15228647e-02, -8.18113578e-02, -2.37860551e-02, 3.41013800e-03, 3.64680417e-02],
 [3.32919314e+00, -2.14341251e+00, 7.20913997e+00, 1.76143734e+00, 1.64091808e+00, -2.66887649e+00,
  -9.26748006e-01, -2.78599285e-01, -7.39434005e-01, -3.87363085e-01, 8.00557250e-01, 1.15628886e+00],
 [4.76496444e-01, -1.19334793e-01, 3.09037235e-01, -3.45545294e-01, 1.30114716e-01, 5.06895559e-01,
  2.12176840e-01, -4.14296750e-03, 4.52439064e-02, -1.62163990e-02, 6.93683152e-02, -5.77607592e-03],
 [3.00019324e-01, 5.43432074e-02, -7.72732930e-01, 1.47263806e+00, -2.79012581e-02, -2.47864869e-01,
  -2.10011388e-01, 2.78202425e-01, 6.16957205e-02, -1.66924986e-01, -1.80102286e-01, -3.78872162e-03]]

TIMBRE_MEANS = [np.mean(t) for t in TIMBRE_CLUSTERS]
TIMBRE_STDEVS = [np.std(t) for t in TIMBRE_CLUSTERS]

'''helper methods to process raw msd data'''

def normalize_pitches(h5):
    key = int(hdf5_getters.get_key(h5))
    segments_pitches = hdf5_getters.get_segments_pitches(h5)
    segments_pitches_new = [transpose_by_key(pitch_seg, key) for pitch_seg in segments_pitches]
    return segments_pitches_new

def transpose_by_key(pitch_seg, key):
    pitch_seg_new = []
    for i in range(0, 12):
        idx = (i + key) % 12
        pitch_seg_new.append(pitch_seg[idx])
    return pitch_seg_new

''' given a time segment with distributions of the 12 pitches, find the most
likely chord played '''
def find_most_likely_chord(pitch_vector):
    rho_max = 0.0
    # index each chord as (template family, root), where the families are
    # 1 = major, 2 = minor, 3 = dominant 7th, 4 = minor 7th
    most_likely_chord = (1, 1)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_MAJOR,
            CHORD_TEMPLATE_MAJOR_means, CHORD_TEMPLATE_MAJOR_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean) * (pitch_vector[i] - np.mean(pitch_vector)) / (
                (stdev + 0.01) * (np.std(pitch_vector) + 0.01))
        if (abs(rho) > abs(rho_max)):
            rho_max = rho
            most_likely_chord = (1, idx)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_MINOR,
            CHORD_TEMPLATE_MINOR_means, CHORD_TEMPLATE_MINOR_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean) * (pitch_vector[i] - np.mean(pitch_vector)) / (
                (stdev + 0.01) * (np.std(pitch_vector) + 0.01))
        if (abs(rho) > abs(rho_max)):
            rho_max = rho
            most_likely_chord = (2, idx)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_DOM7,
            CHORD_TEMPLATE_DOM7_means, CHORD_TEMPLATE_DOM7_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean) * (pitch_vector[i] - np.mean(pitch_vector)) / (
                (stdev + 0.01) * (np.std(pitch_vector) + 0.01))
        if (abs(rho) > abs(rho_max)):
            rho_max = rho
            most_likely_chord = (3, idx)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_MIN7,
            CHORD_TEMPLATE_MIN7_means, CHORD_TEMPLATE_MIN7_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean) * (pitch_vector[i] - np.mean(pitch_vector)) / (
                (stdev + 0.01) * (np.std(pitch_vector) + 0.01))
        if (abs(rho) > abs(rho_max)):
            rho_max = rho
            most_likely_chord = (4, idx)
    return most_likely_chord

def find_most_likely_timbre_category(timbre_vector):
    most_likely_timbre_cat = 0
    rho_max = 0.0
    for idx, (seg, mean, stdev) in enumerate(zip(TIMBRE_CLUSTERS, TIMBRE_MEANS,
            TIMBRE_STDEVS)):
        rho = 0.0
        for i in range(0, 12):
            rho += (seg[i] - mean) * (timbre_vector[i] - np.mean(timbre_vector)) / (
                (stdev + 0.01) * (np.std(timbre_vector) + 0.01))
        if (abs(rho) > abs(rho_max)):
            rho_max = rho
            most_likely_timbre_cat = idx
    return most_likely_timbre_cat
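For illustration only (the pitch vector below is made up, not drawn from the MSD), the chord helper can be exercised as follows; energy concentrated at pitch classes 0, 4, and 7 should be matched to a C major triad.

import msd_utils

# a smoothed pitch vector dominated by C, E and G (indices 0, 4, 7)
pv = [0.9, 0.1, 0.1, 0.1, 0.8, 0.1, 0.1, 0.7, 0.1, 0.1, 0.1, 0.1]
print(msd_utils.find_most_likely_chord(pv))  # -> (1, 0): family 1 (major), root 0 (C)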


Bibliography

[1] Marck Bailey. Mark J. Butler publishes scholarly work on dance music. http://www.music.northwestern.edu/about/news/2012/mark-j-butler-publishes-scholarly-work-on-dance-music.html, March 2012.

[2] Kenneth Taylor. Ishkur's guide to EDM. http://techno.org/electronic-music-guide/.

[3] Deal further strengthens Spotify's music discovery expertise. http://the.echonest.com/pressreleases/spotify-acquires-echo-nest/, March 2014.

[4] Josh Constine. Inside the Spotify - Echo Nest skunkworks. http://techcrunch.com/2014/10/19/the-sonic-mad-scientists/, October 2014.

[5] The future of music genres is here. http://blog.echonest.com/post/73516217273/the-future-of-music-genres-is-here, January 2014.

[6] About the Music Genome Project. http://www.pandora.com/about/mgp.

[7] Joan Serrà, Álvaro Corral, Marián Boguñá, Martín Haro, and Josep Ll. Arcos. Measuring the evolution of contemporary western popular music. Sci. Rep., 2, July 2012.

[8] Matthias Mauch, Robert M. MacCallum, Mark Levy, and Armand M. Leroi. The evolution of popular music: USA 1960–2010. Royal Society Open Science, 2(5), 2015.

[9] Thierry Bertin-Mahieux, Daniel P.W. Ellis, Brian Whitman, and Paul Lamere. The Million Song Dataset. In Proceedings of the 12th International Conference on Music Information Retrieval (ISMIR 2011), 2011.

[10] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay. Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12:2825–2830, 2011.

[11] Lars Buitinck, Gilles Louppe, Mathieu Blondel, Fabian Pedregosa, Andreas Mueller, Olivier Grisel, Vlad Niculae, Peter Prettenhofer, Alexandre Gramfort, Jaques Grobler, Robert Layton, Jake VanderPlas, Arnaud Joly, Brian Holt, and Gaël Varoquaux. API design for machine learning software: experiences from the scikit-learn project. In ECML PKDD Workshop: Languages for Data Mining and Machine Learning, pages 108–122, 2013.

[12] Edwin Chen. Infinite mixture models with nonparametric Bayes and the Dirichlet process. http://blog.echen.me/2012/03/20/infinite-mixture-models-with-nonparametric-bayes-and-the-dirichlet-process/, March 2012.

[13] Graham Massey. Roland TR-808: The drum machine that changed music. http://www.bbc.com/news/entertainment-arts-26682781, March 2014.

[14] Jean-Michel Jarre and Agencja Artystyczna MTJ. Jean Michel Jarre. Disques Dreyfus, 1999.

[15] Francois Pachet, Jean-Julien Aucouturier, and Mark Sandler. The way it sounds: Timbre models for analysis and retrieval of music signals. IEEE Transactions on Multimedia, 7(6):1028–35, December 2005.

  • Abstract
  • Acknowledgements
  • Contents
  • List of Tables
  • List of Figures
  • 1 Introduction
    • 11 Background Information
    • 12 Literature Review
    • 13 The Dataset
      • 2 Mathematical Modeling
        • 21 Determining Novelty of Songs
        • 22 Feature Selection
        • 23 Collecting Data and Preprocessing Selected Features
          • 231 Collecting the Data
          • 232 Pitch Preprocessing
          • 233 Timbre Preprocessing
              • 3 Results
                • 31 Methodology
                • 32 Findings
                  • 321 =005
                  • 322 =01
                  • 323 =02
                    • 33 Analysis
                      • 4 Conclusion
                        • 41 Design Flaws in Experiment
                        • 42 Future Work
                        • 43 Closing Remarks
                          • A Code
                            • A1 Pulling Data from the Million Song Dataset
                            • A2 Calculating Most Likely Chords and Timbre Categories
                            • A3 Code to Compute Timbre Categories
                            • A4 Helper Methods for Calculations
                              • Bibliography
Page 59: Silver,Matthew final thesis

artists in cluster 301 they were different from the earliest artists in cluster 9005

One particular artist Bill Nelson stood out for having a particularly novel song

ldquoBirds of Tinrdquo for the year it was released (1980) This song features a sharp and

twangy synth beat that when sped up sounded like minimalist acid house music

While the α = 005 group differentiated mostly on general moods and classes of

instruments (like rock vs non-electronic vs electronic) the α = 01 group picked

up more nuanced instrumentation and mood differences For example cluster 1601

contained songs that featured orchestral string instruments especially violin The

songs themselves varied significantly according to traditional genres from Brian Eno

arrangements with classical orchestra to a remix of a song by Linkin Park a nu-metal

band which contained violin interludes This clustering raises an interesting point

that music that sounds very different based on traditional genres could be grouped

together on certain instruments or sounds Another cluster 2801 features 90s sounds

characteristic of the Roland TR-909 drum machine (which would explain why the

clusterrsquos songs increase dratistically starting in the 1990s and steadily decline through

the 2000s) Yet another cluster 601 contains a particularly heavy left tail indicating

a style more popular in the 1980s and the characteristic sound high-hat cymbals

is also a specialized instrument This specialization does not match up particularly

strongly with the clusters when α = 005 That is a single cluster with α = 005

does not easily map to one or more clusters in the α = 01 run although many of

the clusters appear to share characteristics based on the qualitative descriptions in

the tables The timbrechord change charts for each cluster appeared to at least

somewhat corroborate the general characteristics I attached to each cluster For

example the last timbre category is significantly pronounced for clusters 5 and 18

and especially so for 18 Cluster 18 was vocal-free ethereal space-synth sounds

so it would make sense that cluster 5 which was mainly calm New World also

contained vocal-free ethereal and space-y sounds It was also interesting to note

50

that certain clusters like 28 contained one timbre category that completely dominated

all the others Many songs in this cluster were marked by strong and repetitive

beats reminiscent of the Roland TR synth drum machine which matches the graph

Likewise clusters 3 7 9 and 20 which appear to contain the same peak timbre

category were noted for containing strong and repetitive beats For this cluster I

added the following artists and their contributions to the general list of novel artists

bull Bill Nelson Minimalist house music

bull Vangelis Orchestral compositions with electronic notes

bull Rick Wakeman Rock compositions with spacy-sounding synths

bull Kraftwerk synth-based pop music

Finally we look at α = 02 With this parameter value the Dirichlet Process resulted

in 22 clusters formed 3 of these clusters contained only one song each and upon

listening to each of these songs I determined they were not particularly unique and

discarded them for a total of 19 remaining clusters Unlike the previous two values of

α where the clusters were relatively easy to subjectively differentiate this one was

quite difficult Slightly more than half of the clusters 10 out of 19 contained under

1000 songs and 3 contained under 100 Some of the clusters were easily mapped

to clusters in the other two α values like cluster 1702 which contains Roland TR

drum machine sounds and is comparable to cluster 2801 However many of the other

classifications seemed more dubious Not only did the songs within each cluster seem

to often vary significantly but the differences between many clusters appeared nearly

indistinguishable The chord change and timbre charts also support the difficulty

in distinguishing different clusters The y-axes for all of the years are quite small

implying that many of the timbre values averaged out because the songs were quite

different in each cluster Essentially the observations are quite noisy and do not have

51

features that stand out as saliently as cluster 2801 for example The only exceptions

to these numbers were clusters 30 and 34 but there were so few songs in each of

these clusters that they represent only a small amount of the dataset Therefore I

concluded that the Dirichlet Process with α = 02 performed an insufficient job of

adequately clustering the songs Overall the clusters formed when α = 01 were

the most meaningful in terms of picking up nuanced moods and instruments without

splitting hairs and resulting in clusters a minimally trained ear could not differentiate

From this analysis the most appropriate genre classifications of the electronic music

from the MSD are the clusters described in the table where α = 01 and the most

novel artists along with their contributions are summarized in the finidings where

α = 005 and α = 01

52

Chapter 4

Conclusion

In this chapter I first address weakness in my experiment and strategies to address

those weaknesses then I offer potential paths for researchers to build upon my ex-

periment and offer closing words regarding this thesis

41 Design Flaws in Experiment

While I made every effort possible to ensure the integrity of this experiment there

were various factors some beyond my control and others within my control but

unrealistic given the time and resources I had The largest issue was the dataset I

was working with While the MSD contained roughly 23000 electronic music songs

according to my classifications these songs did not come close to all of the electronic

music that was available From looking through the tracks I did see many important

artists meaning that there was some credibility to the dataset However there were

several other artists I was surprised to see missing and the artists included contained

only a limited number of popular songs Some traditionally defined genres like

dubstep were missing entirely from the dataset and the most recent songs came

from the year 2010 which meant that the past 5 years of rapid expansions in EM

were not accounted for Building a sufficient corpus of EM data is very difficult

53

arguably more than for other genres because songs may be remixed by multiple

artists further blurring the line between original content and modifications For this

reason I considered my thesis to be a proof of concept Although the data I used

may not be ideal I was able to show that the Dirichlet Process could be used with

some amount of success to cluster songs based on their metadata

With respect to how I implemented the Dirichlet Process and constructed the

features my methodology could have been more extensive with additional time and

resources Interpreting the sounds in each song and establishing common threads is a

difficult task and unlike Pandora which used trained music theory experts to analyze

each song I relied on my own ears and anecdotal knowledge of EM Given the lack of

formal literature quantitatively analyzing EM and the resources I had this was my

best realistic option but was also not ideal The second notable weakness which was

more controllable was determining what exactly constitutes an EM song My criteria

involved iterating through every song and selecting those whose artist contained a

tag that fell inside a list of predetermined EM genres However this strategy is not

always effective since some artists contain only a small selection of EM songs and

have produced much more music involving rock or other non-EM genres To prevent

these songs from appearing in the dataset I would need to load another dataset

from a group called Lastfm which contains user-generated tags at the song level

Another more addressable weakness in my experiment was graphically analyzing the

timbre categories While the average chord changes were easy to interpret on the

graphs for each cluster and had easy semantic interpretations the timbre categories

were never formally defined That is while I knew the Bayes Information Criterion

was lowest when there were 46 categories I did not associate each timbre category

with a sound Mauchrsquos study addressed this issue by randomly selecting songs with

sounds that fell in each timbre category and asked users to listen to the sounds and

54

classify what they heard Implementing this system would be an additional way of

ensuring that the clusters formed for each song were nontrivial I could not only

eyeball the measurements on each graph for timbre like I did in this thesis but also

use them to confirm the sounds I observed for each cluster Finally while my feature

selection contained careful preprocessing based on other studies that normalized

measurements between all songs there are additional ways I could have improved the

feature set For example one study looks at more advanced ways to isolate specific

timbre segments in a song identify repeating patterns and comparing songs to each

other in terms of the similarity of their timbres [15] More advanced methods like

these would allow me to more quantitatively analyze how successful the Dirichlet

Process is on effectively clustering songs into distinct categories

42 Future Work

Future work in this area quantitatively analyzing EM metadata to determine what

constitutes different genres and novel artists would involve tighter definitions proce-

dures evaluations of whether clustering was effective and music scrutiny All of the

weaknesses mentioned in the previous section barring perhaps the songs available in

the Million Song Dataset can be addressed with extensions and modifications to the

code base I created Addressing the greater issue of building an effective corpus of

music data for the MSD and constantly updating it might be addressed by soliciting

such data from an organization like Spotify but such an endeavor is very ambitious

and beyond the scope of any individual or small group research project without ex-

tensive funding and influence Once these problems are resolved and the dataset

songs accessed from the dataset and methods for comparing songs to each other are

accomplished the next steps would be to further analyze the results How do the

most unique artists for their time compare to the most popular artists Is there con-

55

siderable overlap How long does it take for a style to grow in popularity if it even

does And lastly how can these findings be used to compose new genres of music and

envision who and what will become popular in the future All of these questions may

require supplementary information sources with respect to the popularity of songs

and artists for example and many of these additional pieces of information can be

found on the website of the MSD

43 Closing Remarks

While this thesis is an ambitious endeavor and can be improved in many respects it

does show that the methods implemented yield nontrivial results and could serve as

a foundation for future quantitative analysis of electronic music As data analytics

grows even more and groups such as Spotify amass greater amounts of information

and deeper insights on that information this relatively new field of study will hope-

fully grow EM is a dynamic energizing and incredibly expressive type of music

and understanding it from a quantitative perspective pays respect to what has up

until now been mostly analyzed from a curious outsiderrsquos perspective qualitatively

described but not examined as thoroughly from a mathematical angle

56

Appendix A

Code

A1 Pulling Data from the Million Song Dataset

1 from __future__ import division2 import os3 import sys4 import time5 import glob6 import hdf5_getters not on adroit7

8 prevents output from showing ellipses when printed9 npset_printoptions(threshold=npnan)

10

11 rsquorsquorsquoThis code computes the frequency of chord changes in each electronic songand runs the dirichlet process on it rsquorsquorsquo

12

13 basedir = rsquoscratchnetworkmssilvermssilvermsd_data_fulldatarsquo +str(sysargv[1])

14 ext = rsquoh5rsquo15

16 target_genres = [rsquohousersquorsquotechnorsquorsquodrum and bassrsquorsquodrum n bassrsquorsquodrumrsquonrsquobassrsquo

17 rsquodrumnbassrsquorsquodrum rsquonrsquo bassrsquorsquojunglersquorsquobreakbeatrsquorsquotrancersquorsquodubsteprsquorsquotraprsquorsquodowntemporsquo

18 rsquoindustrialrsquorsquosynthpoprsquorsquoidmrsquorsquoidm - intelligent dance musicrsquorsquo8-bitrsquorsquoambientrsquo

19 rsquodance and electronicarsquorsquoelectronicrsquo]20

21 relevant metadata for all EM songs found in the MSD22 all_song_data = 23 pitch_segs_data = []24 count = 025 start_time = timetime()26

27 for root dirs files in oswalk(basedir)28 files = globglob(ospathjoin(rootrsquorsquo+ext))29 for f in files

57

30 h5 = hdf5_gettersopen_h5_file_read(f)31 if year unknown throw out sample32 if hdf5_gettersget_year(h5) == 033 h5close()34 continue35 if any(tag in str(hdf5_gettersget_artist_mbtags(h5)) for tag in

target_genres)36 print rsquofound electronic music song at 0 secondsrsquoformat(time

time()-start_time)37 count += 138 print (rsquosong count 0rsquoformat(count+1))39 h5_subdict = dict()40 h5_subdict[rsquotitlersquo] = hdf5_gettersget_title(h5)item()41 h5_subdict[rsquoartist_namersquo] = hdf5_gettersget_artist_name(h5)

item()42 h5_subdict[rsquoyearrsquo] = hdf5_gettersget_year(h5)item()43 h5_subdict[rsquodurationrsquo] = hdf5_gettersget_duration(h5)item()44 h5_subdict[rsquotimbrersquo] = hdf5_gettersget_segments_timbre(h5)

tolist()45 h5_subdict[rsquopitchesrsquo] = hdf5_gettersget_segments_pitches(h5)

tolist()46 track_id = hdf5_gettersget_track_id(h5)item()47 all_song_data[track_id] = h5_subdict48 print(rsquoSong 0 finished processing Total time elapsed 1

secondsrsquoformat(countstr(timetime() - start_time)))49 h5close()50

51 all_song_data_sorted = dict(sorted(all_song_dataitems() key=lambda k k[1][rsquoyearrsquo]))

52 sortedpitchdata = rsquoscratchnetworkmssilvermssilvermsd_dataraw_rsquo +resub(rsquorsquorsquorsquosysargv[1]) + rsquotxtrsquo

53 with open(sortedpitchdata rsquowrsquo) as text_file54 text_filewrite(str(all_song_data_sorted))

A2 Calculating Most Likely Chords and Timbre

Categories

from __future__ import division
import os
import sys
import re
import time
import json
import glob
import hdf5_getters  # not on adroit
import sklearn.mixture
import msd_utils  # not on adroit
import math
import numpy as np
import collections
import ast

# prevents output from showing ellipses when printed
np.set_printoptions(threshold=np.nan)

# column-wise mean of a list of lists
def mean(a):
    return sum(a) / len(a)

'''This code computes the frequency of chord changes and the timbre
category counts of each electronic song.'''

basedir = '/scratch/network/mssilver/mssilver/'
input_file = basedir + 'msd_data/raw_' + str(sys.argv[1]) + '.txt'
output_file = basedir + 'msd_data/preprocessed_' + str(sys.argv[1]) + '.txt'

json_contents = open(input_file, 'r').read()

all_song_data = []
time_start = time.time()
count = 0
# the regex below is reconstructed; it matches one str()-dumped song dict at a time
for json_object_str in re.finditer(r"\{'title'.*?\}", json_contents):
    json_object_str = str(json_object_str.group(0))
    json_object = ast.literal_eval(json_object_str)
    json_object_new = {}

    json_object_new['title'] = json_object['title']
    json_object_new['artist_name'] = json_object['artist_name']
    json_object_new['year'] = json_object['year']
    json_object_new['duration'] = json_object['duration']

    segments_pitches_old = json_object['pitches']
    segments_timbre_old = json_object['timbre']
    segments_pitches_old_smoothed = []
    segments_timbre_old_smoothed = []
    chord_changes = [0 for i in range(0, 192)]
    smoothing_factor = 5
    for i in range(0, int(math.floor(len(segments_pitches_old)) / smoothing_factor)):
        segments = segments_pitches_old[(smoothing_factor*i):(smoothing_factor*i + smoothing_factor)]
        # calculate mean frequency of each note over a block of 5 time segments
        segments_mean = map(mean, zip(*segments))
        segments_pitches_old_smoothed.append(segments_mean)
    most_likely_chords = [msd_utils.find_most_likely_chord(seg) for seg in segments_pitches_old_smoothed]
    print 'found most likely chords at {0} seconds'.format(time.time() - time_start)
    # calculate chord changes
    for i in range(0, len(most_likely_chords) - 1):
        c1 = most_likely_chords[i]
        c2 = most_likely_chords[i+1]
        if (c1[1] == c2[1]):
            note_shift = 0
        elif (c1[1] < c2[1]):
            note_shift = c2[1] - c1[1]
        else:
            note_shift = 12 - c1[1] + c2[1]
        key_shift = 4*(c1[0] - 1) + c2[0]
        # convert note_shift (0 through 11) and key_shift (1 to 16)
        # to one of 192 categories for a chord shift
        chord_shift = 12*(key_shift - 1) + note_shift
        chord_changes[chord_shift] += 1

    json_object_new['chord_changes'] = [c / json_object['duration'] for c in chord_changes]
    print 'calculated chord changes at {0} seconds'.format(time.time() - time_start)

    for i in range(0, int(math.floor(len(segments_timbre_old)) / smoothing_factor)):
        segments = segments_timbre_old[(smoothing_factor*i):(smoothing_factor*i + smoothing_factor)]
        # calculate mean level of each timbre component over a block of 5 time segments
        segments_mean = map(mean, zip(*segments))
        segments_timbre_old_smoothed.append(segments_mean)
    print 'found most likely timbre categories at {0} seconds'.format(time.time() - time_start)
    timbre_cats = [msd_utils.find_most_likely_timbre_category(seg) for seg in segments_timbre_old_smoothed]
    # note: only the first 30 timbre categories are counted here, although A.4 defines 46
    timbre_cat_counts = [timbre_cats.count(i) for i in xrange(0, 30)]
    json_object_new['timbre_cat_counts'] = [t / json_object['duration'] for t in timbre_cat_counts]
    all_song_data.append(json_object_new)
    count += 1

print 'preprocessing finished, writing results to file at time {0}'.format(time.time() - time_start)
with open(output_file, 'w') as f:
    f.write(str(all_song_data))

print 'file merging complete at time {0}'.format(time.time() - time_start)
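The script above writes per-song feature vectors but does not itself fit the mixture model, and the clustering driver is not reproduced in this appendix. The sketch below is an editorial reconstruction of that missing step, assuming the preprocessed_*.txt format written above; it uses scikit-learn's BayesianGaussianMixture, the current name for the truncated Dirichlet Process mixture in [10] (the thesis-era library exposed the same model as sklearn.mixture.DPGMM). The file name, truncation level, and iteration cap are assumptions.

import ast
import numpy as np
from sklearn.mixture import BayesianGaussianMixture

# hypothetical preprocessed shard produced by the script above
songs = ast.literal_eval(open('msd_data/preprocessed_AB.txt').read())
X = np.array([s['chord_changes'] + s['timbre_cat_counts'] for s in songs])

dpgmm = BayesianGaussianMixture(
    n_components=50,                                  # truncation level: an upper bound on clusters
    weight_concentration_prior_type='dirichlet_process',
    weight_concentration_prior=0.1,                   # the alpha varied in Chapter 3
    max_iter=500)
labels = dpgmm.fit(X).predict(X)                      # one cluster index per song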

A.3 Code to Compute Timbre Categories

from __future__ import division
import os
import sys
import re
import time
import json
import glob
import hdf5_getters  # not on adroit
import sklearn.mixture
import msd_utils  # not on adroit
import math
import numpy as np
import collections
from string import ascii_uppercase
import ast
import matplotlib.pyplot as plt
import operator
from collections import defaultdict
import random

timbre_all = []
N = 20  # number of samples to get from each year
year_counts = {1956: 2, 1965: 4, 1968: 3, 1969: 5, 1970: 23, 1971: 25,
               1972: 26, 1973: 37, 1974: 35, 1975: 29, 1976: 28, 1977: 64,
               1978: 77, 1979: 111, 1980: 131, 1981: 171, 1982: 199, 1983: 272,
               1984: 190, 1985: 189, 1986: 200, 1987: 224, 1988: 205, 1989: 272,
               1990: 358, 1991: 348, 1992: 538, 1993: 610, 1994: 658, 1995: 764,
               1996: 809, 1997: 930, 1998: 872, 1999: 983, 2000: 1031, 2001: 1230,
               2002: 1323, 2003: 1563, 2004: 1508, 2005: 1995, 2006: 1892,
               2007: 2175, 2008: 1950, 2009: 1782, 2010: 742}

time_start = time.time()
year_count = defaultdict(int)
orig_dir = '/scratch/network/mssilver/mssilver/'
orig_dir = ''
# the regex below is reconstructed; it matches one str()-dumped song dict at a time
json_pattern = re.compile(r"\{'title'.*?\}", re.DOTALL)
N = 20  # number of songs to sample from each year
k = 20  # number of frames to select from each song
for l1 in ascii_uppercase:
    for l2 in ascii_uppercase:
        edm_textfile = orig_dir + 'msd_data/raw_' + l1 + l2 + '.txt'
        json_contents = open(edm_textfile, 'r').read()
        for json_object_str in re.findall(json_pattern, json_contents):
            json_object = ast.literal_eval(json_object_str)
            year = int(json_object['year'])
            prob = 1.0 if 1.0*N/year_counts[year] > 1.0 else 1.0*N/year_counts[year]
            if random.random() < prob:
                print 'getting timbre frames for song in directory {0}, {1} seconds after start of program'.format(edm_textfile, time.time() - time_start)
                duration = float(json_object['duration'])
                timbre = [[t / duration for t in l] for l in json_object['timbre']]
                try:
                    indices = random.sample(xrange(0, len(timbre)), k)
                except:
                    indices = xrange(0, len(timbre))
                timbre_frames = [timbre[i] for i in indices]
                appended_timbre = [timbre_all.append(l) for l in timbre_frames]
        print 'finished file {0}, {1} seconds after start of program'.format(edm_textfile, time.time() - time_start)

with(open('timbre_frames_all.txt', 'w')) as f:
    f.write(str(timbre_all))
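The title of this appendix refers to computing the timbre categories, but the code above only samples and normalizes timbre frames; the mixture fitting that produced the 46 hard-coded centers in A.4 is not shown. A minimal sketch of that step, assuming a standard BIC sweep with scikit-learn's GaussianMixture (the candidate range of category counts is an assumption), would look like:

import ast
import numpy as np
from sklearn.mixture import GaussianMixture

# frames written by the sampling script above
X = np.array(ast.literal_eval(open('timbre_frames_all.txt').read()))

best_model, best_bic = None, np.inf
for k in range(10, 61):                    # assumed candidate category counts
    gmm = GaussianMixture(n_components=k).fit(X)
    bic = gmm.bic(X)                       # Bayesian Information Criterion
    if bic < best_bic:
        best_model, best_bic = gmm, bic

# 12-dimensional centers, one per timbre category (cf. TIMBRE_CLUSTERS in A.4)
timbre_clusters = best_model.means_.tolist()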

A.4 Helper Methods for Calculations

import os
import re
import json
import glob
import hdf5_getters
import time
import numpy as np

''' some static data used in conjunction with the helper methods '''

# each 12-element vector corresponds to the 12 pitches, starting with C
# natural and going up to B natural

CHORD_TEMPLATE_MAJOR = [[1,0,0,0,1,0,0,1,0,0,0,0],[0,1,0,0,0,1,0,0,1,0,0,0],
                        [0,0,1,0,0,0,1,0,0,1,0,0],[0,0,0,1,0,0,0,1,0,0,1,0],
                        [0,0,0,0,1,0,0,0,1,0,0,1],[1,0,0,0,0,1,0,0,0,1,0,0],
                        [0,1,0,0,0,0,1,0,0,0,1,0],[0,0,1,0,0,0,0,1,0,0,0,1],
                        [1,0,0,1,0,0,0,0,1,0,0,0],[0,1,0,0,1,0,0,0,0,1,0,0],
                        [0,0,1,0,0,1,0,0,0,0,1,0],[0,0,0,1,0,0,1,0,0,0,0,1]]
CHORD_TEMPLATE_MINOR = [[1,0,0,1,0,0,0,1,0,0,0,0],[0,1,0,0,1,0,0,0,1,0,0,0],
                        [0,0,1,0,0,1,0,0,0,1,0,0],[0,0,0,1,0,0,1,0,0,0,1,0],
                        [0,0,0,0,1,0,0,1,0,0,0,1],[1,0,0,0,0,1,0,0,1,0,0,0],
                        [0,1,0,0,0,0,1,0,0,1,0,0],[0,0,1,0,0,0,0,1,0,0,1,0],
                        [0,0,0,1,0,0,0,0,1,0,0,1],[1,0,0,0,1,0,0,0,0,1,0,0],
                        [0,1,0,0,0,1,0,0,0,0,1,0],[0,0,1,0,0,0,1,0,0,0,0,1]]
CHORD_TEMPLATE_DOM7 = [[1,0,0,0,1,0,0,1,0,0,1,0],[0,1,0,0,0,1,0,0,1,0,0,1],
                       [1,0,1,0,0,0,1,0,0,1,0,0],[0,1,0,1,0,0,0,1,0,0,1,0],
                       [0,0,1,0,1,0,0,0,1,0,0,1],[1,0,0,1,0,1,0,0,0,1,0,0],
                       [0,1,0,0,1,0,1,0,0,0,1,0],[0,0,1,0,0,1,0,1,0,0,0,1],
                       [1,0,0,1,0,0,1,0,1,0,0,0],[0,1,0,0,1,0,0,1,0,1,0,0],
                       [0,0,1,0,0,1,0,0,1,0,1,0],[0,0,0,1,0,0,1,0,0,1,0,1]]
CHORD_TEMPLATE_MIN7 = [[1,0,0,1,0,0,0,1,0,0,1,0],[0,1,0,0,1,0,0,0,1,0,0,1],
                       [1,0,1,0,0,1,0,0,0,1,0,0],[0,1,0,1,0,0,1,0,0,0,1,0],
                       [0,0,1,0,1,0,0,1,0,0,0,1],[1,0,0,1,0,1,0,0,1,0,0,0],
                       [0,1,0,0,1,0,1,0,0,1,0,0],[0,0,1,0,0,1,0,1,0,0,1,0],
                       [0,0,0,1,0,0,1,0,1,0,0,1],[1,0,0,0,1,0,0,1,0,1,0,0],
                       [0,1,0,0,0,1,0,0,1,0,1,0],[0,0,1,0,0,0,1,0,0,1,0,1]]

CHORD_TEMPLATE_MAJOR_means = [np.mean(chord) for chord in CHORD_TEMPLATE_MAJOR]
CHORD_TEMPLATE_MINOR_means = [np.mean(chord) for chord in CHORD_TEMPLATE_MINOR]
CHORD_TEMPLATE_DOM7_means = [np.mean(chord) for chord in CHORD_TEMPLATE_DOM7]
CHORD_TEMPLATE_MIN7_means = [np.mean(chord) for chord in CHORD_TEMPLATE_MIN7]

CHORD_TEMPLATE_MAJOR_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_MAJOR]
CHORD_TEMPLATE_MINOR_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_MINOR]
CHORD_TEMPLATE_DOM7_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_DOM7]
CHORD_TEMPLATE_MIN7_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_MIN7]

# 46 cluster centers (12-dimensional timbre vectors) computed offline (see A.3)
TIMBRE_CLUSTERS = [
 [1.38679881e-01, 3.95702571e-02, 2.65410235e-02, 7.38301998e-03, -1.75014636e-02, -5.51147732e-02,
  8.71851698e-03, -1.17595855e-02, 1.07227900e-02, 8.75951680e-03, 5.40391877e-03, 6.17638908e-03],
 [3.14344510e+00, 1.17405599e-01, 4.08053561e+00, -1.77934450e+00, 2.93367968e+00, -1.35597928e+00,
  -1.55129489e+00, 7.75743158e-01, 6.42796685e-01, 1.40794256e-01, 3.37716831e-01, -3.27103815e-01],
 [3.56548165e-01, 2.73288705e+00, 1.94355982e+00, 1.06892477e+00, 9.89739475e-01, -8.97330631e-02,
  8.73234495e-01, -2.00747009e-03, 3.44488367e-01, 9.93117800e-02, -2.43471766e-01, -1.90521726e-01],
 [4.22442037e-01, 4.14115783e-01, 1.43926557e-01, -1.16143322e-01, -5.95186216e-02, -2.36927188e-01,
  -6.83151409e-02, 9.86816882e-02, 2.43219098e-02, 6.93558977e-02, 6.80121418e-03, 3.97485360e-02],
 [1.94727799e-01, -1.39027782e+00, -2.39875671e-01, -2.84583677e-01, 1.92334219e-01, -2.83421048e-01,
  2.15787541e-01, 1.14840341e-01, -2.15631833e-01, -4.09496877e-02, -6.90838017e-03, -7.24394810e-03],
 [1.96565167e-01, 4.98702717e-02, -3.43697282e-01, 2.54170701e-01, 1.12441266e-02, 1.54740401e-01,
  -4.70447408e-02, 8.10868802e-02, 3.03736697e-03, 1.43974944e-03, -2.75044913e-02, 1.48634678e-02],
 [2.21364497e-01, -2.96205105e-01, 1.57754028e-01, -5.57641279e-02, -9.25625566e-02, -6.15316168e-02,
  -1.38139882e-01, -5.54936599e-02, 1.66886836e-01, 6.46238260e-02, 1.24093863e-02, -2.09274345e-02],
 [2.12823455e-01, -9.32652720e-02, -4.39611467e-01, -2.02814479e-01, 4.98638770e-02, -1.26572488e-01,
  -1.11181799e-01, 3.25075635e-02, 2.01416694e-02, -5.69216463e-02, 2.61922912e-02, 8.30817468e-02],
 [1.62304042e-01, -7.34813956e-03, -2.02552550e-01, 1.80106705e-01, -5.72110826e-02, -9.17148244e-02,
  -6.20429191e-03, -6.08892354e-02, 1.02883628e-02, 3.84878478e-02, -8.72920419e-03, 2.37291230e-02],
 [1.69023095e-01, 6.81311168e-02, -3.71039856e-02, -2.13139780e-02, -4.18752028e-03, 1.36407740e-01,
  2.58515825e-02, -4.10328777e-04, 2.93149920e-02, -1.97874734e-02, 2.01177066e-02, 4.29260690e-03],
 [4.16829358e-01, -1.28384095e+00, 8.86081556e-01, 9.13717416e-02, -3.19420208e-01, -1.82003637e-01,
  -3.19865507e-02, -1.71517045e-02, 3.47472066e-02, -3.53047665e-02, 5.58354602e-02, -5.06222122e-02],
 [3.83948137e-01, 1.06020034e-01, 4.01191058e-01, 1.49470482e-01, -9.58422411e-02, -4.94473336e-02,
  2.27589858e-02, -5.67352733e-02, 3.84666644e-02, -2.15828055e-02, -1.67817151e-02, 1.15426241e-01],
 [9.07946444e-01, 3.26120397e+00, 2.98472002e+00, -1.42615404e-01, 1.29886103e+00, -4.53380431e-01,
  1.54008478e-01, -3.55297093e-02, -2.95809181e-01, 1.57037690e-01, -7.29692046e-02, 1.15180285e-01],
 [1.60870896e+00, -2.32038235e+00, -7.96211044e-01, 1.55058968e+00, -2.19377663e+00, 5.01030526e-01,
  -1.71767279e+00, -1.36642470e+00, -2.42837527e-01, -4.14275615e-01, -7.33148530e-01, -4.56676578e-01],
 [6.42870687e-01, 1.34486839e+00, 2.16026845e-01, -2.13180345e-01, 3.10866747e-01, -3.97754955e-01,
  -3.54439151e-01, -5.95938041e-04, 4.95054274e-03, 4.67013422e-02, -1.80823854e-02, 1.25808320e-01],
 [1.16780496e+00, 2.28141229e+00, -3.29418720e+00, -1.54239912e+00, 2.12372153e-01, 2.51116768e+00,
  1.84273560e+00, -4.06183916e-01, 1.19175125e+00, -9.24407446e-01, 6.85444429e-01, -6.38729005e-01],
 [2.39097414e-01, -1.13382447e-02, 3.06327342e-01, 4.68182987e-03, -1.03107607e-01, -3.17661969e-02,
  3.46533705e-02, 1.46440386e-02, 6.88291154e-02, 1.72580481e-02, -6.23970238e-03, -6.52822380e-03],
 [1.74850329e-01, -1.86077411e-01, 2.69285838e-01, 5.22452803e-02, -3.71708289e-02, -6.42874319e-02,
  -5.01920042e-03, -1.14565540e-02, -2.61300268e-03, -6.94872458e-03, 1.20157063e-02, 2.01341977e-02],
 [1.93220674e-01, 1.62738332e-01, 1.72794061e-02, 7.89933755e-02, 1.58494767e-01, 9.04541006e-04,
  -3.33177052e-02, -1.42411500e-01, -1.90471155e-02, -2.41622739e-02, -2.57382438e-02, 2.84895062e-02],
 [3.31179197e+00, -1.56765268e-01, 4.42446188e+00, 2.05496297e+00, 5.07031622e+00, -3.52663849e-02,
  -5.68337901e+00, -1.17825301e+00, 5.41756637e-01, -3.15541339e-02, -1.58404846e+00, 7.37887234e-01],
 [2.36033237e-01, -5.01380019e-01, -7.01568834e-02, -2.14474169e-01, 5.58739133e-01, -3.45340886e-01,
  2.36469930e-01, -2.51770230e-02, -4.41670143e-01, -1.73364633e-01, 9.92353986e-03, 1.01775476e-01],
 [3.13672832e+00, 1.55128891e+00, 4.60139512e+00, 9.82477544e-01, -3.87108002e-01, -1.34239667e+00,
  -3.00065797e+00, -4.41556909e-01, -7.77546208e-01, -6.59017029e-01, -1.42596356e-01, -9.78935498e-01],
 [8.50714148e-01, 2.28658856e-01, -3.65260753e+00, 2.70626948e+00, -1.90441544e-01, 5.66625676e+00,
  1.77531510e+00, 2.39978921e+00, 1.10965660e+00, 1.58484130e+00, -1.51579214e-02, 8.64324026e-01],
 [1.14302559e+00, 1.18602811e+00, -3.88130412e+00, 8.69833825e-01, -8.23003310e-01, -4.23867795e-01,
  8.56022598e-01, -1.08015106e+00, 1.74840192e-01, -1.35493558e-02, -1.17012561e+00, 1.68572940e-01],
 [3.54117814e+00, 6.12714769e-01, 7.67585243e+00, 2.50391333e+00, 1.81374399e+00, -1.46363231e+00,
  -1.74027236e+00, -5.72924078e-01, -1.20787368e+00, -4.13954661e-01, -4.62561948e-01, 6.78297871e-01],
 [8.31843044e-01, 4.41635485e-01, 7.00724425e-02, -4.72159900e-02, 3.08326493e-01, -4.47009822e-01,
  3.27806057e-01, 6.52370380e-01, 3.28490360e-01, 1.28628172e-01, -7.78065861e-02, 6.91343399e-02],
 [4.90082031e-01, -9.53180204e-01, 1.76970476e-01, 1.57256960e-01, -5.26196238e-02, -3.19264458e-01,
  3.91808304e-01, 2.19368239e-01, -2.06483291e-01, -6.25044005e-02, -1.05547224e-01, 3.18934196e-01],
 [1.49899454e+00, -4.30708817e-01, 2.43770498e+00, 7.03149621e-01, -2.28827845e+00, 2.70195855e+00,
  -4.71484280e+00, -1.18700075e+00, -1.77431396e+00, -2.23190236e+00, 8.20855264e-01, -2.35859902e-01],
 [1.20322544e-01, -3.66300816e-01, -1.25699953e-01, -1.21914056e-01, 6.93277338e-02, -1.31034684e-01,
  -1.54955924e-03, 2.48094288e-02, -3.09576314e-02, -1.66369415e-03, 1.48904987e-04, -1.42151992e-02],
 [6.52394765e-01, -6.81024464e-01, 6.36868117e-01, 3.04950208e-01, 2.62178992e-01, -3.20457080e-01,
  -1.98576098e-01, -3.02173163e-01, 2.04399765e-01, 4.44513847e-02, -9.50111498e-03, -1.14198739e-02],
 [2.06762180e-01, -2.08101829e-01, 2.61977630e-01, -1.71672300e-01, 5.61794250e-02, 2.13660185e-01,
  3.90259585e-02, 4.78176392e-02, 1.72812607e-02, 3.44052067e-02, 6.26899067e-03, 2.48544728e-02],
 [7.39717363e-01, 4.37786285e+00, 2.54995502e+00, 1.13151212e+00, -3.58509503e-01, 2.20806129e-01,
  -2.20500355e-01, -7.22409824e-02, -2.70534083e-01, 1.07942098e-03, 2.70174668e-01, 1.87279353e-01],
 [1.25593809e+00, 6.71054880e-02, 8.70352571e-01, -4.32607959e+00, 2.30652217e+00, 5.47476105e+00,
  -6.11052479e-01, 1.07955720e+00, -2.16225471e+00, -7.95770149e-01, -7.31804973e-01, 9.68935954e-01],
 [1.17233757e-01, -1.23897829e-01, -4.88625265e-01, 1.42036530e-01, -7.23286756e-02, -6.99808763e-02,
  -1.17525019e-02, 5.70221674e-02, -7.67796123e-03, 4.17505873e-02, -2.33375716e-02, 1.94121001e-02],
 [1.67511025e+00, -2.75436700e+00, 1.45345593e+00, 1.32408871e+00, -1.66172505e+00, 1.00560074e+00,
  -8.82308160e-01, -5.95708043e-01, -7.27283590e-01, -1.03975499e+00, -1.86653334e-02, 1.39449745e+00],
 [3.20587677e+00, -2.84451104e+00, 8.54849957e+00, -4.44001235e-01, 1.04202144e+00, 7.35333682e-01,
  -2.48763292e+00, 7.38931361e-01, -1.74185596e+00, -1.07581842e+00, 2.05759299e-01, -8.20483513e-01],
 [3.31279737e+00, -5.08655734e-01, 6.61530870e+00, 1.16518280e+00, 4.74499155e+00, -2.31536191e+00,
  -1.34016130e+00, -7.15381712e-01, 2.78890594e+00, 2.04189275e+00, -3.80003033e-01, 1.16034914e+00],
 [1.79522019e+00, -8.13534697e-02, 4.37167420e-01, 2.26517020e+00, 8.85377295e-01, 1.07481514e+00,
  -7.25322296e-01, -2.19309506e+00, -7.59468916e-01, -1.37191387e+00, 2.60097913e-01, 9.34596450e-01],
 [3.50400906e-01, 8.17891485e-01, -8.63487084e-01, -7.31760701e-01, 9.70320805e-02, -3.60023996e-01,
  -2.91753495e-01, -8.03073817e-02, 6.65930095e-02, 1.60093340e-01, -1.29158086e-01, -5.18806100e-02],
 [2.25922929e-01, 2.78461593e-01, 5.39661393e-02, -2.37662670e-02, -2.70343295e-02, -1.23485570e-01,
  2.31027499e-03, 5.87465112e-05, 1.86127188e-02, 2.83074747e-02, -1.87198676e-04, 1.24761782e-02],
 [4.53615634e-01, 3.18976020e+00, -8.35029351e-01, 7.84124578e+00, -4.43906795e-01, -1.78945492e+00,
  -1.14521031e+00, 1.00044304e+00, -4.04084981e-01, -4.86030348e-01, 1.05412721e-01, 5.63666445e-02],
 [3.93714086e-01, -3.07226477e-01, -4.87366619e-01, -4.57481697e-01, -2.91133171e-04, -2.39881719e-01,
  -2.15591352e-01, -1.21332941e-01, 1.42245002e-01, 5.02984582e-02, -8.05878851e-03, 1.95534173e-01],
 [1.86913010e-01, -1.61000977e-01, 5.95612425e-01, 1.87804293e-01, 2.22064227e-01, -1.09008289e-01,
  7.83845058e-02, 5.15228647e-02, -8.18113578e-02, -2.37860551e-02, 3.41013800e-03, 3.64680417e-02],
 [3.32919314e+00, -2.14341251e+00, 7.20913997e+00, 1.76143734e+00, 1.64091808e+00, -2.66887649e+00,
  -9.26748006e-01, -2.78599285e-01, -7.39434005e-01, -3.87363085e-01, 8.00557250e-01, 1.15628886e+00],
 [4.76496444e-01, -1.19334793e-01, 3.09037235e-01, -3.45545294e-01, 1.30114716e-01, 5.06895559e-01,
  2.12176840e-01, -4.14296750e-03, 4.52439064e-02, -1.62163990e-02, 6.93683152e-03, -5.77607592e-03],
 [3.00019324e-01, 5.43432074e-02, -7.72732930e-01, 1.47263806e+00, -2.79012581e-02, -2.47864869e-01,
  -2.10011388e-01, 2.78202425e-01, 6.16957205e-02, -1.66924986e-01, -1.80102286e-01, -3.78872162e-03]]

TIMBRE_MEANS = [np.mean(t) for t in TIMBRE_CLUSTERS]
TIMBRE_STDEVS = [np.std(t) for t in TIMBRE_CLUSTERS]

'''helper methods to process raw msd data'''

def normalize_pitches(h5):
    key = int(hdf5_getters.get_key(h5))
    segments_pitches = hdf5_getters.get_segments_pitches(h5)
    segments_pitches_new = [transpose_by_key(pitch_seg, key) for pitch_seg in segments_pitches]
    return segments_pitches_new

def transpose_by_key(pitch_seg, key):
    pitch_seg_new = []
    for i in range(0, 12):
        idx = (i + key) % 12
        pitch_seg_new.append(pitch_seg[idx])
    return pitch_seg_new

''' given a time segment with distributions of the 12 pitches, find the most likely chord played '''
def find_most_likely_chord(pitch_vector):
    rho_max = 0.0
    # index each chord
    most_likely_chord = (1, 1)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_MAJOR, CHORD_TEMPLATE_MAJOR_means, CHORD_TEMPLATE_MAJOR_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean)*(pitch_vector[i] - np.mean(pitch_vector))/((stdev + 0.01)*(np.std(pitch_vector) + 0.01))
        if (abs(rho) > abs(rho_max)):
            rho_max = rho
            most_likely_chord = (1, idx)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_MINOR, CHORD_TEMPLATE_MINOR_means, CHORD_TEMPLATE_MINOR_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean)*(pitch_vector[i] - np.mean(pitch_vector))/((stdev + 0.01)*(np.std(pitch_vector) + 0.01))
        if (abs(rho) > abs(rho_max)):
            rho_max = rho
            most_likely_chord = (2, idx)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_DOM7, CHORD_TEMPLATE_DOM7_means, CHORD_TEMPLATE_DOM7_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean)*(pitch_vector[i] - np.mean(pitch_vector))/((stdev + 0.01)*(np.std(pitch_vector) + 0.01))
        if (abs(rho) > abs(rho_max)):
            rho_max = rho
            most_likely_chord = (3, idx)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_MIN7, CHORD_TEMPLATE_MIN7_means, CHORD_TEMPLATE_MIN7_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean)*(pitch_vector[i] - np.mean(pitch_vector))/((stdev + 0.01)*(np.std(pitch_vector) + 0.01))
        if (abs(rho) > abs(rho_max)):
            rho_max = rho
            most_likely_chord = (4, idx)
    return most_likely_chord

def find_most_likely_timbre_category(timbre_vector):
    most_likely_timbre_cat = 0
    rho_max = 0.0
    for idx, (seg, mean, stdev) in enumerate(zip(TIMBRE_CLUSTERS, TIMBRE_MEANS, TIMBRE_STDEVS)):
        rho = 0.0
        for i in range(0, 12):
            rho += (seg[i] - mean)*(timbre_vector[i] - np.mean(seg))/((stdev + 0.01)*(np.std(timbre_vector) + 0.01))
        if (abs(rho) > abs(rho_max)):
            rho_max = rho
            most_likely_timbre_cat = idx
    return most_likely_timbre_cat
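As a sanity check on the template matching above, a chroma vector concentrated on C, E, and G should correlate best with the major-triad template rooted at C. The input vector below is invented for illustration and is not part of the original code base:

# hypothetical smoothed chroma vector: most energy on C, E and G
pitch_vector = [0.9, 0.1, 0.1, 0.1, 0.8, 0.1, 0.1, 0.7, 0.1, 0.1, 0.1, 0.1]

# expected output: (1, 0) -- chord family 1 (major), root index 0 (C)
print(find_most_likely_chord(pitch_vector))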

Bibliography

[1] Marck Bailey. Mark J. Butler publishes scholarly work on dance music. http://www.music.northwestern.edu/about/news/2012/mark-j-butler-publishes-scholarly-work-on-dance-music.html, Mar. 2012.

[2] Kenneth Taylor. Ishkur's guide to EDM. http://techno.org/electronic-music-guide/.

[3] Deal further strengthens Spotify's music discovery expertise. http://the.echonest.com/pressreleases/spotify-acquires-echo-nest/, Mar. 2014.

[4] Josh Constine. Inside the Spotify–Echo Nest skunkworks. http://techcrunch.com/2014/10/19/the-sonic-mad-scientists/, Oct. 2014.

[5] The future of music genres is here. http://blog.echonest.com/post/73516217273/the-future-of-music-genres-is-here, Jan. 2014.

[6] About the Music Genome Project. http://www.pandora.com/about/mgp.

[7] Joan Serrà, Álvaro Corral, Marián Boguñá, Martín Haro, and Josep Ll. Arcos. Measuring the evolution of contemporary western popular music. Sci. Rep., 2, Jul. 2012.

[8] Matthias Mauch, Robert M. MacCallum, Mark Levy, and Armand M. Leroi. The evolution of popular music: USA 1960–2010. Royal Society Open Science, 2(5), 2015.

[9] Thierry Bertin-Mahieux, Daniel P.W. Ellis, Brian Whitman, and Paul Lamere. The Million Song Dataset. In Proceedings of the 12th International Conference on Music Information Retrieval (ISMIR 2011), 2011.

[10] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay. Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12:2825–2830, 2011.

[11] Lars Buitinck, Gilles Louppe, Mathieu Blondel, Fabian Pedregosa, Andreas Mueller, Olivier Grisel, Vlad Niculae, Peter Prettenhofer, Alexandre Gramfort, Jaques Grobler, Robert Layton, Jake VanderPlas, Arnaud Joly, Brian Holt, and Gaël Varoquaux. API design for machine learning software: experiences from the scikit-learn project. In ECML PKDD Workshop: Languages for Data Mining and Machine Learning, pages 108–122, 2013.

[12] Edwin Chen. Infinite mixture models with nonparametric Bayes and the Dirichlet Process. http://blog.echen.me/2012/03/20/infinite-mixture-models-with-nonparametric-bayes-and-the-dirichlet-process/, Mar. 2012.

[13] Graham Massey. Roland TR-808: The drum machine that changed music. http://www.bbc.com/news/entertainment-arts-26682781, Mar. 2014.

[14] Jean-Michel Jarre and Agencja Artystyczna MTJ. Jean Michel Jarre. Disques Dreyfus, 1999.

[15] Francois Pachet, Jean-Julien Aucouturier, and Mark Sandler. The way it sounds: Timbre models for analysis and retrieval of music signals. IEEE Transactions on Multimedia, 7(6):1028–1035, Dec. 2005.


Page 61: Silver,Matthew final thesis

features that stand out as saliently as cluster 2801, for example. The only exceptions to these numbers were clusters 30 and 34, but there were so few songs in each of these clusters that they represent only a small fraction of the dataset. I therefore concluded that the Dirichlet Process with α = 0.2 did an inadequate job of clustering the songs. Overall, the clusters formed when α = 0.1 were the most meaningful in terms of picking up nuanced moods and instruments without splitting hairs and producing clusters that a minimally trained ear could not differentiate. From this analysis, the most appropriate genre classifications of the electronic music from the MSD are the clusters described in the table for α = 0.1, and the most novel artists, along with their contributions, are summarized in the findings for α = 0.05 and α = 0.1.
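To make the role of the concentration parameter concrete, the following is a minimal sketch of how α enters the model, written against the modern scikit-learn interface to the Dirichlet Process mixture (BayesianGaussianMixture, the successor of the older DPGMM class in the library this thesis's code imports); the random matrix X is only a stand-in for the real feature vectors of chord-change frequencies and timbre category counts:

import numpy as np
from sklearn.mixture import BayesianGaussianMixture

# X stands in for one row of preprocessed features per song
X = np.random.rand(500, 20)

for alpha in (0.05, 0.1, 0.2):
    dpgmm = BayesianGaussianMixture(
        n_components=50,                        # truncation level of the DP
        weight_concentration_prior_type='dirichlet_process',
        weight_concentration_prior=alpha,       # the concentration parameter
        random_state=0).fit(X)
    occupied = len(set(dpgmm.predict(X)))
    print('alpha={0}: {1} occupied clusters'.format(alpha, occupied))

Larger values of α make the process more willing to open new clusters, which is why the α = 0.2 run fragments the data more finely than the α = 0.05 run.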


Chapter 4

Conclusion

In this chapter, I first address the weaknesses in my experiment and strategies for addressing them; I then offer potential paths for researchers to build upon my experiment, and close with final words regarding this thesis.

4.1 Design Flaws in Experiment

While I made every effort possible to ensure the integrity of this experiment, there were various confounding factors, some beyond my control and others within my control but unrealistic to address given the time and resources I had. The largest issue was the dataset I was working with. While the MSD contained roughly 23,000 electronic music songs according to my classifications, these songs did not come close to covering all of the electronic music available. From looking through the tracks, I did see many important artists, meaning that the dataset had some credibility. However, there were several other artists I was surprised to see missing, and the artists that were included contained only a limited number of popular songs. Some traditionally defined genres, like dubstep, were missing entirely from the dataset, and the most recent songs came from the year 2010, which meant that the past five years of rapid expansion in EM were not accounted for. Building a sufficient corpus of EM data is very difficult, arguably more so than for other genres, because songs may be remixed by multiple artists, further blurring the line between original content and modifications. For this reason, I considered my thesis to be a proof of concept: although the data I used may not be ideal, I was able to show that the Dirichlet Process could be used with some success to cluster songs based on their metadata.

With respect to how I implemented the Dirichlet Process and constructed the features, my methodology could have been more extensive with additional time and resources. Interpreting the sounds in each song and establishing common threads is a difficult task, and unlike Pandora, which used trained music theory experts to analyze each song, I relied on my own ears and anecdotal knowledge of EM. Given the lack of formal literature quantitatively analyzing EM and the resources I had, this was my best realistic option, but it was also not ideal. The second notable weakness, which was more controllable, was determining what exactly constitutes an EM song. My criteria involved iterating through every song and selecting those whose artist contained a tag that fell inside a list of predetermined EM genres. However, this strategy is not always effective, since some artists have only a small selection of EM songs and have produced much more music in rock or other non-EM genres. To prevent these songs from appearing in the dataset, I would need to load another dataset, from a group called Last.fm, which contains user-generated tags at the song level; a sketch of such a filter appears below.
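As an illustration of that fix, here is a minimal sketch of a song-level filter, assuming the Last.fm companion dataset's layout of one JSON file per MSD track ID; the directory layout, tag weights, and threshold here are assumptions of this sketch rather than code from my pipeline:

import json
import os

def is_em_song(track_id, lastfm_root, target_genres, min_weight=50):
    # assumed shard layout, e.g. lastfm_root/A/B/C/TRABC....json, with a
    # 'tags' field of [tag name, weight] pairs (weights run 0 to 100)
    path = os.path.join(lastfm_root, track_id[2], track_id[3], track_id[4],
                        track_id + '.json')
    if not os.path.exists(path):
        return False  # no song-level tags; fall back to artist-level tags
    with open(path) as f:
        tags = json.load(f).get('tags', [])
    return any(name.lower() in target_genres and int(weight) >= min_weight
               for name, weight in tags)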

Another, more addressable weakness in my experiment was graphically analyzing the timbre categories. While the average chord changes were easy to interpret on the graphs for each cluster and had easy semantic interpretations, the timbre categories were never formally defined. That is, while I knew the Bayes Information Criterion was lowest when there were 46 categories, I did not associate each timbre category with a sound. Mauch's study addressed this issue by randomly selecting songs with sounds that fell in each timbre category and asking users to listen to the sounds and classify what they heard. Implementing this system would be an additional way of ensuring that the clusters formed for each song were nontrivial: I could not only eyeball the measurements on each graph for timbre, as I did in this thesis, but also use them to confirm the sounds I observed for each cluster. Finally, while my feature selection involved careful preprocessing, based on other studies, that normalized measurements between all songs, there are additional ways I could have improved the feature set. For example, one study looks at more advanced ways to isolate specific timbre segments in a song, identify repeating patterns, and compare songs to each other in terms of the similarity of their timbres [15]. More advanced methods like these would allow me to analyze more quantitatively how successfully the Dirichlet Process clusters songs into distinct categories. A sketch of the BIC scan behind the 46-category choice follows.
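For reference, the model selection behind the 46 timbre categories amounts to a scan like the following minimal sketch, in which the frames sampled in Appendix A.3 are refit with a Gaussian mixture at each candidate size and the Bayes Information Criterion is compared; the file name, component range, and covariance type here are assumptions:

import ast
import numpy as np
from sklearn.mixture import GaussianMixture

with open('timbre_frames_all.txt') as f:
    frames = np.array(ast.literal_eval(f.read()))

# fit one Gaussian mixture per candidate category count and keep the
# number of components that minimizes the BIC
bics = {}
for k in range(10, 61):
    gmm = GaussianMixture(n_components=k, covariance_type='diag',
                          random_state=0).fit(frames)
    bics[k] = gmm.bic(frames)
print('lowest BIC at {0} timbre categories'.format(min(bics, key=bics.get)))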

4.2 Future Work

Future work in this area, quantitatively analyzing EM metadata to determine what constitutes different genres and novel artists, would involve tighter definitions and procedures, evaluations of whether the clustering was effective, and closer musical scrutiny. All of the weaknesses mentioned in the previous section, barring perhaps the set of songs available in the Million Song Dataset, can be addressed with extensions and modifications to the code base I created. The greater issue of building an effective corpus of music data for the MSD and constantly updating it might be addressed by soliciting such data from an organization like Spotify, but such an endeavor is very ambitious and beyond the scope of any individual or small-group research project without extensive funding and influence. Once these problems are resolved, and the songs accessed from the dataset and the methods for comparing songs to each other are settled, the next steps would be to further analyze the results. How do the most unique artists for their time compare to the most popular artists? Is there considerable overlap? How long does it take for a style to grow in popularity, if it even does? And lastly, how can these findings be used to compose new genres of music and envision who and what will become popular in the future? All of these questions may require supplementary information sources, with respect to the popularity of songs and artists for example, and many of these additional pieces of information can be found on the website of the MSD; one such starting point is sketched below.
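As one concrete starting point, the MSD's hdf5_getters already exposes a per-song popularity field, so the novelty-versus-popularity questions above could begin with a join like the sketch below; novelty_by_track is a hypothetical mapping produced by the clustering pipeline, not an existing structure in this code base:

import math
import hdf5_getters

def track_popularity(h5_path):
    # the MSD 'song hotttnesss' field; it is NaN for many tracks
    h5 = hdf5_getters.open_h5_file_read(h5_path)
    hot = hdf5_getters.get_song_hotttnesss(h5)
    h5.close()
    return None if math.isnan(hot) else float(hot)

# with novelty_by_track (hypothetical) and popularity scores in hand, the
# overlap between the most novel and most popular artists reduces to
# comparing two rankings, e.g. with a rank correlation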

4.3 Closing Remarks

While this thesis is an ambitious endeavor and can be improved in many respects, it does show that the methods implemented yield nontrivial results and could serve as a foundation for future quantitative analysis of electronic music. As data analytics grows and groups such as Spotify amass greater amounts of information, and deeper insights into that information, this relatively new field of study will hopefully grow as well. EM is a dynamic, energizing, and incredibly expressive type of music, and understanding it from a quantitative perspective pays respect to what has, up until now, mostly been analyzed from a curious outsider's perspective: qualitatively described, but not examined as thoroughly from a mathematical angle.


Appendix A

Code

A.1 Pulling Data from the Million Song Dataset

from __future__ import division
import os
import re
import sys
import time
import glob
from collections import OrderedDict
import numpy as np
import hdf5_getters  # not on adroit

# prevents output from showing ellipses when printed
np.set_printoptions(threshold=np.nan)

'''This code pulls the title, artist, year, duration, pitch, and timbre
metadata for every electronic music song in the MSD shard given on the
command line and writes it to disk, sorted chronologically.'''

basedir = '/scratch/network/mssilver/mssilver/msd_data_full/data/' + str(sys.argv[1])
ext = '.h5'

target_genres = ['house', 'techno', 'drum and bass', 'drum n bass', "drum'n'bass",
                 'drumnbass', "drum 'n' bass", 'jungle', 'breakbeat', 'trance',
                 'dubstep', 'trap', 'downtempo', 'industrial', 'synthpop', 'idm',
                 'idm - intelligent dance music', '8-bit', 'ambient',
                 'dance and electronica', 'electronic']

# relevant metadata for all EM songs found in the MSD
all_song_data = {}
count = 0
start_time = time.time()

for root, dirs, files in os.walk(basedir):
    files = glob.glob(os.path.join(root, '*' + ext))
    for f in files:
        h5 = hdf5_getters.open_h5_file_read(f)
        # if year unknown, throw out the sample
        if hdf5_getters.get_year(h5) == 0:
            h5.close()
            continue
        if any(tag in str(hdf5_getters.get_artist_mbtags(h5)) for tag in target_genres):
            print('found electronic music song at {0} seconds'.format(time.time() - start_time))
            count += 1
            print('song count: {0}'.format(count))
            h5_subdict = dict()
            h5_subdict['title'] = hdf5_getters.get_title(h5).item()
            h5_subdict['artist_name'] = hdf5_getters.get_artist_name(h5).item()
            h5_subdict['year'] = hdf5_getters.get_year(h5).item()
            h5_subdict['duration'] = hdf5_getters.get_duration(h5).item()
            h5_subdict['timbre'] = hdf5_getters.get_segments_timbre(h5).tolist()
            h5_subdict['pitches'] = hdf5_getters.get_segments_pitches(h5).tolist()
            track_id = hdf5_getters.get_track_id(h5).item()
            all_song_data[track_id] = h5_subdict
            print('Song {0} finished processing. Total time elapsed: {1} seconds'.format(count, str(time.time() - start_time)))
        h5.close()

# an OrderedDict (rather than a plain dict) preserves the chronological sort
all_song_data_sorted = OrderedDict(sorted(all_song_data.items(), key=lambda k: k[1]['year']))
# strip path separators from the shard name so output files follow the raw_XY.txt pattern
sortedpitchdata = '/scratch/network/mssilver/mssilver/msd_data/raw_' + re.sub('/', '', sys.argv[1]) + '.txt'
with open(sortedpitchdata, 'w') as text_file:
    text_file.write(str(all_song_data_sorted))
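A note on usage: the script takes the MSD shard to scan as its only argument, and the re.sub call strips path separators so that, for example, an argument of A/A produces msd_data/raw_AA.txt, the file naming convention that the sampling loop in Section A.3 reads back.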

A.2 Calculating Most Likely Chords and Timbre Categories

from __future__ import division
import sys
import re
import time
import math
import ast
import numpy as np
import msd_utils  # not on adroit

# prevents output from showing ellipses when printed
np.set_printoptions(threshold=np.nan)

# mean of a list; applied column-wise below via map(mean, zip(*...))
def mean(a):
    return sum(a) / len(a)

'''This code computes the frequency of chord changes and the timbre
category counts for each electronic song, normalized by song duration.'''

basedir = '/scratch/network/mssilver/mssilver/'
input_file = basedir + 'msd_data/raw_' + str(sys.argv[1]) + '.txt'
output_file = basedir + 'msd_data/preprocessed_' + str(sys.argv[1]) + '.txt'

json_contents = open(input_file, 'r').read()

all_song_data = []
time_start = time.time()
count = 0
# each song's metadata was written as a dict literal beginning with 'title'
for json_object_str in re.finditer(r"{'title'.*?}", json_contents):
    json_object = ast.literal_eval(json_object_str.group(0))
    json_object_new = {}

    json_object_new['title'] = json_object['title']
    json_object_new['artist_name'] = json_object['artist_name']
    json_object_new['year'] = json_object['year']
    json_object_new['duration'] = json_object['duration']

    segments_pitches_old = json_object['pitches']
    segments_timbre_old = json_object['timbre']
    segments_pitches_old_smoothed = []
    segments_timbre_old_smoothed = []
    chord_changes = [0 for i in range(0, 192)]
    smoothing_factor = 5
    for i in range(0, int(math.floor(len(segments_pitches_old) / smoothing_factor))):
        segments = segments_pitches_old[(smoothing_factor * i):(smoothing_factor * i + smoothing_factor)]
        # calculate the mean frequency of each note over a block of 5 time segments
        segments_mean = map(mean, zip(*segments))
        segments_pitches_old_smoothed.append(segments_mean)
    most_likely_chords = [msd_utils.find_most_likely_chord(seg) for seg in segments_pitches_old_smoothed]
    print('found most likely chords at {0} seconds'.format(time.time() - time_start))
    # calculate chord changes
    for i in range(0, len(most_likely_chords) - 1):
        c1 = most_likely_chords[i]
        c2 = most_likely_chords[i + 1]
        if c1[1] == c2[1]:
            note_shift = 0
        elif c1[1] < c2[1]:
            note_shift = c2[1] - c1[1]
        else:
            note_shift = 12 - c1[1] + c2[1]
        key_shift = 4 * (c1[0] - 1) + c2[0]
        # convert note_shift (0 through 11) and key_shift (1 to 16)
        # to one of 192 categories for a chord shift
        chord_shift = 12 * (key_shift - 1) + note_shift
        chord_changes[chord_shift] += 1
    json_object_new['chord_changes'] = [c / json_object['duration'] for c in chord_changes]
    print('calculated chord changes at {0} seconds'.format(time.time() - time_start))

    for i in range(0, int(math.floor(len(segments_timbre_old) / smoothing_factor))):
        segments = segments_timbre_old[(smoothing_factor * i):(smoothing_factor * i + smoothing_factor)]
        # calculate the mean of each timbre coefficient over a block of 5 time segments
        segments_mean = map(mean, zip(*segments))
        segments_timbre_old_smoothed.append(segments_mean)
    timbre_cats = [msd_utils.find_most_likely_timbre_category(seg) for seg in segments_timbre_old_smoothed]
    print('found most likely timbre categories at {0} seconds'.format(time.time() - time_start))
    # note: msd_utils defines 46 timbre clusters, so this hard-coded bound of 30 may undercount
    timbre_cat_counts = [timbre_cats.count(i) for i in xrange(0, 30)]
    json_object_new['timbre_cat_counts'] = [t / json_object['duration'] for t in timbre_cat_counts]
    all_song_data.append(json_object_new)
    count += 1

print('preprocessing finished, writing results to file at time {0}'.format(time.time() - time_start))
with open(output_file, 'w') as f:
    f.write(str(all_song_data))

print('file merging complete at time {0}'.format(time.time() - time_start))
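To make the chord-change encoding concrete: a transition from (1, 0) (a major chord rooted at C) to (2, 9) (a minor chord rooted at A) gives note_shift = 9 and key_shift = 4*(1 - 1) + 2 = 2, so the transition falls in category chord_shift = 12*(2 - 1) + 9 = 21 of the 192 possible chord changes.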

A.3 Code to Compute Timbre Categories

from __future__ import division
import re
import time
import ast
import random
from string import ascii_uppercase
from collections import defaultdict

timbre_all = []
# number of EM songs found in the MSD for each year, used to weight sampling
year_counts = {1956: 2, 1965: 4, 1968: 3, 1969: 5, 1970: 23, 1971: 25,
               1972: 26, 1973: 37, 1974: 35, 1975: 29, 1976: 28, 1977: 64,
               1978: 77, 1979: 111, 1980: 131, 1981: 171, 1982: 199, 1983: 272,
               1984: 190, 1985: 189, 1986: 200, 1987: 224, 1988: 205, 1989: 272,
               1990: 358, 1991: 348, 1992: 538, 1993: 610, 1994: 658, 1995: 764,
               1996: 809, 1997: 930, 1998: 872, 1999: 983, 2000: 1031, 2001: 1230,
               2002: 1323, 2003: 1563, 2004: 1508, 2005: 1995, 2006: 1892,
               2007: 2175, 2008: 1950, 2009: 1782, 2010: 742}

time_start = time.time()
year_count = defaultdict(int)
orig_dir = '/scratch/network/mssilver/mssilver/'
orig_dir = ''  # override when running locally
json_pattern = re.compile(r"{'title'.*?}", re.DOTALL)
N = 20  # number of songs to sample from each year
k = 20  # number of frames to select from each song
for l1 in ascii_uppercase:
    for l2 in ascii_uppercase:
        edm_textfile = orig_dir + 'msd_data/raw_' + l1 + l2 + '.txt'
        json_contents = open(edm_textfile, 'r').read()
        for json_object_str in re.findall(json_pattern, json_contents):
            json_object = ast.literal_eval(json_object_str)
            year = int(json_object['year'])
            # sample each year uniformly: keep a song with probability N / (songs in that year)
            prob = 1.0 if 1.0 * N / year_counts[year] > 1.0 else 1.0 * N / year_counts[year]
            if random.random() < prob:
                print('getting timbre frames for song in directory {0}, {1} seconds after start of program'.format(edm_textfile, time.time() - time_start))
                duration = float(json_object['duration'])
                timbre = [[t / duration for t in l] for l in json_object['timbre']]
                # pick k random frames; fall back to all frames for short songs
                try:
                    indices = random.sample(xrange(0, len(timbre)), k)
                except ValueError:
                    indices = xrange(0, len(timbre))
                timbre_frames = [timbre[i] for i in indices]
                for l in timbre_frames:
                    timbre_all.append(l)
        print('finished file {0}, {1} seconds after start of program'.format(edm_textfile, time.time() - time_start))

with open('timbre_frames_all.txt', 'w') as f:
    f.write(str(timbre_all))
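The frames written to timbre_frames_all.txt serve as the training set for the timbre clustering itself. The fitting step is not reproduced in this appendix, but its output, the 46 cluster centers, is what appears hard-coded as the TIMBRE_CLUSTERS table in Section A.4.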

A.4 Helper Methods for Calculations

import numpy as np
import hdf5_getters

''' some static data used in conjunction with the helper methods '''

# each 12-element vector corresponds to the 12 pitches, starting with C
# natural and going up to B natural; row i is the chord rooted at pitch i
CHORD_TEMPLATE_MAJOR = [[1,0,0,0,1,0,0,1,0,0,0,0], [0,1,0,0,0,1,0,0,1,0,0,0],
                        [0,0,1,0,0,0,1,0,0,1,0,0], [0,0,0,1,0,0,0,1,0,0,1,0],
                        [0,0,0,0,1,0,0,0,1,0,0,1], [1,0,0,0,0,1,0,0,0,1,0,0],
                        [0,1,0,0,0,0,1,0,0,0,1,0], [0,0,1,0,0,0,0,1,0,0,0,1],
                        [1,0,0,1,0,0,0,0,1,0,0,0], [0,1,0,0,1,0,0,0,0,1,0,0],
                        [0,0,1,0,0,1,0,0,0,0,1,0], [0,0,0,1,0,0,1,0,0,0,0,1]]
CHORD_TEMPLATE_MINOR = [[1,0,0,1,0,0,0,1,0,0,0,0], [0,1,0,0,1,0,0,0,1,0,0,0],
                        [0,0,1,0,0,1,0,0,0,1,0,0], [0,0,0,1,0,0,1,0,0,0,1,0],
                        [0,0,0,0,1,0,0,1,0,0,0,1], [1,0,0,0,0,1,0,0,1,0,0,0],
                        [0,1,0,0,0,0,1,0,0,1,0,0], [0,0,1,0,0,0,0,1,0,0,1,0],
                        [0,0,0,1,0,0,0,0,1,0,0,1], [1,0,0,0,1,0,0,0,0,1,0,0],
                        [0,1,0,0,0,1,0,0,0,0,1,0], [0,0,1,0,0,0,1,0,0,0,0,1]]
CHORD_TEMPLATE_DOM7 = [[1,0,0,0,1,0,0,1,0,0,1,0], [0,1,0,0,0,1,0,0,1,0,0,1],
                       [1,0,1,0,0,0,1,0,0,1,0,0], [0,1,0,1,0,0,0,1,0,0,1,0],
                       [0,0,1,0,1,0,0,0,1,0,0,1], [1,0,0,1,0,1,0,0,0,1,0,0],
                       [0,1,0,0,1,0,1,0,0,0,1,0], [0,0,1,0,0,1,0,1,0,0,0,1],
                       [1,0,0,1,0,0,1,0,1,0,0,0], [0,1,0,0,1,0,0,1,0,1,0,0],
                       [0,0,1,0,0,1,0,0,1,0,1,0], [0,0,0,1,0,0,1,0,0,1,0,1]]
CHORD_TEMPLATE_MIN7 = [[1,0,0,1,0,0,0,1,0,0,1,0], [0,1,0,0,1,0,0,0,1,0,0,1],
                       [1,0,1,0,0,1,0,0,0,1,0,0], [0,1,0,1,0,0,1,0,0,0,1,0],
                       [0,0,1,0,1,0,0,1,0,0,0,1], [1,0,0,1,0,1,0,0,1,0,0,0],
                       [0,1,0,0,1,0,1,0,0,1,0,0], [0,0,1,0,0,1,0,1,0,0,1,0],
                       [0,0,0,1,0,0,1,0,1,0,0,1], [1,0,0,0,1,0,0,1,0,1,0,0],
                       [0,1,0,0,0,1,0,0,1,0,1,0], [0,0,1,0,0,0,1,0,0,1,0,1]]

CHORD_TEMPLATE_MAJOR_means = [np.mean(chord) for chord in CHORD_TEMPLATE_MAJOR]
CHORD_TEMPLATE_MINOR_means = [np.mean(chord) for chord in CHORD_TEMPLATE_MINOR]
CHORD_TEMPLATE_DOM7_means = [np.mean(chord) for chord in CHORD_TEMPLATE_DOM7]
CHORD_TEMPLATE_MIN7_means = [np.mean(chord) for chord in CHORD_TEMPLATE_MIN7]

CHORD_TEMPLATE_MAJOR_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_MAJOR]
CHORD_TEMPLATE_MINOR_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_MINOR]
CHORD_TEMPLATE_DOM7_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_DOM7]
CHORD_TEMPLATE_MIN7_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_MIN7]

# the 46 timbre cluster centers (12 coefficients each) selected by the BIC scan
TIMBRE_CLUSTERS = [
    [1.38679881e-01, 3.95702571e-02, 2.65410235e-02, 7.38301998e-03, -1.75014636e-02, -5.51147732e-02, 8.71851698e-03, -1.17595855e-02, 1.07227900e-02, 8.75951680e-03, 5.40391877e-03, 6.17638908e-03],
    [3.14344510e+00, 1.17405599e-01, 4.08053561e+00, -1.77934450e+00, 2.93367968e+00, -1.35597928e+00, -1.55129489e+00, 7.75743158e-01, 6.42796685e-01, 1.40794256e-01, 3.37716831e-01, -3.27103815e-01],
    [3.56548165e-01, 2.73288705e+00, 1.94355982e+00, 1.06892477e+00, 9.89739475e-01, -8.97330631e-02, 8.73234495e-01, -2.00747009e-03, 3.44488367e-01, 9.93117800e-02, -2.43471766e-01, -1.90521726e-01],
    [4.22442037e-01, 4.14115783e-01, 1.43926557e-01, -1.16143322e-01, -5.95186216e-02, -2.36927188e-01, -6.83151409e-02, 9.86816882e-02, 2.43219098e-02, 6.93558977e-02, 6.80121418e-03, 3.97485360e-02],
    [1.94727799e-01, -1.39027782e+00, -2.39875671e-01, -2.84583677e-01, 1.92334219e-01, -2.83421048e-01, 2.15787541e-01, 1.14840341e-01, -2.15631833e-01, -4.09496877e-02, -6.90838017e-03, -7.24394810e-03],
    [1.96565167e-01, 4.98702717e-02, -3.43697282e-01, 2.54170701e-01, 1.12441266e-02, 1.54740401e-01, -4.70447408e-02, 8.10868802e-02, 3.03736697e-03, 1.43974944e-03, -2.75044913e-02, 1.48634678e-02],
    [2.21364497e-01, -2.96205105e-01, 1.57754028e-01, -5.57641279e-02, -9.25625566e-02, -6.15316168e-02, -1.38139882e-01, -5.54936599e-02, 1.66886836e-01, 6.46238260e-02, 1.24093863e-02, -2.09274345e-02],
    [2.12823455e-01, -9.32652720e-02, -4.39611467e-01, -2.02814479e-01, 4.98638770e-02, -1.26572488e-01, -1.11181799e-01, 3.25075635e-02, 2.01416694e-02, -5.69216463e-02, 2.61922912e-02, 8.30817468e-02],
    [1.62304042e-01, -7.34813956e-03, -2.02552550e-01, 1.80106705e-01, -5.72110826e-02, -9.17148244e-02, -6.20429191e-03, -6.08892354e-02, 1.02883628e-02, 3.84878478e-02, -8.72920419e-03, 2.37291230e-02],
    [1.69023095e-01, 6.81311168e-02, -3.71039856e-02, -2.13139780e-02, -4.18752028e-03, 1.36407740e-01, 2.58515825e-02, -4.10328777e-04, 2.93149920e-02, -1.97874734e-02, 2.01177066e-02, 4.29260690e-03],
    [4.16829358e-01, -1.28384095e+00, 8.86081556e-01, 9.13717416e-02, -3.19420208e-01, -1.82003637e-01, -3.19865507e-02, -1.71517045e-02, 3.47472066e-02, -3.53047665e-02, 5.58354602e-02, -5.06222122e-02],
    [3.83948137e-01, 1.06020034e-01, 4.01191058e-01, 1.49470482e-01, -9.58422411e-02, -4.94473336e-02, 2.27589858e-02, -5.67352733e-02, 3.84666644e-02, -2.15828055e-02, -1.67817151e-02, 1.15426241e-01],
    [9.07946444e-01, 3.26120397e+00, 2.98472002e+00, -1.42615404e-01, 1.29886103e+00, -4.53380431e-01, 1.54008478e-01, -3.55297093e-02, -2.95809181e-01, 1.57037690e-01, -7.29692046e-02, 1.15180285e-01],
    [1.60870896e+00, -2.32038235e+00, -7.96211044e-01, 1.55058968e+00, -2.19377663e+00, 5.01030526e-01, -1.71767279e+00, -1.36642470e+00, -2.42837527e-01, -4.14275615e-01, -7.33148530e-01, -4.56676578e-01],
    [6.42870687e-01, 1.34486839e+00, 2.16026845e-01, -2.13180345e-01, 3.10866747e-01, -3.97754955e-01, -3.54439151e-01, -5.95938041e-04, 4.95054274e-03, 4.67013422e-02, -1.80823854e-02, 1.25808320e-01],
    [1.16780496e+00, 2.28141229e+00, -3.29418720e+00, -1.54239912e+00, 2.12372153e-01, 2.51116768e+00, 1.84273560e+00, -4.06183916e-01, 1.19175125e+00, -9.24407446e-01, 6.85444429e-01, -6.38729005e-01],
    [2.39097414e-01, -1.13382447e-02, 3.06327342e-01, 4.68182987e-03, -1.03107607e-01, -3.17661969e-02, 3.46533705e-02, 1.46440386e-02, 6.88291154e-02, 1.72580481e-02, -6.23970238e-03, -6.52822380e-03],
    [1.74850329e-01, -1.86077411e-01, 2.69285838e-01, 5.22452803e-02, -3.71708289e-02, -6.42874319e-02, -5.01920042e-03, -1.14565540e-02, -2.61300268e-03, -6.94872458e-03, 1.20157063e-02, 2.01341977e-02],
    [1.93220674e-01, 1.62738332e-01, 1.72794061e-02, 7.89933755e-02, 1.58494767e-01, 9.04541006e-04, -3.33177052e-02, -1.42411500e-01, -1.90471155e-02, -2.41622739e-02, -2.57382438e-02, 2.84895062e-02],
    [3.31179197e+00, -1.56765268e-01, 4.42446188e+00, 2.05496297e+00, 5.07031622e+00, -3.52663849e-02, -5.68337901e+00, -1.17825301e+00, 5.41756637e-01, -3.15541339e-02, -1.58404846e+00, 7.37887234e-01],
    [2.36033237e-01, -5.01380019e-01, -7.01568834e-02, -2.14474169e-01, 5.58739133e-01, -3.45340886e-01, 2.36469930e-01, -2.51770230e-02, -4.41670143e-01, -1.73364633e-01, 9.92353986e-03, 1.01775476e-01],
    [3.13672832e+00, 1.55128891e+00, 4.60139512e+00, 9.82477544e-01, -3.87108002e-01, -1.34239667e+00, -3.00065797e+00, -4.41556909e-01, -7.77546208e-01, -6.59017029e-01, -1.42596356e-01, -9.78935498e-01],
    [8.50714148e-01, 2.28658856e-01, -3.65260753e+00, 2.70626948e+00, -1.90441544e-01, 5.66625676e+00, 1.77531510e+00, 2.39978921e+00, 1.10965660e+00, 1.58484130e+00, -1.51579214e-02, 8.64324026e-01],
    [1.14302559e+00, 1.18602811e+00, -3.88130412e+00, 8.69833825e-01, -8.23003310e-01, -4.23867795e-01, 8.56022598e-01, -1.08015106e+00, 1.74840192e-01, -1.35493558e-02, -1.17012561e+00, 1.68572940e-01],
    [3.54117814e+00, 6.12714769e-01, 7.67585243e+00, 2.50391333e+00, 1.81374399e+00, -1.46363231e+00, -1.74027236e+00, -5.72924078e-01, -1.20787368e+00, -4.13954661e-01, -4.62561948e-01, 6.78297871e-01],
    [8.31843044e-01, 4.41635485e-01, 7.00724425e-02, -4.72159900e-02, 3.08326493e-01, -4.47009822e-01, 3.27806057e-01, 6.52370380e-01, 3.28490360e-01, 1.28628172e-01, -7.78065861e-02, 6.91343399e-02],
    [4.90082031e-01, -9.53180204e-01, 1.76970476e-01, 1.57256960e-01, -5.26196238e-02, -3.19264458e-01, 3.91808304e-01, 2.19368239e-01, -2.06483291e-01, -6.25044005e-02, -1.05547224e-01, 3.18934196e-01],
    [1.49899454e+00, -4.30708817e-01, 2.43770498e+00, 7.03149621e-01, -2.28827845e+00, 2.70195855e+00, -4.71484280e+00, -1.18700075e+00, -1.77431396e+00, -2.23190236e+00, 8.20855264e-01, -2.35859902e-01],
    [1.20322544e-01, -3.66300816e-01, -1.25699953e-01, -1.21914056e-01, 6.93277338e-02, -1.31034684e-01, -1.54955924e-03, 2.48094288e-02, -3.09576314e-02, -1.66369415e-03, 1.48904987e-04, -1.42151992e-02],
    [6.52394765e-01, -6.81024464e-01, 6.36868117e-01, 3.04950208e-01, 2.62178992e-01, -3.20457080e-01, -1.98576098e-01, -3.02173163e-01, 2.04399765e-01, 4.44513847e-02, -9.50111498e-02, -1.14198739e-02],
    [2.06762180e-01, -2.08101829e-01, 2.61977630e-01, -1.71672300e-01, 5.61794250e-02, 2.13660185e-01, 3.90259585e-02, 4.78176392e-02, 1.72812607e-02, 3.44052067e-02, 6.26899067e-03, 2.48544728e-02],
    [7.39717363e-01, 4.37786285e+00, 2.54995502e+00, 1.13151212e+00, -3.58509503e-01, 2.20806129e-01, -2.20500355e-01, -7.22409824e-02, -2.70534083e-01, 1.07942098e-03, 2.70174668e-01, 1.87279353e-01],
    [1.25593809e+00, 6.71054880e-02, 8.70352571e-01, -4.32607959e+00, 2.30652217e+00, 5.47476105e+00, -6.11052479e-01, 1.07955720e+00, -2.16225471e+00, -7.95770149e-01, -7.31804973e-01, 9.68935954e-01],
    [1.17233757e-01, -1.23897829e-01, -4.88625265e-01, 1.42036530e-01, -7.23286756e-02, -6.99808763e-02, -1.17525019e-02, 5.70221674e-02, -7.67796123e-03, 4.17505873e-02, -2.33375716e-02, 1.94121001e-02],
    [1.67511025e+00, -2.75436700e+00, 1.45345593e+00, 1.32408871e+00, -1.66172505e+00, 1.00560074e+00, -8.82308160e-01, -5.95708043e-01, -7.27283590e-01, -1.03975499e+00, -1.86653334e-02, 1.39449745e+00],
    [3.20587677e+00, -2.84451104e+00, 8.54849957e+00, -4.44001235e-01, 1.04202144e+00, 7.35333682e-01, -2.48763292e+00, 7.38931361e-01, -1.74185596e+00, -1.07581842e+00, 2.05759299e-01, -8.20483513e-01],
    [3.31279737e+00, -5.08655734e-01, 6.61530870e+00, 1.16518280e+00, 4.74499155e+00, -2.31536191e+00, -1.34016130e+00, -7.15381712e-01, 2.78890594e+00, 2.04189275e+00, -3.80003033e-01, 1.16034914e+00],
    [1.79522019e+00, -8.13534697e-02, 4.37167420e-01, 2.26517020e+00, 8.85377295e-01, 1.07481514e+00, -7.25322296e-01, -2.19309506e+00, -7.59468916e-01, -1.37191387e+00, 2.60097913e-01, 9.34596450e-01],
    [3.50400906e-01, 8.17891485e-01, -8.63487084e-01, -7.31760701e-01, 9.70320805e-02, -3.60023996e-01, -2.91753495e-01, -8.03073817e-02, 6.65930095e-02, 1.60093340e-01, -1.29158086e-01, -5.18806100e-02],
    [2.25922929e-01, 2.78461593e-01, 5.39661393e-02, -2.37662670e-02, -2.70343295e-02, -1.23485570e-01, 2.31027499e-03, 5.87465112e-05, 1.86127188e-02, 2.83074747e-02, -1.87198676e-04, 1.24761782e-02],
    [4.53615634e-01, 3.18976020e+00, -8.35029351e-01, 7.84124578e+00, -4.43906795e-01, -1.78945492e+00, -1.14521031e+00, 1.00044304e+00, -4.04084981e-01, -4.86030348e-01, 1.05412721e-01, 5.63666445e-02],
    [3.93714086e-01, -3.07226477e-01, -4.87366619e-01, -4.57481697e-01, -2.91133171e-04, -2.39881719e-01, -2.15591352e-01, -1.21332941e-01, 1.42245002e-01, 5.02984582e-02, -8.05878851e-03, 1.95534173e-01],
    [1.86913010e-01, -1.61000977e-01, 5.95612425e-01, 1.87804293e-01, 2.22064227e-01, -1.09008289e-01, 7.83845058e-02, 5.15228647e-02, -8.18113578e-02, -2.37860551e-02, 3.41013800e-03, 3.64680417e-02],
    [3.32919314e+00, -2.14341251e+00, 7.20913997e+00, 1.76143734e+00, 1.64091808e+00, -2.66887649e+00, -9.26748006e-01, -2.78599285e-01, -7.39434005e-01, -3.87363085e-01, 8.00557250e-01, 1.15628886e+00],
    [4.76496444e-01, -1.19334793e-01, 3.09037235e-01, -3.45545294e-01, 1.30114716e-01, 5.06895559e-01, 2.12176840e-01, -4.14296750e-03, 4.52439064e-02, -1.62163990e-02, 6.93683152e-02, -5.77607592e-03],
    [3.00019324e-01, 5.43432074e-02, -7.72732930e-01, 1.47263806e+00, -2.79012581e-02, -2.47864869e-01, -2.10011388e-01, 2.78202425e-01, 6.16957205e-02, -1.66924986e-01, -1.80102286e-01, -3.78872162e-03]]

TIMBRE_MEANS = [np.mean(t) for t in TIMBRE_CLUSTERS]
TIMBRE_STDEVS = [np.std(t) for t in TIMBRE_CLUSTERS]

'''helper methods to process raw msd data'''

def normalize_pitches(h5):
    # transpose every pitch segment so that all songs share a common key
    key = int(hdf5_getters.get_key(h5))
    segments_pitches = hdf5_getters.get_segments_pitches(h5)
    segments_pitches_new = [transpose_by_key(pitch_seg, key) for pitch_seg in segments_pitches]
    return segments_pitches_new

def transpose_by_key(pitch_seg, key):
    pitch_seg_new = []
    for i in range(0, 12):
        idx = (i + key) % 12
        pitch_seg_new.append(pitch_seg[idx])
    return pitch_seg_new

''' given a time segment with distributions of the 12 pitches, find the most likely chord played '''
def find_most_likely_chord(pitch_vector):
    rho_max = 0.0
    # chords are indexed as (type, root): type 1 = major, 2 = minor,
    # 3 = dominant 7th, 4 = minor 7th; root 0 = C through 11 = B
    most_likely_chord = (1, 1)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_MAJOR, CHORD_TEMPLATE_MAJOR_means, CHORD_TEMPLATE_MAJOR_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean) * (pitch_vector[i] - np.mean(pitch_vector)) / ((stdev + 0.01) * (np.std(pitch_vector) + 0.01))
        if abs(rho) > abs(rho_max):
            rho_max = rho
            most_likely_chord = (1, idx)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_MINOR, CHORD_TEMPLATE_MINOR_means, CHORD_TEMPLATE_MINOR_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean) * (pitch_vector[i] - np.mean(pitch_vector)) / ((stdev + 0.01) * (np.std(pitch_vector) + 0.01))
        if abs(rho) > abs(rho_max):
            rho_max = rho
            most_likely_chord = (2, idx)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_DOM7, CHORD_TEMPLATE_DOM7_means, CHORD_TEMPLATE_DOM7_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean) * (pitch_vector[i] - np.mean(pitch_vector)) / ((stdev + 0.01) * (np.std(pitch_vector) + 0.01))
        if abs(rho) > abs(rho_max):
            rho_max = rho
            most_likely_chord = (3, idx)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_MIN7, CHORD_TEMPLATE_MIN7_means, CHORD_TEMPLATE_MIN7_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean) * (pitch_vector[i] - np.mean(pitch_vector)) / ((stdev + 0.01) * (np.std(pitch_vector) + 0.01))
        if abs(rho) > abs(rho_max):
            rho_max = rho
            most_likely_chord = (4, idx)
    return most_likely_chord

def find_most_likely_timbre_category(timbre_vector):
    most_likely_timbre_cat = 0
    rho_max = 0.0
    for idx, (seg, mean, stdev) in enumerate(zip(TIMBRE_CLUSTERS, TIMBRE_MEANS, TIMBRE_STDEVS)):
        rho = 0.0
        for i in range(0, 12):
            # center the song's timbre vector by its own mean, mirroring the chord correlation above
            rho += (seg[i] - mean) * (timbre_vector[i] - np.mean(timbre_vector)) / ((stdev + 0.01) * (np.std(timbre_vector) + 0.01))
        if abs(rho) > abs(rho_max):
            rho_max = rho
            most_likely_timbre_cat = idx
    return most_likely_timbre_cat
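As a quick sanity check of the template correlation, a pitch vector dominated by C, E, and G should map to a major chord rooted at C; the vector below is made up for illustration, not taken from the MSD:

>>> find_most_likely_chord([0.9, 0.1, 0.1, 0.1, 0.8, 0.1, 0.1, 0.85, 0.1, 0.1, 0.1, 0.1])
(1, 0)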

Bibliography

[1] Marck Bailey. Mark J. Butler publishes scholarly work on dance music. http://www.music.northwestern.edu/about/news/2012/mark-j-butler-publishes-scholarly-work-on-dance-music.html, March 2012.

[2] Kenneth Taylor. Ishkur's guide to EDM. http://techno.org/electronic-music-guide/.

[3] Deal further strengthens Spotify's music discovery expertise. http://the.echonest.com/pressreleases/spotify-acquires-echo-nest/, March 2014.

[4] Josh Constine. Inside the Spotify - Echo Nest skunkworks. http://techcrunch.com/2014/10/19/the-sonic-mad-scientists/, October 2014.

[5] The future of music genres is here. http://blog.echonest.com/post/73516217273/the-future-of-music-genres-is-here, January 2014.

[6] About the Music Genome Project. http://www.pandora.com/about/mgp.

[7] Joan Serrà, Álvaro Corral, Marián Boguñá, Martín Haro, and Josep Ll. Arcos. Measuring the evolution of contemporary western popular music. Scientific Reports, 2, July 2012.

[8] Matthias Mauch, Robert M. MacCallum, Mark Levy, and Armand M. Leroi. The evolution of popular music: USA 1960–2010. Royal Society Open Science, 2(5), 2015.

[9] Thierry Bertin-Mahieux, Daniel P. W. Ellis, Brian Whitman, and Paul Lamere. The Million Song Dataset. In Proceedings of the 12th International Conference on Music Information Retrieval (ISMIR 2011), 2011.

[10] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay. Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12:2825–2830, 2011.

[11] Lars Buitinck, Gilles Louppe, Mathieu Blondel, Fabian Pedregosa, Andreas Mueller, Olivier Grisel, Vlad Niculae, Peter Prettenhofer, Alexandre Gramfort, Jaques Grobler, Robert Layton, Jake VanderPlas, Arnaud Joly, Brian Holt, and Gaël Varoquaux. API design for machine learning software: experiences from the scikit-learn project. In ECML PKDD Workshop: Languages for Data Mining and Machine Learning, pages 108–122, 2013.

[12] Edwin Chen. Infinite mixture models with nonparametric Bayes and the Dirichlet process. http://blog.echen.me/2012/03/20/infinite-mixture-models-with-nonparametric-bayes-and-the-dirichlet-process/, March 2012.

[13] Graham Massey. Roland TR-808: The drum machine that changed music. http://www.bbc.com/news/entertainment-arts-26682781, March 2014.

[14] Jean-Michel Jarre and Agencja Artystyczna MTJ. Jean Michel Jarre. Disques Dreyfus, 1999.

[15] François Pachet, Jean-Julien Aucouturier, and Mark Sandler. The way it sounds: Timbre models for analysis and retrieval of music signals. IEEE Transactions on Multimedia, 7(6):1028–1035, December 2005.

  • Abstract
  • Acknowledgements
  • Contents
  • List of Tables
  • List of Figures
  • 1 Introduction
    • 1.1 Background Information
    • 1.2 Literature Review
    • 1.3 The Dataset
  • 2 Mathematical Modeling
    • 2.1 Determining Novelty of Songs
    • 2.2 Feature Selection
    • 2.3 Collecting Data and Preprocessing Selected Features
      • 2.3.1 Collecting the Data
      • 2.3.2 Pitch Preprocessing
      • 2.3.3 Timbre Preprocessing
  • 3 Results
    • 3.1 Methodology
    • 3.2 Findings
      • 3.2.1 α = 0.05
      • 3.2.2 α = 0.1
      • 3.2.3 α = 0.2
    • 3.3 Analysis
  • 4 Conclusion
    • 4.1 Design Flaws in Experiment
    • 4.2 Future Work
    • 4.3 Closing Remarks
  • A Code
    • A.1 Pulling Data from the Million Song Dataset
    • A.2 Calculating Most Likely Chords and Timbre Categories
    • A.3 Code to Compute Timbre Categories
    • A.4 Helper Methods for Calculations
  • Bibliography

stdev+001)(npstd(pitch_vector)+001))267 if (abs(rho) gt abs(rho_max))268 rho_max = rho269 most_likely_chord = (2idx)270 for idx (chordmeanstdev) in enumerate(zip(CHORD_TEMPLATE_DOM7

CHORD_TEMPLATE_DOM7_meansCHORD_TEMPLATE_DOM7_stdevs))271 rho = 00272 for i in range(012)273 rho += (chord[i] - mean)(pitch_vector[i] - npmean(pitch_vector))((

stdev+001)(npstd(pitch_vector)+001))274 if (abs(rho) gt abs(rho_max))275 rho_max = rho276 most_likely_chord = (3idx)277 for idx (chordmeanstdev) in enumerate(zip(CHORD_TEMPLATE_MIN7

CHORD_TEMPLATE_MIN7_meansCHORD_TEMPLATE_MIN7_stdevs))278 rho = 00279 for i in range(012)280 rho += (chord[i] - mean)(pitch_vector[i] - npmean(pitch_vector))((

stdev+001)(npstd(pitch_vector)+001))281 if (abs(rho) gt abs(rho_max))282 rho_max = rho283 most_likely_chord = (4idx)284 return most_likely_chord285

286 def find_most_likely_timbre_category(timbre_vector)287 most_likely_timbre_cat = 0288 rho_max = 00289 for idx (segmeanstdev) in enumerate(zip(TIMBRE_CLUSTERSTIMBRE_MEANS

TIMBRE_STDEVS))290 rho = 00291 for i in range(012)292 rho += (seg[i] - mean)(timbre_vector[i] - npmean(seg))((stdev+001)

(npstd(timbre_vector)+001))

66

293 if (abs(rho) gt abs(rho_max))294 rho_max = rho295 most_likely_timbre_cat = idx296 return most_likely_timbre_cat

67

Bibliography

[1] Marck Bailey Mark j butler publishes scholarly work on dance mu-sic httpwwwmusicnorthwesterneduaboutnews2012mark-j-butler-publishes-scholarly-work-on-dance-musichtml mar 2012

[2] Kenneth Taylor Ishkurrsquos guide to edm httptechnoorgelectronic-music-guide

[3] Deal further strengthens spotifyrsquos music discovery expertise httptheechonestcompressreleasesspotify-acquires-echo-nest mar 2014

[4] Josh Constine Inside the spotify - echo nest skunkworks httptechcrunchcom20141019the-sonic-mad-scientists oct 2014

[5] The future of music genres is here httpblogechonestcompost73516217273the-future-of-music-genres-is-here jan 2014

[6] About the music genome project httpwwwpandoracomaboutmgp

[7] Joan Serragrave Aacutelvaro Corral Mariaacuten Boguntildeaacute Martiacuten Haro and Josep Ll ArcosMeasuring the evolution of contemporary western popular music Sci Rep 2jul 2012

[8] Matthias Mauch Robert M MacCallum Mark Levy and Armand M Leroi Theevolution of popular music Usa 1960ndash2010 Royal Society Open Science 2(5)2015

[9] Thierry Bertin-Mahieux Daniel PW Ellis Brian Whitman and Paul LamereThe million song dataset In Proceedings of the 12th International Conferenceon Music Information Retrieval (ISMIR 2011) 2011

[10] F Pedregosa G Varoquaux A Gramfort V Michel B Thirion O GriselM Blondel P Prettenhofer R Weiss V Dubourg J Vanderplas A PassosD Cournapeau M Brucher M Perrot and E Duchesnay Scikit-learn Machinelearning in Python Journal of Machine Learning Research 122825ndash2830 2011

[11] Lars Buitinck Gilles Louppe Mathieu Blondel Fabian Pedregosa AndreasMueller Olivier Grisel Vlad Niculae Peter Prettenhofer Alexandre GramfortJaques Grobler Robert Layton Jake VanderPlas Arnaud Joly Brian Holt andGaeumll Varoquaux API design for machine learning software experiences from

68

the scikit-learn project In ECML PKDD Workshop Languages for Data Miningand Machine Learning pages 108ndash122 2013

[12] Edwin Chen Infinite mixture models with nonparametric bayesand the dirichlet process httpblogechenme20120320infinite-mixture-models-with-nonparametric-bayes-and-the-dirichlet-processmar 2012

[13] Graham Massey Roland tr-808 The drum machine that changed music httpwwwbbccomnewsentertainment-arts-26682781 mar 2014

[14] Jean-Michel Jarre and Agencja Artystyczna MTJ Jean Michel Jarre DisquesDreyfus 1999

[15] Francois Pachet Jean-Julien Aucouturier and Mark Sandler The way it soundsTimbre models for analysis and retrieval of music signals IEEE TRANSAC-TIONS ON MULTIMEDIA 7(6)1028ndash35 dec 2005

69

  • Abstract
  • Acknowledgements
  • Contents
  • List of Tables
  • List of Figures
  • 1 Introduction
    • 11 Background Information
    • 12 Literature Review
    • 13 The Dataset
      • 2 Mathematical Modeling
        • 21 Determining Novelty of Songs
        • 22 Feature Selection
        • 23 Collecting Data and Preprocessing Selected Features
          • 231 Collecting the Data
          • 232 Pitch Preprocessing
          • 233 Timbre Preprocessing
              • 3 Results
                • 31 Methodology
                • 32 Findings
                  • 321 =005
                  • 322 =01
                  • 323 =02
                    • 33 Analysis
                      • 4 Conclusion
                        • 41 Design Flaws in Experiment
                        • 42 Future Work
                        • 43 Closing Remarks
                          • A Code
                            • A1 Pulling Data from the Million Song Dataset
                            • A2 Calculating Most Likely Chords and Timbre Categories
                            • A3 Code to Compute Timbre Categories
                            • A4 Helper Methods for Calculations
                              • Bibliography
Page 63: Silver,Matthew final thesis

arguably more than for other genres because songs may be remixed by multiple

artists further blurring the line between original content and modifications For this

reason I considered my thesis to be a proof of concept Although the data I used

may not be ideal I was able to show that the Dirichlet Process could be used with

some amount of success to cluster songs based on their metadata

With respect to how I implemented the Dirichlet Process and constructed the

features my methodology could have been more extensive with additional time and

resources Interpreting the sounds in each song and establishing common threads is a

difficult task and unlike Pandora which used trained music theory experts to analyze

each song I relied on my own ears and anecdotal knowledge of EM Given the lack of

formal literature quantitatively analyzing EM and the resources I had this was my

best realistic option but was also not ideal The second notable weakness which was

more controllable was determining what exactly constitutes an EM song My criteria

involved iterating through every song and selecting those whose artist contained a

tag that fell inside a list of predetermined EM genres However this strategy is not

always effective since some artists contain only a small selection of EM songs and

have produced much more music involving rock or other non-EM genres To prevent

these songs from appearing in the dataset I would need to load another dataset

from a group called Lastfm which contains user-generated tags at the song level

Another more addressable weakness in my experiment was graphically analyzing the

timbre categories While the average chord changes were easy to interpret on the

graphs for each cluster and had easy semantic interpretations the timbre categories

were never formally defined That is while I knew the Bayes Information Criterion

was lowest when there were 46 categories I did not associate each timbre category

with a sound Mauchrsquos study addressed this issue by randomly selecting songs with

sounds that fell in each timbre category and asked users to listen to the sounds and

54

classify what they heard Implementing this system would be an additional way of

ensuring that the clusters formed for each song were nontrivial I could not only

eyeball the measurements on each graph for timbre like I did in this thesis but also

use them to confirm the sounds I observed for each cluster Finally while my feature

selection contained careful preprocessing based on other studies that normalized

measurements between all songs there are additional ways I could have improved the

feature set For example one study looks at more advanced ways to isolate specific

timbre segments in a song identify repeating patterns and comparing songs to each

other in terms of the similarity of their timbres [15] More advanced methods like

these would allow me to more quantitatively analyze how successful the Dirichlet

Process is on effectively clustering songs into distinct categories

42 Future Work

Future work in this area quantitatively analyzing EM metadata to determine what

constitutes different genres and novel artists would involve tighter definitions proce-

dures evaluations of whether clustering was effective and music scrutiny All of the

weaknesses mentioned in the previous section barring perhaps the songs available in

the Million Song Dataset can be addressed with extensions and modifications to the

code base I created Addressing the greater issue of building an effective corpus of

music data for the MSD and constantly updating it might be addressed by soliciting

such data from an organization like Spotify but such an endeavor is very ambitious

and beyond the scope of any individual or small group research project without ex-

tensive funding and influence Once these problems are resolved and the dataset

songs accessed from the dataset and methods for comparing songs to each other are

accomplished the next steps would be to further analyze the results How do the

most unique artists for their time compare to the most popular artists Is there con-

55

siderable overlap How long does it take for a style to grow in popularity if it even

does And lastly how can these findings be used to compose new genres of music and

envision who and what will become popular in the future All of these questions may

require supplementary information sources with respect to the popularity of songs

and artists for example and many of these additional pieces of information can be

found on the website of the MSD

43 Closing Remarks

While this thesis is an ambitious endeavor and can be improved in many respects it

does show that the methods implemented yield nontrivial results and could serve as

a foundation for future quantitative analysis of electronic music As data analytics

grows even more and groups such as Spotify amass greater amounts of information

and deeper insights on that information this relatively new field of study will hope-

fully grow EM is a dynamic energizing and incredibly expressive type of music

and understanding it from a quantitative perspective pays respect to what has up

until now been mostly analyzed from a curious outsiderrsquos perspective qualitatively

described but not examined as thoroughly from a mathematical angle

56

Appendix A

Code

A1 Pulling Data from the Million Song Dataset

1 from __future__ import division2 import os3 import sys4 import time5 import glob6 import hdf5_getters not on adroit7

8 prevents output from showing ellipses when printed9 npset_printoptions(threshold=npnan)

10

11 rsquorsquorsquoThis code computes the frequency of chord changes in each electronic songand runs the dirichlet process on it rsquorsquorsquo

12

13 basedir = rsquoscratchnetworkmssilvermssilvermsd_data_fulldatarsquo +str(sysargv[1])

14 ext = rsquoh5rsquo15

16 target_genres = [rsquohousersquorsquotechnorsquorsquodrum and bassrsquorsquodrum n bassrsquorsquodrumrsquonrsquobassrsquo

17 rsquodrumnbassrsquorsquodrum rsquonrsquo bassrsquorsquojunglersquorsquobreakbeatrsquorsquotrancersquorsquodubsteprsquorsquotraprsquorsquodowntemporsquo

18 rsquoindustrialrsquorsquosynthpoprsquorsquoidmrsquorsquoidm - intelligent dance musicrsquorsquo8-bitrsquorsquoambientrsquo

19 rsquodance and electronicarsquorsquoelectronicrsquo]20

21 relevant metadata for all EM songs found in the MSD22 all_song_data = 23 pitch_segs_data = []24 count = 025 start_time = timetime()26

27 for root dirs files in oswalk(basedir)28 files = globglob(ospathjoin(rootrsquorsquo+ext))29 for f in files

57

30 h5 = hdf5_gettersopen_h5_file_read(f)31 if year unknown throw out sample32 if hdf5_gettersget_year(h5) == 033 h5close()34 continue35 if any(tag in str(hdf5_gettersget_artist_mbtags(h5)) for tag in

target_genres)36 print rsquofound electronic music song at 0 secondsrsquoformat(time

time()-start_time)37 count += 138 print (rsquosong count 0rsquoformat(count+1))39 h5_subdict = dict()40 h5_subdict[rsquotitlersquo] = hdf5_gettersget_title(h5)item()41 h5_subdict[rsquoartist_namersquo] = hdf5_gettersget_artist_name(h5)

item()42 h5_subdict[rsquoyearrsquo] = hdf5_gettersget_year(h5)item()43 h5_subdict[rsquodurationrsquo] = hdf5_gettersget_duration(h5)item()44 h5_subdict[rsquotimbrersquo] = hdf5_gettersget_segments_timbre(h5)

tolist()45 h5_subdict[rsquopitchesrsquo] = hdf5_gettersget_segments_pitches(h5)

tolist()46 track_id = hdf5_gettersget_track_id(h5)item()47 all_song_data[track_id] = h5_subdict48 print(rsquoSong 0 finished processing Total time elapsed 1

secondsrsquoformat(countstr(timetime() - start_time)))49 h5close()50

51 all_song_data_sorted = dict(sorted(all_song_dataitems() key=lambda k k[1][rsquoyearrsquo]))

52 sortedpitchdata = rsquoscratchnetworkmssilvermssilvermsd_dataraw_rsquo +resub(rsquorsquorsquorsquosysargv[1]) + rsquotxtrsquo

53 with open(sortedpitchdata rsquowrsquo) as text_file54 text_filewrite(str(all_song_data_sorted))

A2 Calculating Most Likely Chords and Timbre

Categories

1 from __future__ import division2 import os3 import sys4 import re5 import time6 import json7 import glob8 import hdf5_getters not on adroit9 import sklearnmixture

10 import msd_utils not on adroit11 import math12 import numpy as np13 import collections14 import ast15

16 prevents output from showing ellipses when printed17 npset_printoptions(threshold=npnan)

58

18

19 column-wise mean of list of lists20 def mean(a)21 return sum(a) len(a)22

23 rsquorsquorsquoThis code computes the frequency of chord changes in each electronic songand runs the dirichlet process on it rsquorsquorsquo

24

25 basedir = rsquoscratchnetworkmssilvermssilverrsquo26 input_file = basedir + rsquomsd_dataraw_rsquo + str(sysargv[1]) + rsquotxtrsquo27 output_file = basedir + rsquomsd_datapreprocessed_rsquo + str(sysargv[1]) + rsquotxtrsquo28

29 json_contents = open(input_filersquorrsquo)read()30

31 all_song_data = []32 time_start = timetime()33 count = 034 for json_object_str in refinditer(rsquorsquotitlersquojson_contents)35 json_object_str = str(json_object_strgroup(0))36 json_object = astliteral_eval(json_object_str)37 json_object_new = 38

39 json_object_new[rsquotitlersquo] = json_object[rsquotitlersquo]40 json_object_new[rsquoartist_namersquo] = json_object[rsquoartist_namersquo]41 json_object_new[rsquoyearrsquo] = json_object[rsquoyearrsquo]42 json_object_new[rsquodurationrsquo] = json_object[rsquodurationrsquo]43

44 segments_pitches_old = json_object[rsquopitchesrsquo]45 segments_timbre_old = json_object[rsquotimbrersquo]46 segments_pitches_old_smoothed = []47 segments_timbre_old_smoothed = []48 chord_changes = [0 for i in range(0192)]49 smoothing_factor = 550 for i in range(0int(mathfloor(len(segments_pitches_old))

smoothing_factor))51 segments = segments_pitches_old[(smoothing_factori)(

smoothing_factori+smoothing_factor)]52 calculate mean frequency of each note over a block of 5 time

segments53 segments_mean = map(mean zip(segments))54 segments_pitches_old_smoothedappend(segments_mean)55 most_likely_chords = [msd_utilsfind_most_likely_chord(seg) for seg in

segments_pitches_old_smoothed]56 print rsquofound most likely chords at 0 secondsrsquoformat(timetime()-

time_start)57 calculate chord changes58 for i in range(0len(most_likely_chords)-1)59 c1 = most_likely_chords[i]60 c2 = most_likely_chords[i+1]61 if (c1[1] == c2[1])62 note_shift = 063 elif (c1[1] lt c2[1])64 note_shift = c2[1] - c1[1]65 else66 note_shift = 12 - c1[1] + c2[1]67 key_shift = 4(c1[0]-1) + c2[0]68 convert note_shift (0 through 11) and key_shift (1 to 16)69 to one of 196 categories for a chord shift70 chord_shift = 12(key_shift - 1) + note_shift71 chord_changes[chord_shift] += 1

59

72 json_object_new[rsquochord_changesrsquo] = [cjson_object[rsquodurationrsquo] for c inchord_changes]

73 print rsquocalculated chord changes at 0 secondsrsquoformat(timetime()-time_start)

74

75 for i in range(0int(mathfloor(len(segments_timbre_old))smoothing_factor))

76 segments = segments_timbre_old[(smoothing_factori)(smoothing_factori+smoothing_factor)]

77 calculate mean frequency of each note over a block of 5 timesegments

78 segments_mean = map(mean zip(segments))79 segments_timbre_old_smoothedappend(segments_mean)80 print rsquofound most likely timbre categories at 0 secondsrsquoformat(time

time()-time_start)81 timbre_cats = [msd_utilsfind_most_likely_timbre_category(seg) for seg

in segments_timbre_old_smoothed]82 timbre_cat_counts = [timbre_catscount(i) for i in xrange(030)]83 json_object_new[rsquotimbre_cat_countsrsquo] = [tjson_object[rsquodurationrsquo] for t

in timbre_cat_counts]84 all_song_dataappend(json_object_new)85 count += 186

87 print rsquopreprocessing finished writing results to file at time 0rsquoformat(timetime()-time_start)

88 with open(output_filersquowrsquo) as f89 fwrite(str(all_song_data))90

91 print rsquofile merging complete at time 0rsquoformat(timetime()-time_start)

A3 Code to Compute Timbre Categories

1 from __future__ import division2 import os3 import sys4 import re5 import time6 import json7 import glob8 import hdf5_getters not on adroit9 import sklearnmixture

10 import msd_utils not on adroit11 import math12 import numpy as np13 import collections14 from string import ascii_uppercase15 import ast16 import matplotlibpyplot as plt17 import operator18 from collections import defaultdict19 import random20

21 timbre_all = []22 N = 20 number of samples to get from each year23 year_counts = dict(1956 2 1965 4 1968 3 1969 5 1970 23 1971 25

1972 26 1973 37 1974 35 1975 29 1976 28 1977 64 1978 771979 111 1980 131 1981 171 1982 199 1983 272 1984 190 1985

60

189 1986 200 1987 224 1988 205 1989 272 1990 358 1991 3481992 538 1993 610 1994 658 1995 764 1996 809 1997 930 1998872 1999 983 2000 1031 2001 1230 2002 1323 2003 1563 20041508 2005 1995 2006 1892 2007 2175 2008 1950 2009 1782 2010742)

24

25 time_start = timetime()26 year_count = defaultdict(int)27 orig_dir = rsquoscratchnetworkmssilvermssilverrsquo28 orig_dir = rsquorsquo29 json_pattern = recompile(rsquorsquotitlersquoreDOTALL)30 N = 20 number of songs to sample from each year31 k = 20 number of frames to select from each song32 for l1 in ascii_uppercase33 for l2 in ascii_uppercase34 edm_textfile = orig_dir + rsquomsd_dataraw_rsquo + l1 + l2 + rsquotxtrsquo35 json_contents = open(edm_textfilersquorrsquo)read()36 for json_object_str in refindall(json_patternjson_contents)37 json_object = astliteral_eval(json_object_str)38 year = int(json_object[rsquoyearrsquo])39 prob = 10 if 10Nyear_counts[year] gt 10 else 10N

year_counts[year]40 if randomrandom() lt prob41 print rsquogetting timbre frames for song in directory 0 1

seconds after start of programrsquoformat(edm_textfiletimetime()-time_start)

42 duration = float(json_object[rsquodurationrsquo])43 timbre = [[tduration for t in l] for l in json_object[rsquo

timbrersquo]]44 try45 indices = randomsample(xrange(0len(timbre))k)46 except47 indices = xrange(0len(timbre))48 timbre_frames = [timbre[i] for i in indices]49 appended_timbre = [timbre_allappend(l) for l in

timbre_frames]50 print rsquofinished file 0 1 seconds after start of programrsquoformat(

edm_textfiletimetime()-time_start)51

52 with(open(rsquotimbre_frames_alltxtrsquorsquowrsquo)) as f53 fwrite(str(timbre_all))

A4 Helper Methods for Calculations

1 import os2 import re3 import json4 import glob5 import hdf5_getters6 import time7 import numpy as np8

9 rsquorsquorsquo some static data used in conjunction with the helper methods rsquorsquorsquo10

11 each 12-element vector corresponds to the 12 pitches starting with Cnatural and going up to B natural

12

61

13 CHORD_TEMPLATE_MAJOR = [[100010010000][010001001000]14 [001000100100][000100010010]15 [000010001001][100001000100]16 [010000100010][001000010001]17 [100100001000][010010000100]18 [001001000010][000100100001]]19 CHORD_TEMPLATE_MINOR =[[100100010000][010010001000]20 [001001000100][000100100010]21 [000010010001][100001001000]22 [010000100100][001000010010]23 [000100001001][100010000100]24 [010001000010][001000100001]]25 CHORD_TEMPLATE_DOM7 = [[100010010010][010001001001]26 [101000100100][010100010010]27 [001010001001][100101000100]28 [010010100010][001001010001]29 [100100101000][010010010100]30 [001001001010][000100100101]]31 CHORD_TEMPLATE_MIN7 = [[100100010010][010010001001]32 [101001000100][010100100010]33 [001010010001][100101001000]34 [010010100100][001001010010]35 [000100101001][100010010100]36 [010001001010][001000100101]]37

38 CHORD_TEMPLATE_MAJOR_means = [npmean(chord) for chord inCHORD_TEMPLATE_MAJOR]

39 CHORD_TEMPLATE_MINOR_means = [npmean(chord) for chord inCHORD_TEMPLATE_MINOR]

40 CHORD_TEMPLATE_DOM7_means = [npmean(chord) for chord in CHORD_TEMPLATE_DOM7]

41 CHORD_TEMPLATE_MIN7_means = [npmean(chord) for chord in CHORD_TEMPLATE_MIN7]

42

43 CHORD_TEMPLATE_MAJOR_stdevs = [npstd(chord) for chord inCHORD_TEMPLATE_MAJOR]

44 CHORD_TEMPLATE_MINOR_stdevs = [npstd(chord) for chord inCHORD_TEMPLATE_MINOR]

45 CHORD_TEMPLATE_DOM7_stdevs = [npstd(chord) for chord in CHORD_TEMPLATE_DOM7]

46 CHORD_TEMPLATE_MIN7_stdevs = [npstd(chord) for chord in CHORD_TEMPLATE_MIN7]

47

48 TIMBRE_CLUSTERS = [[ 138679881e-01 395702571e-02 265410235e-0249 738301998e-03 -175014636e-02 -551147732e-0250 871851698e-03 -117595855e-02 107227900e-0251 875951680e-03 540391877e-03 617638908e-03]52 [ 314344510e+00 117405599e-01 408053561e+0053 -177934450e+00 293367968e+00 -135597928e+0054 -155129489e+00 775743158e-01 642796685e-0155 140794256e-01 337716831e-01 -327103815e-01]56 [ 356548165e-01 273288705e+00 194355982e+0057 106892477e+00 989739475e-01 -897330631e-0258 873234495e-01 -200747009e-03 344488367e-0159 993117800e-02 -243471766e-01 -190521726e-01]60 [ 422442037e-01 414115783e-01 143926557e-0161 -116143322e-01 -595186216e-02 -236927188e-0162 -683151409e-02 986816882e-02 243219098e-0263 693558977e-02 680121418e-03 397485360e-02]64 [ 194727799e-01 -139027782e+00 -239875671e-01

62

65 -284583677e-01 192334219e-01 -283421048e-0166 215787541e-01 114840341e-01 -215631833e-0167 -409496877e-02 -690838017e-03 -724394810e-03]68 [ 196565167e-01 498702717e-02 -343697282e-0169 254170701e-01 112441266e-02 154740401e-0170 -470447408e-02 810868802e-02 303736697e-0371 143974944e-03 -275044913e-02 148634678e-02]72 [ 221364497e-01 -296205105e-01 157754028e-0173 -557641279e-02 -925625566e-02 -615316168e-0274 -138139882e-01 -554936599e-02 166886836e-0175 646238260e-02 124093863e-02 -209274345e-02]76 [ 212823455e-01 -932652720e-02 -439611467e-0177 -202814479e-01 498638770e-02 -126572488e-0178 -111181799e-01 325075635e-02 201416694e-0279 -569216463e-02 261922912e-02 830817468e-02]80 [ 162304042e-01 -734813956e-03 -202552550e-0181 180106705e-01 -572110826e-02 -917148244e-0282 -620429191e-03 -608892354e-02 102883628e-0283 384878478e-02 -872920419e-03 237291230e-02]84 [ 169023095e-01 681311168e-02 -371039856e-0285 -213139780e-02 -418752028e-03 136407740e-0186 258515825e-02 -410328777e-04 293149920e-0287 -197874734e-02 201177066e-02 429260690e-03]88 [ 416829358e-01 -128384095e+00 886081556e-0189 913717416e-02 -319420208e-01 -182003637e-0190 -319865507e-02 -171517045e-02 347472066e-0291 -353047665e-02 558354602e-02 -506222122e-02]92 [ 383948137e-01 106020034e-01 401191058e-0193 149470482e-01 -958422411e-02 -494473336e-0294 227589858e-02 -567352733e-02 384666644e-0295 -215828055e-02 -167817151e-02 115426241e-01]96 [ 907946444e-01 326120397e+00 298472002e+0097 -142615404e-01 129886103e+00 -453380431e-0198 154008478e-01 -355297093e-02 -295809181e-0199 157037690e-01 -729692046e-02 115180285e-01]

100 [ 160870896e+00 -232038235e+00 -796211044e-01101 155058968e+00 -219377663e+00 501030526e-01102 -171767279e+00 -136642470e+00 -242837527e-01103 -414275615e-01 -733148530e-01 -456676578e-01]104 [ 642870687e-01 134486839e+00 216026845e-01105 -213180345e-01 310866747e-01 -397754955e-01106 -354439151e-01 -595938041e-04 495054274e-03107 467013422e-02 -180823854e-02 125808320e-01]108 [ 116780496e+00 228141229e+00 -329418720e+00109 -154239912e+00 212372153e-01 251116768e+00110 184273560e+00 -406183916e-01 119175125e+00111 -924407446e-01 685444429e-01 -638729005e-01]112 [ 239097414e-01 -113382447e-02 306327342e-01113 468182987e-03 -103107607e-01 -317661969e-02114 346533705e-02 146440386e-02 688291154e-02115 172580481e-02 -623970238e-03 -652822380e-03]116 [ 174850329e-01 -186077411e-01 269285838e-01117 522452803e-02 -371708289e-02 -642874319e-02118 -501920042e-03 -114565540e-02 -261300268e-03119 -694872458e-03 120157063e-02 201341977e-02]120 [ 193220674e-01 162738332e-01 172794061e-02121 789933755e-02 158494767e-01 904541006e-04122 -333177052e-02 -142411500e-01 -190471155e-02123 -241622739e-02 -257382438e-02 284895062e-02]124 [ 331179197e+00 -156765268e-01 442446188e+00

63

125 205496297e+00 507031622e+00 -352663849e-02126 -568337901e+00 -117825301e+00 541756637e-01127 -315541339e-02 -158404846e+00 737887234e-01]128 [ 236033237e-01 -501380019e-01 -701568834e-02129 -214474169e-01 558739133e-01 -345340886e-01130 236469930e-01 -251770230e-02 -441670143e-01131 -173364633e-01 992353986e-03 101775476e-01]132 [ 313672832e+00 155128891e+00 460139512e+00133 982477544e-01 -387108002e-01 -134239667e+00134 -300065797e+00 -441556909e-01 -777546208e-01135 -659017029e-01 -142596356e-01 -978935498e-01]136 [ 850714148e-01 228658856e-01 -365260753e+00137 270626948e+00 -190441544e-01 566625676e+00138 177531510e+00 239978921e+00 110965660e+00139 158484130e+00 -151579214e-02 864324026e-01]140 [ 114302559e+00 118602811e+00 -388130412e+00141 869833825e-01 -823003310e-01 -423867795e-01142 856022598e-01 -108015106e+00 174840192e-01143 -135493558e-02 -117012561e+00 168572940e-01]144 [ 354117814e+00 612714769e-01 767585243e+00145 250391333e+00 181374399e+00 -146363231e+00146 -174027236e+00 -572924078e-01 -120787368e+00147 -413954661e-01 -462561948e-01 678297871e-01]148 [ 831843044e-01 441635485e-01 700724425e-02149 -472159900e-02 308326493e-01 -447009822e-01150 327806057e-01 652370380e-01 328490360e-01151 128628172e-01 -778065861e-02 691343399e-02]152 [ 490082031e-01 -953180204e-01 176970476e-01153 157256960e-01 -526196238e-02 -319264458e-01154 391808304e-01 219368239e-01 -206483291e-01155 -625044005e-02 -105547224e-01 318934196e-01]156 [ 149899454e+00 -430708817e-01 243770498e+00157 703149621e-01 -228827845e+00 270195855e+00158 -471484280e+00 -118700075e+00 -177431396e+00159 -223190236e+00 820855264e-01 -235859902e-01]160 [ 120322544e-01 -366300816e-01 -125699953e-01161 -121914056e-01 693277338e-02 -131034684e-01162 -154955924e-03 248094288e-02 -309576314e-02163 -166369415e-03 148904987e-04 -142151992e-02]164 [ 652394765e-01 -681024464e-01 636868117e-01165 304950208e-01 262178992e-01 -320457080e-01166 -198576098e-01 -302173163e-01 204399765e-01167 444513847e-02 -950111498e-02 -114198739e-02]168 [ 206762180e-01 -208101829e-01 261977630e-01169 -171672300e-01 561794250e-02 213660185e-01170 390259585e-02 478176392e-02 172812607e-02171 344052067e-02 626899067e-03 248544728e-02]172 [ 739717363e-01 437786285e+00 254995502e+00173 113151212e+00 -358509503e-01 220806129e-01174 -220500355e-01 -722409824e-02 -270534083e-01175 107942098e-03 270174668e-01 187279353e-01]176 [ 125593809e+00 671054880e-02 870352571e-01177 -432607959e+00 230652217e+00 547476105e+00178 -611052479e-01 107955720e+00 -216225471e+00179 -795770149e-01 -731804973e-01 968935954e-01]180 [ 117233757e-01 -123897829e-01 -488625265e-01181 142036530e-01 -723286756e-02 -699808763e-02182 -117525019e-02 570221674e-02 -767796123e-03183 417505873e-02 -233375716e-02 194121001e-02]184 [ 167511025e+00 -275436700e+00 145345593e+00

64

185 132408871e+00 -166172505e+00 100560074e+00186 -882308160e-01 -595708043e-01 -727283590e-01187 -103975499e+00 -186653334e-02 139449745e+00]188 [ 320587677e+00 -284451104e+00 854849957e+00189 -444001235e-01 104202144e+00 735333682e-01190 -248763292e+00 738931361e-01 -174185596e+00191 -107581842e+00 205759299e-01 -820483513e-01]192 [ 331279737e+00 -508655734e-01 661530870e+00193 116518280e+00 474499155e+00 -231536191e+00194 -134016130e+00 -715381712e-01 278890594e+00195 204189275e+00 -380003033e-01 116034914e+00]196 [ 179522019e+00 -813534697e-02 437167420e-01197 226517020e+00 885377295e-01 107481514e+00198 -725322296e-01 -219309506e+00 -759468916e-01199 -137191387e+00 260097913e-01 934596450e-01]200 [ 350400906e-01 817891485e-01 -863487084e-01201 -731760701e-01 970320805e-02 -360023996e-01202 -291753495e-01 -803073817e-02 665930095e-02203 160093340e-01 -129158086e-01 -518806100e-02]204 [ 225922929e-01 278461593e-01 539661393e-02205 -237662670e-02 -270343295e-02 -123485570e-01206 231027499e-03 587465112e-05 186127188e-02207 283074747e-02 -187198676e-04 124761782e-02]208 [ 453615634e-01 318976020e+00 -835029351e-01209 784124578e+00 -443906795e-01 -178945492e+00210 -114521031e+00 100044304e+00 -404084981e-01211 -486030348e-01 105412721e-01 563666445e-02]212 [ 393714086e-01 -307226477e-01 -487366619e-01213 -457481697e-01 -291133171e-04 -239881719e-01214 -215591352e-01 -121332941e-01 142245002e-01215 502984582e-02 -805878851e-03 195534173e-01]216 [ 186913010e-01 -161000977e-01 595612425e-01217 187804293e-01 222064227e-01 -109008289e-01218 783845058e-02 515228647e-02 -818113578e-02219 -237860551e-02 341013800e-03 364680417e-02]220 [ 332919314e+00 -214341251e+00 720913997e+00221 176143734e+00 164091808e+00 -266887649e+00222 -926748006e-01 -278599285e-01 -739434005e-01223 -387363085e-01 800557250e-01 115628886e+00]224 [ 476496444e-01 -119334793e-01 309037235e-01225 -345545294e-01 130114716e-01 506895559e-01226 212176840e-01 -414296750e-03 452439064e-02227 -162163990e-02 693683152e-02 -577607592e-03]228 [ 300019324e-01 543432074e-02 -772732930e-01229 147263806e+00 -279012581e-02 -247864869e-01230 -210011388e-01 278202425e-01 616957205e-02231 -166924986e-01 -180102286e-01 -378872162e-03]]232

233 TIMBRE_MEANS = [npmean(t) for t in TIMBRE_CLUSTERS]234 TIMBRE_STDEVS = [npstd(t) for t in TIMBRE_CLUSTERS]235

236 rsquorsquorsquohelper methods to process raw msd datarsquorsquorsquo237

238 def normalize_pitches(h5)239 key = int(hdf5_gettersget_key(h5))240 segments_pitches = hdf5_gettersget_segments_pitches(h5)241 segments_pitches_new = [transpose_by_key(pitch_segkey) for pitch_seg in

segments_pitches]242 return segments_pitches_new243

65

244 def transpose_by_key(pitch_segkey)245 pitch_seg_new = []246 for i in range(012)247 idx = (i + key) 12248 pitch_seg_newappend(pitch_seg[idx])249 return pitch_seg_new250

251 rsquorsquorsquo given a time segment with distributions of the 12 pitches find the mostlikely chord playedrsquorsquorsquo

252 def find_most_likely_chord(pitch_vector)253 rho_max = 00254 index each chord255 most_likely_chord = (11)256 for idx (chordmeanstdev) in enumerate(zip(CHORD_TEMPLATE_MAJOR

CHORD_TEMPLATE_MAJOR_meansCHORD_TEMPLATE_MAJOR_stdevs))257 rho = 00258 for i in range(012)259 rho += (chord[i] - mean)(pitch_vector[i] - npmean(pitch_vector))((

stdev+001)(npstd(pitch_vector)+001))260 if (abs(rho) gt abs(rho_max))261 rho_max = rho262 most_likely_chord = (1idx)263 for idx (chordmeanstdev) in enumerate(zip(CHORD_TEMPLATE_MINOR

CHORD_TEMPLATE_MINOR_meansCHORD_TEMPLATE_MINOR_stdevs))264 rho = 00265 for i in range(012)266 rho += (chord[i] - mean)(pitch_vector[i] - npmean(pitch_vector))((

stdev+001)(npstd(pitch_vector)+001))267 if (abs(rho) gt abs(rho_max))268 rho_max = rho269 most_likely_chord = (2idx)270 for idx (chordmeanstdev) in enumerate(zip(CHORD_TEMPLATE_DOM7

CHORD_TEMPLATE_DOM7_meansCHORD_TEMPLATE_DOM7_stdevs))271 rho = 00272 for i in range(012)273 rho += (chord[i] - mean)(pitch_vector[i] - npmean(pitch_vector))((

stdev+001)(npstd(pitch_vector)+001))274 if (abs(rho) gt abs(rho_max))275 rho_max = rho276 most_likely_chord = (3idx)277 for idx (chordmeanstdev) in enumerate(zip(CHORD_TEMPLATE_MIN7

CHORD_TEMPLATE_MIN7_meansCHORD_TEMPLATE_MIN7_stdevs))278 rho = 00279 for i in range(012)280 rho += (chord[i] - mean)(pitch_vector[i] - npmean(pitch_vector))((

stdev+001)(npstd(pitch_vector)+001))281 if (abs(rho) gt abs(rho_max))282 rho_max = rho283 most_likely_chord = (4idx)284 return most_likely_chord285

286 def find_most_likely_timbre_category(timbre_vector)287 most_likely_timbre_cat = 0288 rho_max = 00289 for idx (segmeanstdev) in enumerate(zip(TIMBRE_CLUSTERSTIMBRE_MEANS

TIMBRE_STDEVS))290 rho = 00291 for i in range(012)292 rho += (seg[i] - mean)(timbre_vector[i] - npmean(seg))((stdev+001)

(npstd(timbre_vector)+001))

66

293 if (abs(rho) gt abs(rho_max))294 rho_max = rho295 most_likely_timbre_cat = idx296 return most_likely_timbre_cat

67

Bibliography

[1] Marck Bailey Mark j butler publishes scholarly work on dance mu-sic httpwwwmusicnorthwesterneduaboutnews2012mark-j-butler-publishes-scholarly-work-on-dance-musichtml mar 2012

[2] Kenneth Taylor Ishkurrsquos guide to edm httptechnoorgelectronic-music-guide

[3] Deal further strengthens spotifyrsquos music discovery expertise httptheechonestcompressreleasesspotify-acquires-echo-nest mar 2014

[4] Josh Constine Inside the spotify - echo nest skunkworks httptechcrunchcom20141019the-sonic-mad-scientists oct 2014

[5] The future of music genres is here httpblogechonestcompost73516217273the-future-of-music-genres-is-here jan 2014

[6] About the music genome project httpwwwpandoracomaboutmgp

[7] Joan Serragrave Aacutelvaro Corral Mariaacuten Boguntildeaacute Martiacuten Haro and Josep Ll ArcosMeasuring the evolution of contemporary western popular music Sci Rep 2jul 2012

[8] Matthias Mauch Robert M MacCallum Mark Levy and Armand M Leroi Theevolution of popular music Usa 1960ndash2010 Royal Society Open Science 2(5)2015

[9] Thierry Bertin-Mahieux Daniel PW Ellis Brian Whitman and Paul LamereThe million song dataset In Proceedings of the 12th International Conferenceon Music Information Retrieval (ISMIR 2011) 2011

[10] F Pedregosa G Varoquaux A Gramfort V Michel B Thirion O GriselM Blondel P Prettenhofer R Weiss V Dubourg J Vanderplas A PassosD Cournapeau M Brucher M Perrot and E Duchesnay Scikit-learn Machinelearning in Python Journal of Machine Learning Research 122825ndash2830 2011

[11] Lars Buitinck Gilles Louppe Mathieu Blondel Fabian Pedregosa AndreasMueller Olivier Grisel Vlad Niculae Peter Prettenhofer Alexandre GramfortJaques Grobler Robert Layton Jake VanderPlas Arnaud Joly Brian Holt andGaeumll Varoquaux API design for machine learning software experiences from

68

the scikit-learn project In ECML PKDD Workshop Languages for Data Miningand Machine Learning pages 108ndash122 2013

[12] Edwin Chen Infinite mixture models with nonparametric bayesand the dirichlet process httpblogechenme20120320infinite-mixture-models-with-nonparametric-bayes-and-the-dirichlet-processmar 2012

[13] Graham Massey Roland tr-808 The drum machine that changed music httpwwwbbccomnewsentertainment-arts-26682781 mar 2014

[14] Jean-Michel Jarre and Agencja Artystyczna MTJ Jean Michel Jarre DisquesDreyfus 1999

[15] Francois Pachet Jean-Julien Aucouturier and Mark Sandler The way it soundsTimbre models for analysis and retrieval of music signals IEEE TRANSAC-TIONS ON MULTIMEDIA 7(6)1028ndash35 dec 2005

69

  • Abstract
  • Acknowledgements
  • Contents
  • List of Tables
  • List of Figures
  • 1 Introduction
    • 11 Background Information
    • 12 Literature Review
    • 13 The Dataset
      • 2 Mathematical Modeling
        • 21 Determining Novelty of Songs
        • 22 Feature Selection
        • 23 Collecting Data and Preprocessing Selected Features
          • 231 Collecting the Data
          • 232 Pitch Preprocessing
          • 233 Timbre Preprocessing
              • 3 Results
                • 31 Methodology
                • 32 Findings
                  • 321 =005
                  • 322 =01
                  • 323 =02
                    • 33 Analysis
                      • 4 Conclusion
                        • 41 Design Flaws in Experiment
                        • 42 Future Work
                        • 43 Closing Remarks
                          • A Code
                            • A1 Pulling Data from the Million Song Dataset
                            • A2 Calculating Most Likely Chords and Timbre Categories
                            • A3 Code to Compute Timbre Categories
                            • A4 Helper Methods for Calculations
                              • Bibliography
Page 64: Silver,Matthew final thesis

classify what they heard Implementing this system would be an additional way of

ensuring that the clusters formed for each song were nontrivial I could not only

eyeball the measurements on each graph for timbre like I did in this thesis but also

use them to confirm the sounds I observed for each cluster Finally while my feature

selection contained careful preprocessing based on other studies that normalized

measurements between all songs there are additional ways I could have improved the

feature set For example one study looks at more advanced ways to isolate specific

timbre segments in a song identify repeating patterns and comparing songs to each

other in terms of the similarity of their timbres [15] More advanced methods like

these would allow me to more quantitatively analyze how successful the Dirichlet

Process is on effectively clustering songs into distinct categories

42 Future Work

Future work in this area quantitatively analyzing EM metadata to determine what

constitutes different genres and novel artists would involve tighter definitions proce-

dures evaluations of whether clustering was effective and music scrutiny All of the

weaknesses mentioned in the previous section barring perhaps the songs available in

the Million Song Dataset can be addressed with extensions and modifications to the

code base I created Addressing the greater issue of building an effective corpus of

music data for the MSD and constantly updating it might be addressed by soliciting

such data from an organization like Spotify but such an endeavor is very ambitious

and beyond the scope of any individual or small group research project without ex-

tensive funding and influence Once these problems are resolved and the dataset

songs accessed from the dataset and methods for comparing songs to each other are

accomplished the next steps would be to further analyze the results How do the

most unique artists for their time compare to the most popular artists Is there con-

55

siderable overlap How long does it take for a style to grow in popularity if it even

does And lastly how can these findings be used to compose new genres of music and

envision who and what will become popular in the future All of these questions may

require supplementary information sources with respect to the popularity of songs

and artists for example and many of these additional pieces of information can be

found on the website of the MSD

43 Closing Remarks

While this thesis is an ambitious endeavor and can be improved in many respects it

does show that the methods implemented yield nontrivial results and could serve as

a foundation for future quantitative analysis of electronic music As data analytics

grows even more and groups such as Spotify amass greater amounts of information

and deeper insights on that information this relatively new field of study will hope-

fully grow EM is a dynamic energizing and incredibly expressive type of music

and understanding it from a quantitative perspective pays respect to what has up

until now been mostly analyzed from a curious outsiderrsquos perspective qualitatively

described but not examined as thoroughly from a mathematical angle

56

Appendix A

Code

A1 Pulling Data from the Million Song Dataset

1 from __future__ import division2 import os3 import sys4 import time5 import glob6 import hdf5_getters not on adroit7

8 prevents output from showing ellipses when printed9 npset_printoptions(threshold=npnan)

10

11 rsquorsquorsquoThis code computes the frequency of chord changes in each electronic songand runs the dirichlet process on it rsquorsquorsquo

12

13 basedir = rsquoscratchnetworkmssilvermssilvermsd_data_fulldatarsquo +str(sysargv[1])

14 ext = rsquoh5rsquo15

16 target_genres = [rsquohousersquorsquotechnorsquorsquodrum and bassrsquorsquodrum n bassrsquorsquodrumrsquonrsquobassrsquo

17 rsquodrumnbassrsquorsquodrum rsquonrsquo bassrsquorsquojunglersquorsquobreakbeatrsquorsquotrancersquorsquodubsteprsquorsquotraprsquorsquodowntemporsquo

18 rsquoindustrialrsquorsquosynthpoprsquorsquoidmrsquorsquoidm - intelligent dance musicrsquorsquo8-bitrsquorsquoambientrsquo

19 rsquodance and electronicarsquorsquoelectronicrsquo]20

21 relevant metadata for all EM songs found in the MSD22 all_song_data = 23 pitch_segs_data = []24 count = 025 start_time = timetime()26

27 for root dirs files in oswalk(basedir)28 files = globglob(ospathjoin(rootrsquorsquo+ext))29 for f in files

57

30 h5 = hdf5_gettersopen_h5_file_read(f)31 if year unknown throw out sample32 if hdf5_gettersget_year(h5) == 033 h5close()34 continue35 if any(tag in str(hdf5_gettersget_artist_mbtags(h5)) for tag in

target_genres)36 print rsquofound electronic music song at 0 secondsrsquoformat(time

time()-start_time)37 count += 138 print (rsquosong count 0rsquoformat(count+1))39 h5_subdict = dict()40 h5_subdict[rsquotitlersquo] = hdf5_gettersget_title(h5)item()41 h5_subdict[rsquoartist_namersquo] = hdf5_gettersget_artist_name(h5)

item()42 h5_subdict[rsquoyearrsquo] = hdf5_gettersget_year(h5)item()43 h5_subdict[rsquodurationrsquo] = hdf5_gettersget_duration(h5)item()44 h5_subdict[rsquotimbrersquo] = hdf5_gettersget_segments_timbre(h5)

tolist()45 h5_subdict[rsquopitchesrsquo] = hdf5_gettersget_segments_pitches(h5)

tolist()46 track_id = hdf5_gettersget_track_id(h5)item()47 all_song_data[track_id] = h5_subdict48 print(rsquoSong 0 finished processing Total time elapsed 1

secondsrsquoformat(countstr(timetime() - start_time)))49 h5close()50

51 all_song_data_sorted = dict(sorted(all_song_dataitems() key=lambda k k[1][rsquoyearrsquo]))

52 sortedpitchdata = rsquoscratchnetworkmssilvermssilvermsd_dataraw_rsquo +resub(rsquorsquorsquorsquosysargv[1]) + rsquotxtrsquo

53 with open(sortedpitchdata rsquowrsquo) as text_file54 text_filewrite(str(all_song_data_sorted))

A2 Calculating Most Likely Chords and Timbre

Categories

1 from __future__ import division2 import os3 import sys4 import re5 import time6 import json7 import glob8 import hdf5_getters not on adroit9 import sklearnmixture

10 import msd_utils not on adroit11 import math12 import numpy as np13 import collections14 import ast15

16 prevents output from showing ellipses when printed17 npset_printoptions(threshold=npnan)

58

18

19 column-wise mean of list of lists20 def mean(a)21 return sum(a) len(a)22

23 rsquorsquorsquoThis code computes the frequency of chord changes in each electronic songand runs the dirichlet process on it rsquorsquorsquo

24

25 basedir = rsquoscratchnetworkmssilvermssilverrsquo26 input_file = basedir + rsquomsd_dataraw_rsquo + str(sysargv[1]) + rsquotxtrsquo27 output_file = basedir + rsquomsd_datapreprocessed_rsquo + str(sysargv[1]) + rsquotxtrsquo28

29 json_contents = open(input_filersquorrsquo)read()30

31 all_song_data = []32 time_start = timetime()33 count = 034 for json_object_str in refinditer(rsquorsquotitlersquojson_contents)35 json_object_str = str(json_object_strgroup(0))36 json_object = astliteral_eval(json_object_str)37 json_object_new = 38

39 json_object_new[rsquotitlersquo] = json_object[rsquotitlersquo]40 json_object_new[rsquoartist_namersquo] = json_object[rsquoartist_namersquo]41 json_object_new[rsquoyearrsquo] = json_object[rsquoyearrsquo]42 json_object_new[rsquodurationrsquo] = json_object[rsquodurationrsquo]43

44 segments_pitches_old = json_object[rsquopitchesrsquo]45 segments_timbre_old = json_object[rsquotimbrersquo]46 segments_pitches_old_smoothed = []47 segments_timbre_old_smoothed = []48 chord_changes = [0 for i in range(0192)]49 smoothing_factor = 550 for i in range(0int(mathfloor(len(segments_pitches_old))

smoothing_factor))51 segments = segments_pitches_old[(smoothing_factori)(

smoothing_factori+smoothing_factor)]52 calculate mean frequency of each note over a block of 5 time

segments53 segments_mean = map(mean zip(segments))54 segments_pitches_old_smoothedappend(segments_mean)55 most_likely_chords = [msd_utilsfind_most_likely_chord(seg) for seg in

segments_pitches_old_smoothed]56 print rsquofound most likely chords at 0 secondsrsquoformat(timetime()-

time_start)57 calculate chord changes58 for i in range(0len(most_likely_chords)-1)59 c1 = most_likely_chords[i]60 c2 = most_likely_chords[i+1]61 if (c1[1] == c2[1])62 note_shift = 063 elif (c1[1] lt c2[1])64 note_shift = c2[1] - c1[1]65 else66 note_shift = 12 - c1[1] + c2[1]67 key_shift = 4(c1[0]-1) + c2[0]68 convert note_shift (0 through 11) and key_shift (1 to 16)69 to one of 196 categories for a chord shift70 chord_shift = 12(key_shift - 1) + note_shift71 chord_changes[chord_shift] += 1

59

72 json_object_new[rsquochord_changesrsquo] = [cjson_object[rsquodurationrsquo] for c inchord_changes]

73 print rsquocalculated chord changes at 0 secondsrsquoformat(timetime()-time_start)

74

75 for i in range(0int(mathfloor(len(segments_timbre_old))smoothing_factor))

76 segments = segments_timbre_old[(smoothing_factori)(smoothing_factori+smoothing_factor)]

77 calculate mean frequency of each note over a block of 5 timesegments

78 segments_mean = map(mean zip(segments))79 segments_timbre_old_smoothedappend(segments_mean)80 print rsquofound most likely timbre categories at 0 secondsrsquoformat(time

time()-time_start)81 timbre_cats = [msd_utilsfind_most_likely_timbre_category(seg) for seg

in segments_timbre_old_smoothed]82 timbre_cat_counts = [timbre_catscount(i) for i in xrange(030)]83 json_object_new[rsquotimbre_cat_countsrsquo] = [tjson_object[rsquodurationrsquo] for t

in timbre_cat_counts]84 all_song_dataappend(json_object_new)85 count += 186

87 print rsquopreprocessing finished writing results to file at time 0rsquoformat(timetime()-time_start)

88 with open(output_filersquowrsquo) as f89 fwrite(str(all_song_data))90

91 print rsquofile merging complete at time 0rsquoformat(timetime()-time_start)

A3 Code to Compute Timbre Categories

1 from __future__ import division2 import os3 import sys4 import re5 import time6 import json7 import glob8 import hdf5_getters not on adroit9 import sklearnmixture

10 import msd_utils not on adroit11 import math12 import numpy as np13 import collections14 from string import ascii_uppercase15 import ast16 import matplotlibpyplot as plt17 import operator18 from collections import defaultdict19 import random20

21 timbre_all = []22 N = 20 number of samples to get from each year23 year_counts = dict(1956 2 1965 4 1968 3 1969 5 1970 23 1971 25

1972 26 1973 37 1974 35 1975 29 1976 28 1977 64 1978 771979 111 1980 131 1981 171 1982 199 1983 272 1984 190 1985

60

189 1986 200 1987 224 1988 205 1989 272 1990 358 1991 3481992 538 1993 610 1994 658 1995 764 1996 809 1997 930 1998872 1999 983 2000 1031 2001 1230 2002 1323 2003 1563 20041508 2005 1995 2006 1892 2007 2175 2008 1950 2009 1782 2010742)

24

25 time_start = timetime()26 year_count = defaultdict(int)27 orig_dir = rsquoscratchnetworkmssilvermssilverrsquo28 orig_dir = rsquorsquo29 json_pattern = recompile(rsquorsquotitlersquoreDOTALL)30 N = 20 number of songs to sample from each year31 k = 20 number of frames to select from each song32 for l1 in ascii_uppercase33 for l2 in ascii_uppercase34 edm_textfile = orig_dir + rsquomsd_dataraw_rsquo + l1 + l2 + rsquotxtrsquo35 json_contents = open(edm_textfilersquorrsquo)read()36 for json_object_str in refindall(json_patternjson_contents)37 json_object = astliteral_eval(json_object_str)38 year = int(json_object[rsquoyearrsquo])39 prob = 10 if 10Nyear_counts[year] gt 10 else 10N

year_counts[year]40 if randomrandom() lt prob41 print rsquogetting timbre frames for song in directory 0 1

seconds after start of programrsquoformat(edm_textfiletimetime()-time_start)

42 duration = float(json_object[rsquodurationrsquo])43 timbre = [[tduration for t in l] for l in json_object[rsquo

timbrersquo]]44 try45 indices = randomsample(xrange(0len(timbre))k)46 except47 indices = xrange(0len(timbre))48 timbre_frames = [timbre[i] for i in indices]49 appended_timbre = [timbre_allappend(l) for l in

timbre_frames]50 print rsquofinished file 0 1 seconds after start of programrsquoformat(

edm_textfiletimetime()-time_start)51

52 with(open(rsquotimbre_frames_alltxtrsquorsquowrsquo)) as f53 fwrite(str(timbre_all))

A4 Helper Methods for Calculations

1 import os2 import re3 import json4 import glob5 import hdf5_getters6 import time7 import numpy as np8

9 rsquorsquorsquo some static data used in conjunction with the helper methods rsquorsquorsquo10

11 each 12-element vector corresponds to the 12 pitches starting with Cnatural and going up to B natural

12

61

13 CHORD_TEMPLATE_MAJOR = [[100010010000][010001001000]14 [001000100100][000100010010]15 [000010001001][100001000100]16 [010000100010][001000010001]17 [100100001000][010010000100]18 [001001000010][000100100001]]19 CHORD_TEMPLATE_MINOR =[[100100010000][010010001000]20 [001001000100][000100100010]21 [000010010001][100001001000]22 [010000100100][001000010010]23 [000100001001][100010000100]24 [010001000010][001000100001]]25 CHORD_TEMPLATE_DOM7 = [[100010010010][010001001001]26 [101000100100][010100010010]27 [001010001001][100101000100]28 [010010100010][001001010001]29 [100100101000][010010010100]30 [001001001010][000100100101]]31 CHORD_TEMPLATE_MIN7 = [[100100010010][010010001001]32 [101001000100][010100100010]33 [001010010001][100101001000]34 [010010100100][001001010010]35 [000100101001][100010010100]36 [010001001010][001000100101]]37

38 CHORD_TEMPLATE_MAJOR_means = [npmean(chord) for chord inCHORD_TEMPLATE_MAJOR]

39 CHORD_TEMPLATE_MINOR_means = [npmean(chord) for chord inCHORD_TEMPLATE_MINOR]

40 CHORD_TEMPLATE_DOM7_means = [npmean(chord) for chord in CHORD_TEMPLATE_DOM7]

41 CHORD_TEMPLATE_MIN7_means = [npmean(chord) for chord in CHORD_TEMPLATE_MIN7]

42

43 CHORD_TEMPLATE_MAJOR_stdevs = [npstd(chord) for chord inCHORD_TEMPLATE_MAJOR]

44 CHORD_TEMPLATE_MINOR_stdevs = [npstd(chord) for chord inCHORD_TEMPLATE_MINOR]

45 CHORD_TEMPLATE_DOM7_stdevs = [npstd(chord) for chord in CHORD_TEMPLATE_DOM7]

46 CHORD_TEMPLATE_MIN7_stdevs = [npstd(chord) for chord in CHORD_TEMPLATE_MIN7]

47

TIMBRE_CLUSTERS = [
    [1.38679881e-01, 3.95702571e-02, 2.65410235e-02, 7.38301998e-03, -1.75014636e-02, -5.51147732e-02,
     8.71851698e-03, -1.17595855e-02, 1.07227900e-02, 8.75951680e-03, 5.40391877e-03, 6.17638908e-03],
    [3.14344510e+00, 1.17405599e-01, 4.08053561e+00, -1.77934450e+00, 2.93367968e+00, -1.35597928e+00,
     -1.55129489e+00, 7.75743158e-01, 6.42796685e-01, 1.40794256e-01, 3.37716831e-01, -3.27103815e-01],
    [3.56548165e-01, 2.73288705e+00, 1.94355982e+00, 1.06892477e+00, 9.89739475e-01, -8.97330631e-02,
     8.73234495e-01, -2.00747009e-03, 3.44488367e-01, 9.93117800e-02, -2.43471766e-01, -1.90521726e-01],
    [4.22442037e-01, 4.14115783e-01, 1.43926557e-01, -1.16143322e-01, -5.95186216e-02, -2.36927188e-01,
     -6.83151409e-02, 9.86816882e-02, 2.43219098e-02, 6.93558977e-02, 6.80121418e-03, 3.97485360e-02],
    [1.94727799e-01, -1.39027782e+00, -2.39875671e-01, -2.84583677e-01, 1.92334219e-01, -2.83421048e-01,
     2.15787541e-01, 1.14840341e-01, -2.15631833e-01, -4.09496877e-02, -6.90838017e-03, -7.24394810e-03],
    [1.96565167e-01, 4.98702717e-02, -3.43697282e-01, 2.54170701e-01, 1.12441266e-02, 1.54740401e-01,
     -4.70447408e-02, 8.10868802e-02, 3.03736697e-03, 1.43974944e-03, -2.75044913e-02, 1.48634678e-02],
    [2.21364497e-01, -2.96205105e-01, 1.57754028e-01, -5.57641279e-02, -9.25625566e-02, -6.15316168e-02,
     -1.38139882e-01, -5.54936599e-02, 1.66886836e-01, 6.46238260e-02, 1.24093863e-02, -2.09274345e-02],
    [2.12823455e-01, -9.32652720e-02, -4.39611467e-01, -2.02814479e-01, 4.98638770e-02, -1.26572488e-01,
     -1.11181799e-01, 3.25075635e-02, 2.01416694e-02, -5.69216463e-02, 2.61922912e-02, 8.30817468e-02],
    [1.62304042e-01, -7.34813956e-03, -2.02552550e-01, 1.80106705e-01, -5.72110826e-02, -9.17148244e-02,
     -6.20429191e-03, -6.08892354e-02, 1.02883628e-02, 3.84878478e-02, -8.72920419e-03, 2.37291230e-02],
    [1.69023095e-01, 6.81311168e-02, -3.71039856e-02, -2.13139780e-02, -4.18752028e-03, 1.36407740e-01,
     2.58515825e-02, -4.10328777e-04, 2.93149920e-02, -1.97874734e-02, 2.01177066e-02, 4.29260690e-03],
    [4.16829358e-01, -1.28384095e+00, 8.86081556e-01, 9.13717416e-02, -3.19420208e-01, -1.82003637e-01,
     -3.19865507e-02, -1.71517045e-02, 3.47472066e-02, -3.53047665e-02, 5.58354602e-02, -5.06222122e-02],
    [3.83948137e-01, 1.06020034e-01, 4.01191058e-01, 1.49470482e-01, -9.58422411e-02, -4.94473336e-02,
     2.27589858e-02, -5.67352733e-02, 3.84666644e-02, -2.15828055e-02, -1.67817151e-02, 1.15426241e-01],
    [9.07946444e-01, 3.26120397e+00, 2.98472002e+00, -1.42615404e-01, 1.29886103e+00, -4.53380431e-01,
     1.54008478e-01, -3.55297093e-02, -2.95809181e-01, 1.57037690e-01, -7.29692046e-02, 1.15180285e-01],
    [1.60870896e+00, -2.32038235e+00, -7.96211044e-01, 1.55058968e+00, -2.19377663e+00, 5.01030526e-01,
     -1.71767279e+00, -1.36642470e+00, -2.42837527e-01, -4.14275615e-01, -7.33148530e-01, -4.56676578e-01],
    [6.42870687e-01, 1.34486839e+00, 2.16026845e-01, -2.13180345e-01, 3.10866747e-01, -3.97754955e-01,
     -3.54439151e-01, -5.95938041e-04, 4.95054274e-03, 4.67013422e-02, -1.80823854e-02, 1.25808320e-01],
    [1.16780496e+00, 2.28141229e+00, -3.29418720e+00, -1.54239912e+00, 2.12372153e-01, 2.51116768e+00,
     1.84273560e+00, -4.06183916e-01, 1.19175125e+00, -9.24407446e-01, 6.85444429e-01, -6.38729005e-01],
    [2.39097414e-01, -1.13382447e-02, 3.06327342e-01, 4.68182987e-03, -1.03107607e-01, -3.17661969e-02,
     3.46533705e-02, 1.46440386e-02, 6.88291154e-02, 1.72580481e-02, -6.23970238e-03, -6.52822380e-03],
    [1.74850329e-01, -1.86077411e-01, 2.69285838e-01, 5.22452803e-02, -3.71708289e-02, -6.42874319e-02,
     -5.01920042e-03, -1.14565540e-02, -2.61300268e-03, -6.94872458e-03, 1.20157063e-02, 2.01341977e-02],
    [1.93220674e-01, 1.62738332e-01, 1.72794061e-02, 7.89933755e-02, 1.58494767e-01, 9.04541006e-04,
     -3.33177052e-02, -1.42411500e-01, -1.90471155e-02, -2.41622739e-02, -2.57382438e-02, 2.84895062e-02],
    [3.31179197e+00, -1.56765268e-01, 4.42446188e+00, 2.05496297e+00, 5.07031622e+00, -3.52663849e-02,
     -5.68337901e+00, -1.17825301e+00, 5.41756637e-01, -3.15541339e-02, -1.58404846e+00, 7.37887234e-01],
    [2.36033237e-01, -5.01380019e-01, -7.01568834e-02, -2.14474169e-01, 5.58739133e-01, -3.45340886e-01,
     2.36469930e-01, -2.51770230e-02, -4.41670143e-01, -1.73364633e-01, 9.92353986e-03, 1.01775476e-01],
    [3.13672832e+00, 1.55128891e+00, 4.60139512e+00, 9.82477544e-01, -3.87108002e-01, -1.34239667e+00,
     -3.00065797e+00, -4.41556909e-01, -7.77546208e-01, -6.59017029e-01, -1.42596356e-01, -9.78935498e-01],
    [8.50714148e-01, 2.28658856e-01, -3.65260753e+00, 2.70626948e+00, -1.90441544e-01, 5.66625676e+00,
     1.77531510e+00, 2.39978921e+00, 1.10965660e+00, 1.58484130e+00, -1.51579214e-02, 8.64324026e-01],
    [1.14302559e+00, 1.18602811e+00, -3.88130412e+00, 8.69833825e-01, -8.23003310e-01, -4.23867795e-01,
     8.56022598e-01, -1.08015106e+00, 1.74840192e-01, -1.35493558e-02, -1.17012561e+00, 1.68572940e-01],
    [3.54117814e+00, 6.12714769e-01, 7.67585243e+00, 2.50391333e+00, 1.81374399e+00, -1.46363231e+00,
     -1.74027236e+00, -5.72924078e-01, -1.20787368e+00, -4.13954661e-01, -4.62561948e-01, 6.78297871e-01],
    [8.31843044e-01, 4.41635485e-01, 7.00724425e-02, -4.72159900e-02, 3.08326493e-01, -4.47009822e-01,
     3.27806057e-01, 6.52370380e-01, 3.28490360e-01, 1.28628172e-01, -7.78065861e-03, 6.91343399e-02],
    [4.90082031e-01, -9.53180204e-01, 1.76970476e-01, 1.57256960e-01, -5.26196238e-02, -3.19264458e-01,
     3.91808304e-01, 2.19368239e-01, -2.06483291e-01, -6.25044005e-02, -1.05547224e-01, 3.18934196e-01],
    [1.49899454e+00, -4.30708817e-01, 2.43770498e+00, 7.03149621e-01, -2.28827845e+00, 2.70195855e+00,
     -4.71484280e+00, -1.18700075e+00, -1.77431396e+00, -2.23190236e+00, 8.20855264e-01, -2.35859902e-01],
    [1.20322544e-01, -3.66300816e-01, -1.25699953e-01, -1.21914056e-01, 6.93277338e-02, -1.31034684e-01,
     -1.54955924e-03, 2.48094288e-02, -3.09576314e-02, -1.66369415e-03, 1.48904987e-04, -1.42151992e-02],
    [6.52394765e-01, -6.81024464e-01, 6.36868117e-01, 3.04950208e-01, 2.62178992e-01, -3.20457080e-01,
     -1.98576098e-01, -3.02173163e-01, 2.04399765e-01, 4.44513847e-02, -9.50111498e-03, -1.14198739e-02],
    [2.06762180e-01, -2.08101829e-01, 2.61977630e-01, -1.71672300e-01, 5.61794250e-02, 2.13660185e-01,
     3.90259585e-02, 4.78176392e-02, 1.72812607e-02, 3.44052067e-02, 6.26899067e-03, 2.48544728e-02],
    [7.39717363e-01, 4.37786285e+00, 2.54995502e+00, 1.13151212e+00, -3.58509503e-01, 2.20806129e-01,
     -2.20500355e-01, -7.22409824e-03, -2.70534083e-01, 1.07942098e-03, 2.70174668e-01, 1.87279353e-01],
    [1.25593809e+00, 6.71054880e-02, 8.70352571e-01, -4.32607959e+00, 2.30652217e+00, 5.47476105e+00,
     -6.11052479e-01, 1.07955720e+00, -2.16225471e+00, -7.95770149e-01, -7.31804973e-01, 9.68935954e-01],
    [1.17233757e-01, -1.23897829e-01, -4.88625265e-01, 1.42036530e-01, -7.23286756e-02, -6.99808763e-02,
     -1.17525019e-02, 5.70221674e-02, -7.67796123e-03, 4.17505873e-02, -2.33375716e-02, 1.94121001e-02],
    [1.67511025e+00, -2.75436700e+00, 1.45345593e+00, 1.32408871e+00, -1.66172505e+00, 1.00560074e+00,
     -8.82308160e-01, -5.95708043e-01, -7.27283590e-01, -1.03975499e+00, -1.86653334e-02, 1.39449745e+00],
    [3.20587677e+00, -2.84451104e+00, 8.54849957e+00, -4.44001235e-01, 1.04202144e+00, 7.35333682e-01,
     -2.48763292e+00, 7.38931361e-01, -1.74185596e+00, -1.07581842e+00, 2.05759299e-01, -8.20483513e-01],
    [3.31279737e+00, -5.08655734e-01, 6.61530870e+00, 1.16518280e+00, 4.74499155e+00, -2.31536191e+00,
     -1.34016130e+00, -7.15381712e-01, 2.78890594e+00, 2.04189275e+00, -3.80003033e-01, 1.16034914e+00],
    [1.79522019e+00, -8.13534697e-02, 4.37167420e-01, 2.26517020e+00, 8.85377295e-01, 1.07481514e+00,
     -7.25322296e-01, -2.19309506e+00, -7.59468916e-01, -1.37191387e+00, 2.60097913e-01, 9.34596450e-01],
    [3.50400906e-01, 8.17891485e-01, -8.63487084e-01, -7.31760701e-01, 9.70320805e-02, -3.60023996e-01,
     -2.91753495e-01, -8.03073817e-02, 6.65930095e-02, 1.60093340e-01, -1.29158086e-01, -5.18806100e-02],
    [2.25922929e-01, 2.78461593e-01, 5.39661393e-02, -2.37662670e-02, -2.70343295e-02, -1.23485570e-01,
     2.31027499e-03, 5.87465112e-05, 1.86127188e-02, 2.83074747e-02, -1.87198676e-04, 1.24761782e-02],
    [4.53615634e-01, 3.18976020e+00, -8.35029351e-01, 7.84124578e+00, -4.43906795e-01, -1.78945492e+00,
     -1.14521031e+00, 1.00044304e+00, -4.04084981e-01, -4.86030348e-01, 1.05412721e-01, 5.63666445e-02],
    [3.93714086e-01, -3.07226477e-01, -4.87366619e-01, -4.57481697e-01, -2.91133171e-04, -2.39881719e-01,
     -2.15591352e-01, -1.21332941e-01, 1.42245002e-01, 5.02984582e-02, -8.05878851e-03, 1.95534173e-01],
    [1.86913010e-01, -1.61000977e-01, 5.95612425e-01, 1.87804293e-01, 2.22064227e-01, -1.09008289e-01,
     7.83845058e-02, 5.15228647e-02, -8.18113578e-02, -2.37860551e-02, 3.41013800e-03, 3.64680417e-02],
    [3.32919314e+00, -2.14341251e+00, 7.20913997e+00, 1.76143734e+00, 1.64091808e+00, -2.66887649e+00,
     -9.26748006e-01, -2.78599285e-01, -7.39434005e-01, -3.87363085e-01, 8.00557250e-01, 1.15628886e+00],
    [4.76496444e-01, -1.19334793e-01, 3.09037235e-01, -3.45545294e-01, 1.30114716e-01, 5.06895559e-01,
     2.12176840e-01, -4.14296750e-03, 4.52439064e-02, -1.62163990e-02, 6.93683152e-03, -5.77607592e-03],
    [3.00019324e-01, 5.43432074e-02, -7.72732930e-01, 1.47263806e+00, -2.79012581e-02, -2.47864869e-01,
     -2.10011388e-01, 2.78202425e-01, 6.16957205e-02, -1.66924986e-01, -1.80102286e-01, -3.78872162e-03]]

TIMBRE_MEANS = [np.mean(t) for t in TIMBRE_CLUSTERS]
TIMBRE_STDEVS = [np.std(t) for t in TIMBRE_CLUSTERS]
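# Editorial note: the 46 twelve-dimensional centres above are hard-coded
# output of the timbre preprocessing step (Section 2.3.3); they were fit
# offline, presumably with the Dirichlet process mixture tools in
# sklearn.mixture, to the timbre frames sampled by the Section A.3 script.
# TIMBRE_MEANS and TIMBRE_STDEVS cache each centre's scalar mean and standard
# deviation for the correlation matching in find_most_likely_timbre_category
# further below.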

'''helper methods to process raw msd data'''

def normalize_pitches(h5):
    # transpose each pitch segment by the song's key, so that all songs are
    # compared in a common (C-rooted) frame of reference
    key = int(hdf5_getters.get_key(h5))
    segments_pitches = hdf5_getters.get_segments_pitches(h5)
    segments_pitches_new = [transpose_by_key(pitch_seg, key) for pitch_seg in segments_pitches]
    return segments_pitches_new

def transpose_by_key(pitch_seg, key):
    # rotate the 12-element pitch vector by the key offset, modulo 12
    pitch_seg_new = []
    for i in range(0, 12):
        idx = (i + key) % 12
        pitch_seg_new.append(pitch_seg[idx])
    return pitch_seg_new

''' given a time segment with distributions of the 12 pitches, find the most likely chord played'''
def find_most_likely_chord(pitch_vector):
    rho_max = 0.0
    # index each chord as a (family, root) pair
    most_likely_chord = (1, 1)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_MAJOR,
            CHORD_TEMPLATE_MAJOR_means, CHORD_TEMPLATE_MAJOR_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += ((chord[i] - mean) * (pitch_vector[i] - np.mean(pitch_vector))
                    / ((stdev + 0.01) * (np.std(pitch_vector) + 0.01)))
        if (abs(rho) > abs(rho_max)):
            rho_max = rho
            most_likely_chord = (1, idx)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_MINOR,
            CHORD_TEMPLATE_MINOR_means, CHORD_TEMPLATE_MINOR_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += ((chord[i] - mean) * (pitch_vector[i] - np.mean(pitch_vector))
                    / ((stdev + 0.01) * (np.std(pitch_vector) + 0.01)))
        if (abs(rho) > abs(rho_max)):
            rho_max = rho
            most_likely_chord = (2, idx)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_DOM7,
            CHORD_TEMPLATE_DOM7_means, CHORD_TEMPLATE_DOM7_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += ((chord[i] - mean) * (pitch_vector[i] - np.mean(pitch_vector))
                    / ((stdev + 0.01) * (np.std(pitch_vector) + 0.01)))
        if (abs(rho) > abs(rho_max)):
            rho_max = rho
            most_likely_chord = (3, idx)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_MIN7,
            CHORD_TEMPLATE_MIN7_means, CHORD_TEMPLATE_MIN7_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += ((chord[i] - mean) * (pitch_vector[i] - np.mean(pitch_vector))
                    / ((stdev + 0.01) * (np.std(pitch_vector) + 0.01)))
        if (abs(rho) > abs(rho_max)):
            rho_max = rho
            most_likely_chord = (4, idx)
    return most_likely_chord
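# In the loops above, rho is a Pearson-style correlation between a chord
# template c and the smoothed pitch vector p, with 0.01 added to each
# standard deviation to avoid division by zero:
#
#     rho(c, p) = sum_{i=0..11} (c_i - mean(c)) * (p_i - mean(p))
#                 / ((sigma_c + 0.01) * (sigma_p + 0.01))
#
# The template with the largest |rho| over the four chord families (major,
# minor, dominant 7th, minor 7th) and all twelve roots determines the
# returned (family, root index) pair.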

def find_most_likely_timbre_category(timbre_vector):
    most_likely_timbre_cat = 0
    rho_max = 0.0
    for idx, (seg, mean, stdev) in enumerate(zip(TIMBRE_CLUSTERS, TIMBRE_MEANS,
            TIMBRE_STDEVS)):
        rho = 0.0
        for i in range(0, 12):
            # unlike the pitch version, the input vector is centred here by the
            # cluster's mean (np.mean(seg)) rather than by its own mean
            rho += ((seg[i] - mean) * (timbre_vector[i] - np.mean(seg))
                    / ((stdev + 0.01) * (np.std(timbre_vector) + 0.01)))
        if (abs(rho) > abs(rho_max)):
            rho_max = rho
            most_likely_timbre_cat = idx
    return most_likely_timbre_cat
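For illustration, here is a short usage sketch of the template-matching helpers above, together with one way the hard-coded timbre centres could be regenerated. Both blocks are editorial sketches, not part of the original pipeline: the first assumes this helper file is saved as msd_utils.py and that its hdf5_getters import resolves; the second assumes a recent scikit-learn, whose BayesianGaussianMixture replaces the DPGMM class available when this thesis was written, and its truncation level and concentration value 0.1 (one of the alpha values examined in Chapter 3) are illustrative guesses.

# Sketch 1: exercise the matching helpers (assumes msd_utils.py imports cleanly).
import msd_utils

# A smoothed chroma vector dominated by C, E and G, i.e. a C-major triad.
pitch_vector = [0.9, 0.1, 0.1, 0.1, 0.8, 0.1, 0.1, 0.7, 0.1, 0.1, 0.1, 0.1]
# Expected result is (1, 0): family 1 = major template, root index 0 = C.
print(msd_utils.find_most_likely_chord(pitch_vector))
# A vector equal to a cluster centre should be matched back to that centre (13).
print(msd_utils.find_most_likely_timbre_category(msd_utils.TIMBRE_CLUSTERS[13]))

# Sketch 2: regenerate centres like TIMBRE_CLUSTERS from the frames written by
# the Section A.3 script (the class and parameter choices are assumptions).
import ast
import numpy as np
from sklearn.mixture import BayesianGaussianMixture

with open('timbre_frames_all.txt') as f:
    frames = np.array(ast.literal_eval(f.read()))  # shape (n_frames, 12)

dpgmm = BayesianGaussianMixture(
    n_components=60,                                 # truncation level
    weight_concentration_prior_type='dirichlet_process',
    weight_concentration_prior=0.1,                  # DP concentration (alpha)
    max_iter=500)
dpgmm.fit(frames)
print(dpgmm.means_)  # each row is a 12-D centre analogous to TIMBRE_CLUSTERS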

Bibliography

[1] Marck Bailey. Mark J. Butler publishes scholarly work on dance music. http://www.music.northwestern.edu/about/news/2012/mark-j-butler-publishes-scholarly-work-on-dance-music.html, Mar. 2012.

[2] Kenneth Taylor. Ishkur's guide to EDM. http://techno.org/electronic-music-guide/.

[3] Deal further strengthens Spotify's music discovery expertise. http://the.echonest.com/pressreleases/spotify-acquires-echo-nest/, Mar. 2014.

[4] Josh Constine. Inside the Spotify - Echo Nest skunkworks. http://techcrunch.com/2014/10/19/the-sonic-mad-scientists/, Oct. 2014.

[5] The future of music genres is here. http://blog.echonest.com/post/73516217273/the-future-of-music-genres-is-here, Jan. 2014.

[6] About the Music Genome Project. http://www.pandora.com/about/mgp.

[7] Joan Serrà, Álvaro Corral, Marián Boguñá, Martín Haro, and Josep Ll. Arcos. Measuring the evolution of contemporary western popular music. Sci. Rep., 2, Jul. 2012.

[8] Matthias Mauch, Robert M. MacCallum, Mark Levy, and Armand M. Leroi. The evolution of popular music: USA 1960–2010. Royal Society Open Science, 2(5), 2015.

[9] Thierry Bertin-Mahieux, Daniel P.W. Ellis, Brian Whitman, and Paul Lamere. The Million Song Dataset. In Proceedings of the 12th International Conference on Music Information Retrieval (ISMIR 2011), 2011.

[10] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay. Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12:2825–2830, 2011.

[11] Lars Buitinck, Gilles Louppe, Mathieu Blondel, Fabian Pedregosa, Andreas Mueller, Olivier Grisel, Vlad Niculae, Peter Prettenhofer, Alexandre Gramfort, Jaques Grobler, Robert Layton, Jake VanderPlas, Arnaud Joly, Brian Holt, and Gaël Varoquaux. API design for machine learning software: experiences from the scikit-learn project. In ECML PKDD Workshop: Languages for Data Mining and Machine Learning, pages 108–122, 2013.

[12] Edwin Chen. Infinite mixture models with nonparametric Bayes and the Dirichlet process. http://blog.echen.me/2012/03/20/infinite-mixture-models-with-nonparametric-bayes-and-the-dirichlet-process/, Mar. 2012.

[13] Graham Massey. Roland TR-808: The drum machine that changed music. http://www.bbc.com/news/entertainment-arts-26682781, Mar. 2014.

[14] Jean-Michel Jarre and Agencja Artystyczna MTJ. Jean Michel Jarre. Disques Dreyfus, 1999.

[15] Francois Pachet, Jean-Julien Aucouturier, and Mark Sandler. The way it sounds: Timbre models for analysis and retrieval of music signals. IEEE Transactions on Multimedia, 7(6):1028–35, Dec. 2005.



Appendix A

Code

A1 Pulling Data from the Million Song Dataset

1 from __future__ import division2 import os3 import sys4 import time5 import glob6 import hdf5_getters not on adroit7

8 prevents output from showing ellipses when printed9 npset_printoptions(threshold=npnan)

10

11 rsquorsquorsquoThis code computes the frequency of chord changes in each electronic songand runs the dirichlet process on it rsquorsquorsquo

12

13 basedir = rsquoscratchnetworkmssilvermssilvermsd_data_fulldatarsquo +str(sysargv[1])

14 ext = rsquoh5rsquo15

16 target_genres = [rsquohousersquorsquotechnorsquorsquodrum and bassrsquorsquodrum n bassrsquorsquodrumrsquonrsquobassrsquo

17 rsquodrumnbassrsquorsquodrum rsquonrsquo bassrsquorsquojunglersquorsquobreakbeatrsquorsquotrancersquorsquodubsteprsquorsquotraprsquorsquodowntemporsquo

18 rsquoindustrialrsquorsquosynthpoprsquorsquoidmrsquorsquoidm - intelligent dance musicrsquorsquo8-bitrsquorsquoambientrsquo

19 rsquodance and electronicarsquorsquoelectronicrsquo]20

21 relevant metadata for all EM songs found in the MSD22 all_song_data = 23 pitch_segs_data = []24 count = 025 start_time = timetime()26

27 for root dirs files in oswalk(basedir)28 files = globglob(ospathjoin(rootrsquorsquo+ext))29 for f in files

57

30 h5 = hdf5_gettersopen_h5_file_read(f)31 if year unknown throw out sample32 if hdf5_gettersget_year(h5) == 033 h5close()34 continue35 if any(tag in str(hdf5_gettersget_artist_mbtags(h5)) for tag in

target_genres)36 print rsquofound electronic music song at 0 secondsrsquoformat(time

time()-start_time)37 count += 138 print (rsquosong count 0rsquoformat(count+1))39 h5_subdict = dict()40 h5_subdict[rsquotitlersquo] = hdf5_gettersget_title(h5)item()41 h5_subdict[rsquoartist_namersquo] = hdf5_gettersget_artist_name(h5)

item()42 h5_subdict[rsquoyearrsquo] = hdf5_gettersget_year(h5)item()43 h5_subdict[rsquodurationrsquo] = hdf5_gettersget_duration(h5)item()44 h5_subdict[rsquotimbrersquo] = hdf5_gettersget_segments_timbre(h5)

tolist()45 h5_subdict[rsquopitchesrsquo] = hdf5_gettersget_segments_pitches(h5)

tolist()46 track_id = hdf5_gettersget_track_id(h5)item()47 all_song_data[track_id] = h5_subdict48 print(rsquoSong 0 finished processing Total time elapsed 1

secondsrsquoformat(countstr(timetime() - start_time)))49 h5close()50

51 all_song_data_sorted = dict(sorted(all_song_dataitems() key=lambda k k[1][rsquoyearrsquo]))

52 sortedpitchdata = rsquoscratchnetworkmssilvermssilvermsd_dataraw_rsquo +resub(rsquorsquorsquorsquosysargv[1]) + rsquotxtrsquo

53 with open(sortedpitchdata rsquowrsquo) as text_file54 text_filewrite(str(all_song_data_sorted))

A2 Calculating Most Likely Chords and Timbre

Categories

1 from __future__ import division2 import os3 import sys4 import re5 import time6 import json7 import glob8 import hdf5_getters not on adroit9 import sklearnmixture

10 import msd_utils not on adroit11 import math12 import numpy as np13 import collections14 import ast15

16 prevents output from showing ellipses when printed17 npset_printoptions(threshold=npnan)

58

18

19 column-wise mean of list of lists20 def mean(a)21 return sum(a) len(a)22

23 rsquorsquorsquoThis code computes the frequency of chord changes in each electronic songand runs the dirichlet process on it rsquorsquorsquo

24

25 basedir = rsquoscratchnetworkmssilvermssilverrsquo26 input_file = basedir + rsquomsd_dataraw_rsquo + str(sysargv[1]) + rsquotxtrsquo27 output_file = basedir + rsquomsd_datapreprocessed_rsquo + str(sysargv[1]) + rsquotxtrsquo28

29 json_contents = open(input_filersquorrsquo)read()30

31 all_song_data = []32 time_start = timetime()33 count = 034 for json_object_str in refinditer(rsquorsquotitlersquojson_contents)35 json_object_str = str(json_object_strgroup(0))36 json_object = astliteral_eval(json_object_str)37 json_object_new = 38

39 json_object_new[rsquotitlersquo] = json_object[rsquotitlersquo]40 json_object_new[rsquoartist_namersquo] = json_object[rsquoartist_namersquo]41 json_object_new[rsquoyearrsquo] = json_object[rsquoyearrsquo]42 json_object_new[rsquodurationrsquo] = json_object[rsquodurationrsquo]43

44 segments_pitches_old = json_object[rsquopitchesrsquo]45 segments_timbre_old = json_object[rsquotimbrersquo]46 segments_pitches_old_smoothed = []47 segments_timbre_old_smoothed = []48 chord_changes = [0 for i in range(0192)]49 smoothing_factor = 550 for i in range(0int(mathfloor(len(segments_pitches_old))

smoothing_factor))51 segments = segments_pitches_old[(smoothing_factori)(

smoothing_factori+smoothing_factor)]52 calculate mean frequency of each note over a block of 5 time

segments53 segments_mean = map(mean zip(segments))54 segments_pitches_old_smoothedappend(segments_mean)55 most_likely_chords = [msd_utilsfind_most_likely_chord(seg) for seg in

segments_pitches_old_smoothed]56 print rsquofound most likely chords at 0 secondsrsquoformat(timetime()-

time_start)57 calculate chord changes58 for i in range(0len(most_likely_chords)-1)59 c1 = most_likely_chords[i]60 c2 = most_likely_chords[i+1]61 if (c1[1] == c2[1])62 note_shift = 063 elif (c1[1] lt c2[1])64 note_shift = c2[1] - c1[1]65 else66 note_shift = 12 - c1[1] + c2[1]67 key_shift = 4(c1[0]-1) + c2[0]68 convert note_shift (0 through 11) and key_shift (1 to 16)69 to one of 196 categories for a chord shift70 chord_shift = 12(key_shift - 1) + note_shift71 chord_changes[chord_shift] += 1

59

72 json_object_new[rsquochord_changesrsquo] = [cjson_object[rsquodurationrsquo] for c inchord_changes]

73 print rsquocalculated chord changes at 0 secondsrsquoformat(timetime()-time_start)

74

75 for i in range(0int(mathfloor(len(segments_timbre_old))smoothing_factor))

76 segments = segments_timbre_old[(smoothing_factori)(smoothing_factori+smoothing_factor)]

77 calculate mean frequency of each note over a block of 5 timesegments

78 segments_mean = map(mean zip(segments))79 segments_timbre_old_smoothedappend(segments_mean)80 print rsquofound most likely timbre categories at 0 secondsrsquoformat(time

time()-time_start)81 timbre_cats = [msd_utilsfind_most_likely_timbre_category(seg) for seg

in segments_timbre_old_smoothed]82 timbre_cat_counts = [timbre_catscount(i) for i in xrange(030)]83 json_object_new[rsquotimbre_cat_countsrsquo] = [tjson_object[rsquodurationrsquo] for t

in timbre_cat_counts]84 all_song_dataappend(json_object_new)85 count += 186

87 print rsquopreprocessing finished writing results to file at time 0rsquoformat(timetime()-time_start)

88 with open(output_filersquowrsquo) as f89 fwrite(str(all_song_data))90

91 print rsquofile merging complete at time 0rsquoformat(timetime()-time_start)

A3 Code to Compute Timbre Categories

1 from __future__ import division2 import os3 import sys4 import re5 import time6 import json7 import glob8 import hdf5_getters not on adroit9 import sklearnmixture

10 import msd_utils not on adroit11 import math12 import numpy as np13 import collections14 from string import ascii_uppercase15 import ast16 import matplotlibpyplot as plt17 import operator18 from collections import defaultdict19 import random20

21 timbre_all = []22 N = 20 number of samples to get from each year23 year_counts = dict(1956 2 1965 4 1968 3 1969 5 1970 23 1971 25

1972 26 1973 37 1974 35 1975 29 1976 28 1977 64 1978 771979 111 1980 131 1981 171 1982 199 1983 272 1984 190 1985

60

189 1986 200 1987 224 1988 205 1989 272 1990 358 1991 3481992 538 1993 610 1994 658 1995 764 1996 809 1997 930 1998872 1999 983 2000 1031 2001 1230 2002 1323 2003 1563 20041508 2005 1995 2006 1892 2007 2175 2008 1950 2009 1782 2010742)

24

25 time_start = timetime()26 year_count = defaultdict(int)27 orig_dir = rsquoscratchnetworkmssilvermssilverrsquo28 orig_dir = rsquorsquo29 json_pattern = recompile(rsquorsquotitlersquoreDOTALL)30 N = 20 number of songs to sample from each year31 k = 20 number of frames to select from each song32 for l1 in ascii_uppercase33 for l2 in ascii_uppercase34 edm_textfile = orig_dir + rsquomsd_dataraw_rsquo + l1 + l2 + rsquotxtrsquo35 json_contents = open(edm_textfilersquorrsquo)read()36 for json_object_str in refindall(json_patternjson_contents)37 json_object = astliteral_eval(json_object_str)38 year = int(json_object[rsquoyearrsquo])39 prob = 10 if 10Nyear_counts[year] gt 10 else 10N

year_counts[year]40 if randomrandom() lt prob41 print rsquogetting timbre frames for song in directory 0 1

seconds after start of programrsquoformat(edm_textfiletimetime()-time_start)

42 duration = float(json_object[rsquodurationrsquo])43 timbre = [[tduration for t in l] for l in json_object[rsquo

timbrersquo]]44 try45 indices = randomsample(xrange(0len(timbre))k)46 except47 indices = xrange(0len(timbre))48 timbre_frames = [timbre[i] for i in indices]49 appended_timbre = [timbre_allappend(l) for l in

timbre_frames]50 print rsquofinished file 0 1 seconds after start of programrsquoformat(

edm_textfiletimetime()-time_start)51

52 with(open(rsquotimbre_frames_alltxtrsquorsquowrsquo)) as f53 fwrite(str(timbre_all))

A4 Helper Methods for Calculations

1 import os2 import re3 import json4 import glob5 import hdf5_getters6 import time7 import numpy as np8

9 rsquorsquorsquo some static data used in conjunction with the helper methods rsquorsquorsquo10

11 each 12-element vector corresponds to the 12 pitches starting with Cnatural and going up to B natural

12

61

13 CHORD_TEMPLATE_MAJOR = [[100010010000][010001001000]14 [001000100100][000100010010]15 [000010001001][100001000100]16 [010000100010][001000010001]17 [100100001000][010010000100]18 [001001000010][000100100001]]19 CHORD_TEMPLATE_MINOR =[[100100010000][010010001000]20 [001001000100][000100100010]21 [000010010001][100001001000]22 [010000100100][001000010010]23 [000100001001][100010000100]24 [010001000010][001000100001]]25 CHORD_TEMPLATE_DOM7 = [[100010010010][010001001001]26 [101000100100][010100010010]27 [001010001001][100101000100]28 [010010100010][001001010001]29 [100100101000][010010010100]30 [001001001010][000100100101]]31 CHORD_TEMPLATE_MIN7 = [[100100010010][010010001001]32 [101001000100][010100100010]33 [001010010001][100101001000]34 [010010100100][001001010010]35 [000100101001][100010010100]36 [010001001010][001000100101]]37

38 CHORD_TEMPLATE_MAJOR_means = [npmean(chord) for chord inCHORD_TEMPLATE_MAJOR]

39 CHORD_TEMPLATE_MINOR_means = [npmean(chord) for chord inCHORD_TEMPLATE_MINOR]

40 CHORD_TEMPLATE_DOM7_means = [npmean(chord) for chord in CHORD_TEMPLATE_DOM7]

41 CHORD_TEMPLATE_MIN7_means = [npmean(chord) for chord in CHORD_TEMPLATE_MIN7]

42

43 CHORD_TEMPLATE_MAJOR_stdevs = [npstd(chord) for chord inCHORD_TEMPLATE_MAJOR]

44 CHORD_TEMPLATE_MINOR_stdevs = [npstd(chord) for chord inCHORD_TEMPLATE_MINOR]

45 CHORD_TEMPLATE_DOM7_stdevs = [npstd(chord) for chord in CHORD_TEMPLATE_DOM7]

46 CHORD_TEMPLATE_MIN7_stdevs = [npstd(chord) for chord in CHORD_TEMPLATE_MIN7]

47

48 TIMBRE_CLUSTERS = [[ 138679881e-01 395702571e-02 265410235e-0249 738301998e-03 -175014636e-02 -551147732e-0250 871851698e-03 -117595855e-02 107227900e-0251 875951680e-03 540391877e-03 617638908e-03]52 [ 314344510e+00 117405599e-01 408053561e+0053 -177934450e+00 293367968e+00 -135597928e+0054 -155129489e+00 775743158e-01 642796685e-0155 140794256e-01 337716831e-01 -327103815e-01]56 [ 356548165e-01 273288705e+00 194355982e+0057 106892477e+00 989739475e-01 -897330631e-0258 873234495e-01 -200747009e-03 344488367e-0159 993117800e-02 -243471766e-01 -190521726e-01]60 [ 422442037e-01 414115783e-01 143926557e-0161 -116143322e-01 -595186216e-02 -236927188e-0162 -683151409e-02 986816882e-02 243219098e-0263 693558977e-02 680121418e-03 397485360e-02]64 [ 194727799e-01 -139027782e+00 -239875671e-01

62

65 -284583677e-01 192334219e-01 -283421048e-0166 215787541e-01 114840341e-01 -215631833e-0167 -409496877e-02 -690838017e-03 -724394810e-03]68 [ 196565167e-01 498702717e-02 -343697282e-0169 254170701e-01 112441266e-02 154740401e-0170 -470447408e-02 810868802e-02 303736697e-0371 143974944e-03 -275044913e-02 148634678e-02]72 [ 221364497e-01 -296205105e-01 157754028e-0173 -557641279e-02 -925625566e-02 -615316168e-0274 -138139882e-01 -554936599e-02 166886836e-0175 646238260e-02 124093863e-02 -209274345e-02]76 [ 212823455e-01 -932652720e-02 -439611467e-0177 -202814479e-01 498638770e-02 -126572488e-0178 -111181799e-01 325075635e-02 201416694e-0279 -569216463e-02 261922912e-02 830817468e-02]80 [ 162304042e-01 -734813956e-03 -202552550e-0181 180106705e-01 -572110826e-02 -917148244e-0282 -620429191e-03 -608892354e-02 102883628e-0283 384878478e-02 -872920419e-03 237291230e-02]84 [ 169023095e-01 681311168e-02 -371039856e-0285 -213139780e-02 -418752028e-03 136407740e-0186 258515825e-02 -410328777e-04 293149920e-0287 -197874734e-02 201177066e-02 429260690e-03]88 [ 416829358e-01 -128384095e+00 886081556e-0189 913717416e-02 -319420208e-01 -182003637e-0190 -319865507e-02 -171517045e-02 347472066e-0291 -353047665e-02 558354602e-02 -506222122e-02]92 [ 383948137e-01 106020034e-01 401191058e-0193 149470482e-01 -958422411e-02 -494473336e-0294 227589858e-02 -567352733e-02 384666644e-0295 -215828055e-02 -167817151e-02 115426241e-01]96 [ 907946444e-01 326120397e+00 298472002e+0097 -142615404e-01 129886103e+00 -453380431e-0198 154008478e-01 -355297093e-02 -295809181e-0199 157037690e-01 -729692046e-02 115180285e-01]

100 [ 160870896e+00 -232038235e+00 -796211044e-01101 155058968e+00 -219377663e+00 501030526e-01102 -171767279e+00 -136642470e+00 -242837527e-01103 -414275615e-01 -733148530e-01 -456676578e-01]104 [ 642870687e-01 134486839e+00 216026845e-01105 -213180345e-01 310866747e-01 -397754955e-01106 -354439151e-01 -595938041e-04 495054274e-03107 467013422e-02 -180823854e-02 125808320e-01]108 [ 116780496e+00 228141229e+00 -329418720e+00109 -154239912e+00 212372153e-01 251116768e+00110 184273560e+00 -406183916e-01 119175125e+00111 -924407446e-01 685444429e-01 -638729005e-01]112 [ 239097414e-01 -113382447e-02 306327342e-01113 468182987e-03 -103107607e-01 -317661969e-02114 346533705e-02 146440386e-02 688291154e-02115 172580481e-02 -623970238e-03 -652822380e-03]116 [ 174850329e-01 -186077411e-01 269285838e-01117 522452803e-02 -371708289e-02 -642874319e-02118 -501920042e-03 -114565540e-02 -261300268e-03119 -694872458e-03 120157063e-02 201341977e-02]120 [ 193220674e-01 162738332e-01 172794061e-02121 789933755e-02 158494767e-01 904541006e-04122 -333177052e-02 -142411500e-01 -190471155e-02123 -241622739e-02 -257382438e-02 284895062e-02]124 [ 331179197e+00 -156765268e-01 442446188e+00

63

125 205496297e+00 507031622e+00 -352663849e-02126 -568337901e+00 -117825301e+00 541756637e-01127 -315541339e-02 -158404846e+00 737887234e-01]128 [ 236033237e-01 -501380019e-01 -701568834e-02129 -214474169e-01 558739133e-01 -345340886e-01130 236469930e-01 -251770230e-02 -441670143e-01131 -173364633e-01 992353986e-03 101775476e-01]132 [ 313672832e+00 155128891e+00 460139512e+00133 982477544e-01 -387108002e-01 -134239667e+00134 -300065797e+00 -441556909e-01 -777546208e-01135 -659017029e-01 -142596356e-01 -978935498e-01]136 [ 850714148e-01 228658856e-01 -365260753e+00137 270626948e+00 -190441544e-01 566625676e+00138 177531510e+00 239978921e+00 110965660e+00139 158484130e+00 -151579214e-02 864324026e-01]140 [ 114302559e+00 118602811e+00 -388130412e+00141 869833825e-01 -823003310e-01 -423867795e-01142 856022598e-01 -108015106e+00 174840192e-01143 -135493558e-02 -117012561e+00 168572940e-01]144 [ 354117814e+00 612714769e-01 767585243e+00145 250391333e+00 181374399e+00 -146363231e+00146 -174027236e+00 -572924078e-01 -120787368e+00147 -413954661e-01 -462561948e-01 678297871e-01]148 [ 831843044e-01 441635485e-01 700724425e-02149 -472159900e-02 308326493e-01 -447009822e-01150 327806057e-01 652370380e-01 328490360e-01151 128628172e-01 -778065861e-02 691343399e-02]152 [ 490082031e-01 -953180204e-01 176970476e-01153 157256960e-01 -526196238e-02 -319264458e-01154 391808304e-01 219368239e-01 -206483291e-01155 -625044005e-02 -105547224e-01 318934196e-01]156 [ 149899454e+00 -430708817e-01 243770498e+00157 703149621e-01 -228827845e+00 270195855e+00158 -471484280e+00 -118700075e+00 -177431396e+00159 -223190236e+00 820855264e-01 -235859902e-01]160 [ 120322544e-01 -366300816e-01 -125699953e-01161 -121914056e-01 693277338e-02 -131034684e-01162 -154955924e-03 248094288e-02 -309576314e-02163 -166369415e-03 148904987e-04 -142151992e-02]164 [ 652394765e-01 -681024464e-01 636868117e-01165 304950208e-01 262178992e-01 -320457080e-01166 -198576098e-01 -302173163e-01 204399765e-01167 444513847e-02 -950111498e-02 -114198739e-02]168 [ 206762180e-01 -208101829e-01 261977630e-01169 -171672300e-01 561794250e-02 213660185e-01170 390259585e-02 478176392e-02 172812607e-02171 344052067e-02 626899067e-03 248544728e-02]172 [ 739717363e-01 437786285e+00 254995502e+00173 113151212e+00 -358509503e-01 220806129e-01174 -220500355e-01 -722409824e-02 -270534083e-01175 107942098e-03 270174668e-01 187279353e-01]176 [ 125593809e+00 671054880e-02 870352571e-01177 -432607959e+00 230652217e+00 547476105e+00178 -611052479e-01 107955720e+00 -216225471e+00179 -795770149e-01 -731804973e-01 968935954e-01]180 [ 117233757e-01 -123897829e-01 -488625265e-01181 142036530e-01 -723286756e-02 -699808763e-02182 -117525019e-02 570221674e-02 -767796123e-03183 417505873e-02 -233375716e-02 194121001e-02]184 [ 167511025e+00 -275436700e+00 145345593e+00

64

185 132408871e+00 -166172505e+00 100560074e+00186 -882308160e-01 -595708043e-01 -727283590e-01187 -103975499e+00 -186653334e-02 139449745e+00]188 [ 320587677e+00 -284451104e+00 854849957e+00189 -444001235e-01 104202144e+00 735333682e-01190 -248763292e+00 738931361e-01 -174185596e+00191 -107581842e+00 205759299e-01 -820483513e-01]192 [ 331279737e+00 -508655734e-01 661530870e+00193 116518280e+00 474499155e+00 -231536191e+00194 -134016130e+00 -715381712e-01 278890594e+00195 204189275e+00 -380003033e-01 116034914e+00]196 [ 179522019e+00 -813534697e-02 437167420e-01197 226517020e+00 885377295e-01 107481514e+00198 -725322296e-01 -219309506e+00 -759468916e-01199 -137191387e+00 260097913e-01 934596450e-01]200 [ 350400906e-01 817891485e-01 -863487084e-01201 -731760701e-01 970320805e-02 -360023996e-01202 -291753495e-01 -803073817e-02 665930095e-02203 160093340e-01 -129158086e-01 -518806100e-02]204 [ 225922929e-01 278461593e-01 539661393e-02205 -237662670e-02 -270343295e-02 -123485570e-01206 231027499e-03 587465112e-05 186127188e-02207 283074747e-02 -187198676e-04 124761782e-02]208 [ 453615634e-01 318976020e+00 -835029351e-01209 784124578e+00 -443906795e-01 -178945492e+00210 -114521031e+00 100044304e+00 -404084981e-01211 -486030348e-01 105412721e-01 563666445e-02]212 [ 393714086e-01 -307226477e-01 -487366619e-01213 -457481697e-01 -291133171e-04 -239881719e-01214 -215591352e-01 -121332941e-01 142245002e-01215 502984582e-02 -805878851e-03 195534173e-01]216 [ 186913010e-01 -161000977e-01 595612425e-01217 187804293e-01 222064227e-01 -109008289e-01218 783845058e-02 515228647e-02 -818113578e-02219 -237860551e-02 341013800e-03 364680417e-02]220 [ 332919314e+00 -214341251e+00 720913997e+00221 176143734e+00 164091808e+00 -266887649e+00222 -926748006e-01 -278599285e-01 -739434005e-01223 -387363085e-01 800557250e-01 115628886e+00]224 [ 476496444e-01 -119334793e-01 309037235e-01225 -345545294e-01 130114716e-01 506895559e-01226 212176840e-01 -414296750e-03 452439064e-02227 -162163990e-02 693683152e-02 -577607592e-03]228 [ 300019324e-01 543432074e-02 -772732930e-01229 147263806e+00 -279012581e-02 -247864869e-01230 -210011388e-01 278202425e-01 616957205e-02231 -166924986e-01 -180102286e-01 -378872162e-03]]232

233 TIMBRE_MEANS = [npmean(t) for t in TIMBRE_CLUSTERS]234 TIMBRE_STDEVS = [npstd(t) for t in TIMBRE_CLUSTERS]235

236 rsquorsquorsquohelper methods to process raw msd datarsquorsquorsquo237

238 def normalize_pitches(h5)239 key = int(hdf5_gettersget_key(h5))240 segments_pitches = hdf5_gettersget_segments_pitches(h5)241 segments_pitches_new = [transpose_by_key(pitch_segkey) for pitch_seg in

segments_pitches]242 return segments_pitches_new243

65

244 def transpose_by_key(pitch_segkey)245 pitch_seg_new = []246 for i in range(012)247 idx = (i + key) 12248 pitch_seg_newappend(pitch_seg[idx])249 return pitch_seg_new250

251 rsquorsquorsquo given a time segment with distributions of the 12 pitches find the mostlikely chord playedrsquorsquorsquo

252 def find_most_likely_chord(pitch_vector)253 rho_max = 00254 index each chord255 most_likely_chord = (11)256 for idx (chordmeanstdev) in enumerate(zip(CHORD_TEMPLATE_MAJOR

CHORD_TEMPLATE_MAJOR_meansCHORD_TEMPLATE_MAJOR_stdevs))257 rho = 00258 for i in range(012)259 rho += (chord[i] - mean)(pitch_vector[i] - npmean(pitch_vector))((

stdev+001)(npstd(pitch_vector)+001))260 if (abs(rho) gt abs(rho_max))261 rho_max = rho262 most_likely_chord = (1idx)263 for idx (chordmeanstdev) in enumerate(zip(CHORD_TEMPLATE_MINOR

CHORD_TEMPLATE_MINOR_meansCHORD_TEMPLATE_MINOR_stdevs))264 rho = 00265 for i in range(012)266 rho += (chord[i] - mean)(pitch_vector[i] - npmean(pitch_vector))((

stdev+001)(npstd(pitch_vector)+001))267 if (abs(rho) gt abs(rho_max))268 rho_max = rho269 most_likely_chord = (2idx)270 for idx (chordmeanstdev) in enumerate(zip(CHORD_TEMPLATE_DOM7

CHORD_TEMPLATE_DOM7_meansCHORD_TEMPLATE_DOM7_stdevs))271 rho = 00272 for i in range(012)273 rho += (chord[i] - mean)(pitch_vector[i] - npmean(pitch_vector))((

stdev+001)(npstd(pitch_vector)+001))274 if (abs(rho) gt abs(rho_max))275 rho_max = rho276 most_likely_chord = (3idx)277 for idx (chordmeanstdev) in enumerate(zip(CHORD_TEMPLATE_MIN7

CHORD_TEMPLATE_MIN7_meansCHORD_TEMPLATE_MIN7_stdevs))278 rho = 00279 for i in range(012)280 rho += (chord[i] - mean)(pitch_vector[i] - npmean(pitch_vector))((

stdev+001)(npstd(pitch_vector)+001))281 if (abs(rho) gt abs(rho_max))282 rho_max = rho283 most_likely_chord = (4idx)284 return most_likely_chord285

286 def find_most_likely_timbre_category(timbre_vector)287 most_likely_timbre_cat = 0288 rho_max = 00289 for idx (segmeanstdev) in enumerate(zip(TIMBRE_CLUSTERSTIMBRE_MEANS

TIMBRE_STDEVS))290 rho = 00291 for i in range(012)292 rho += (seg[i] - mean)(timbre_vector[i] - npmean(seg))((stdev+001)

(npstd(timbre_vector)+001))

66

293 if (abs(rho) gt abs(rho_max))294 rho_max = rho295 most_likely_timbre_cat = idx296 return most_likely_timbre_cat

67

Bibliography

[1] Marck Bailey Mark j butler publishes scholarly work on dance mu-sic httpwwwmusicnorthwesterneduaboutnews2012mark-j-butler-publishes-scholarly-work-on-dance-musichtml mar 2012

[2] Kenneth Taylor Ishkurrsquos guide to edm httptechnoorgelectronic-music-guide

[3] Deal further strengthens spotifyrsquos music discovery expertise httptheechonestcompressreleasesspotify-acquires-echo-nest mar 2014

[4] Josh Constine Inside the spotify - echo nest skunkworks httptechcrunchcom20141019the-sonic-mad-scientists oct 2014

[5] The future of music genres is here httpblogechonestcompost73516217273the-future-of-music-genres-is-here jan 2014

[6] About the music genome project httpwwwpandoracomaboutmgp

[7] Joan Serragrave Aacutelvaro Corral Mariaacuten Boguntildeaacute Martiacuten Haro and Josep Ll ArcosMeasuring the evolution of contemporary western popular music Sci Rep 2jul 2012

[8] Matthias Mauch Robert M MacCallum Mark Levy and Armand M Leroi Theevolution of popular music Usa 1960ndash2010 Royal Society Open Science 2(5)2015

[9] Thierry Bertin-Mahieux Daniel PW Ellis Brian Whitman and Paul LamereThe million song dataset In Proceedings of the 12th International Conferenceon Music Information Retrieval (ISMIR 2011) 2011

[10] F Pedregosa G Varoquaux A Gramfort V Michel B Thirion O GriselM Blondel P Prettenhofer R Weiss V Dubourg J Vanderplas A PassosD Cournapeau M Brucher M Perrot and E Duchesnay Scikit-learn Machinelearning in Python Journal of Machine Learning Research 122825ndash2830 2011

[11] Lars Buitinck Gilles Louppe Mathieu Blondel Fabian Pedregosa AndreasMueller Olivier Grisel Vlad Niculae Peter Prettenhofer Alexandre GramfortJaques Grobler Robert Layton Jake VanderPlas Arnaud Joly Brian Holt andGaeumll Varoquaux API design for machine learning software experiences from

68

the scikit-learn project In ECML PKDD Workshop Languages for Data Miningand Machine Learning pages 108ndash122 2013

[12] Edwin Chen Infinite mixture models with nonparametric bayesand the dirichlet process httpblogechenme20120320infinite-mixture-models-with-nonparametric-bayes-and-the-dirichlet-processmar 2012

[13] Graham Massey Roland tr-808 The drum machine that changed music httpwwwbbccomnewsentertainment-arts-26682781 mar 2014

[14] Jean-Michel Jarre and Agencja Artystyczna MTJ Jean Michel Jarre DisquesDreyfus 1999

[15] Francois Pachet Jean-Julien Aucouturier and Mark Sandler The way it soundsTimbre models for analysis and retrieval of music signals IEEE TRANSAC-TIONS ON MULTIMEDIA 7(6)1028ndash35 dec 2005

69

  • Abstract
  • Acknowledgements
  • Contents
  • List of Tables
  • List of Figures
  • 1 Introduction
    • 11 Background Information
    • 12 Literature Review
    • 13 The Dataset
      • 2 Mathematical Modeling
        • 21 Determining Novelty of Songs
        • 22 Feature Selection
        • 23 Collecting Data and Preprocessing Selected Features
          • 231 Collecting the Data
          • 232 Pitch Preprocessing
          • 233 Timbre Preprocessing
              • 3 Results
                • 31 Methodology
                • 32 Findings
                  • 321 =005
                  • 322 =01
                  • 323 =02
                    • 33 Analysis
                      • 4 Conclusion
                        • 41 Design Flaws in Experiment
                        • 42 Future Work
                        • 43 Closing Remarks
                          • A Code
                            • A1 Pulling Data from the Million Song Dataset
                            • A2 Calculating Most Likely Chords and Timbre Categories
                            • A3 Code to Compute Timbre Categories
                            • A4 Helper Methods for Calculations
                              • Bibliography
Page 66: Silver,Matthew final thesis

Appendix A

Code

A1 Pulling Data from the Million Song Dataset

1 from __future__ import division2 import os3 import sys4 import time5 import glob6 import hdf5_getters not on adroit7

8 prevents output from showing ellipses when printed9 npset_printoptions(threshold=npnan)

10

11 rsquorsquorsquoThis code computes the frequency of chord changes in each electronic songand runs the dirichlet process on it rsquorsquorsquo

12

13 basedir = rsquoscratchnetworkmssilvermssilvermsd_data_fulldatarsquo +str(sysargv[1])

14 ext = rsquoh5rsquo15

16 target_genres = [rsquohousersquorsquotechnorsquorsquodrum and bassrsquorsquodrum n bassrsquorsquodrumrsquonrsquobassrsquo

17 rsquodrumnbassrsquorsquodrum rsquonrsquo bassrsquorsquojunglersquorsquobreakbeatrsquorsquotrancersquorsquodubsteprsquorsquotraprsquorsquodowntemporsquo

18 rsquoindustrialrsquorsquosynthpoprsquorsquoidmrsquorsquoidm - intelligent dance musicrsquorsquo8-bitrsquorsquoambientrsquo

19 rsquodance and electronicarsquorsquoelectronicrsquo]20

21 relevant metadata for all EM songs found in the MSD22 all_song_data = 23 pitch_segs_data = []24 count = 025 start_time = timetime()26

27 for root dirs files in oswalk(basedir)28 files = globglob(ospathjoin(rootrsquorsquo+ext))29 for f in files

57

30 h5 = hdf5_gettersopen_h5_file_read(f)31 if year unknown throw out sample32 if hdf5_gettersget_year(h5) == 033 h5close()34 continue35 if any(tag in str(hdf5_gettersget_artist_mbtags(h5)) for tag in

target_genres)36 print rsquofound electronic music song at 0 secondsrsquoformat(time

time()-start_time)37 count += 138 print (rsquosong count 0rsquoformat(count+1))39 h5_subdict = dict()40 h5_subdict[rsquotitlersquo] = hdf5_gettersget_title(h5)item()41 h5_subdict[rsquoartist_namersquo] = hdf5_gettersget_artist_name(h5)

item()42 h5_subdict[rsquoyearrsquo] = hdf5_gettersget_year(h5)item()43 h5_subdict[rsquodurationrsquo] = hdf5_gettersget_duration(h5)item()44 h5_subdict[rsquotimbrersquo] = hdf5_gettersget_segments_timbre(h5)

tolist()45 h5_subdict[rsquopitchesrsquo] = hdf5_gettersget_segments_pitches(h5)

tolist()46 track_id = hdf5_gettersget_track_id(h5)item()47 all_song_data[track_id] = h5_subdict48 print(rsquoSong 0 finished processing Total time elapsed 1

secondsrsquoformat(countstr(timetime() - start_time)))49 h5close()50

51 all_song_data_sorted = dict(sorted(all_song_dataitems() key=lambda k k[1][rsquoyearrsquo]))

52 sortedpitchdata = rsquoscratchnetworkmssilvermssilvermsd_dataraw_rsquo +resub(rsquorsquorsquorsquosysargv[1]) + rsquotxtrsquo

53 with open(sortedpitchdata rsquowrsquo) as text_file54 text_filewrite(str(all_song_data_sorted))

A2 Calculating Most Likely Chords and Timbre

Categories

1 from __future__ import division2 import os3 import sys4 import re5 import time6 import json7 import glob8 import hdf5_getters not on adroit9 import sklearnmixture

10 import msd_utils not on adroit11 import math12 import numpy as np13 import collections14 import ast15

16 prevents output from showing ellipses when printed17 npset_printoptions(threshold=npnan)

58

18

19 column-wise mean of list of lists20 def mean(a)21 return sum(a) len(a)22

23 rsquorsquorsquoThis code computes the frequency of chord changes in each electronic songand runs the dirichlet process on it rsquorsquorsquo

24

25 basedir = rsquoscratchnetworkmssilvermssilverrsquo26 input_file = basedir + rsquomsd_dataraw_rsquo + str(sysargv[1]) + rsquotxtrsquo27 output_file = basedir + rsquomsd_datapreprocessed_rsquo + str(sysargv[1]) + rsquotxtrsquo28

29 json_contents = open(input_filersquorrsquo)read()30

31 all_song_data = []32 time_start = timetime()33 count = 034 for json_object_str in refinditer(rsquorsquotitlersquojson_contents)35 json_object_str = str(json_object_strgroup(0))36 json_object = astliteral_eval(json_object_str)37 json_object_new = 38

39 json_object_new[rsquotitlersquo] = json_object[rsquotitlersquo]40 json_object_new[rsquoartist_namersquo] = json_object[rsquoartist_namersquo]41 json_object_new[rsquoyearrsquo] = json_object[rsquoyearrsquo]42 json_object_new[rsquodurationrsquo] = json_object[rsquodurationrsquo]43

44 segments_pitches_old = json_object[rsquopitchesrsquo]45 segments_timbre_old = json_object[rsquotimbrersquo]46 segments_pitches_old_smoothed = []47 segments_timbre_old_smoothed = []48 chord_changes = [0 for i in range(0192)]49 smoothing_factor = 550 for i in range(0int(mathfloor(len(segments_pitches_old))

smoothing_factor))51 segments = segments_pitches_old[(smoothing_factori)(

smoothing_factori+smoothing_factor)]52 calculate mean frequency of each note over a block of 5 time

segments53 segments_mean = map(mean zip(segments))54 segments_pitches_old_smoothedappend(segments_mean)55 most_likely_chords = [msd_utilsfind_most_likely_chord(seg) for seg in

segments_pitches_old_smoothed]56 print rsquofound most likely chords at 0 secondsrsquoformat(timetime()-

time_start)57 calculate chord changes58 for i in range(0len(most_likely_chords)-1)59 c1 = most_likely_chords[i]60 c2 = most_likely_chords[i+1]61 if (c1[1] == c2[1])62 note_shift = 063 elif (c1[1] lt c2[1])64 note_shift = c2[1] - c1[1]65 else66 note_shift = 12 - c1[1] + c2[1]67 key_shift = 4(c1[0]-1) + c2[0]68 convert note_shift (0 through 11) and key_shift (1 to 16)69 to one of 196 categories for a chord shift70 chord_shift = 12(key_shift - 1) + note_shift71 chord_changes[chord_shift] += 1

59

72 json_object_new[rsquochord_changesrsquo] = [cjson_object[rsquodurationrsquo] for c inchord_changes]

73 print rsquocalculated chord changes at 0 secondsrsquoformat(timetime()-time_start)

74

75 for i in range(0int(mathfloor(len(segments_timbre_old))smoothing_factor))

76 segments = segments_timbre_old[(smoothing_factori)(smoothing_factori+smoothing_factor)]

77 calculate mean frequency of each note over a block of 5 timesegments

78 segments_mean = map(mean zip(segments))79 segments_timbre_old_smoothedappend(segments_mean)80 print rsquofound most likely timbre categories at 0 secondsrsquoformat(time

time()-time_start)81 timbre_cats = [msd_utilsfind_most_likely_timbre_category(seg) for seg

in segments_timbre_old_smoothed]82 timbre_cat_counts = [timbre_catscount(i) for i in xrange(030)]83 json_object_new[rsquotimbre_cat_countsrsquo] = [tjson_object[rsquodurationrsquo] for t

in timbre_cat_counts]84 all_song_dataappend(json_object_new)85 count += 186

87 print rsquopreprocessing finished writing results to file at time 0rsquoformat(timetime()-time_start)

88 with open(output_filersquowrsquo) as f89 fwrite(str(all_song_data))90

91 print rsquofile merging complete at time 0rsquoformat(timetime()-time_start)

A3 Code to Compute Timbre Categories

1 from __future__ import division2 import os3 import sys4 import re5 import time6 import json7 import glob8 import hdf5_getters not on adroit9 import sklearnmixture

10 import msd_utils not on adroit11 import math12 import numpy as np13 import collections14 from string import ascii_uppercase15 import ast16 import matplotlibpyplot as plt17 import operator18 from collections import defaultdict19 import random20

21 timbre_all = []22 N = 20 number of samples to get from each year23 year_counts = dict(1956 2 1965 4 1968 3 1969 5 1970 23 1971 25

1972 26 1973 37 1974 35 1975 29 1976 28 1977 64 1978 771979 111 1980 131 1981 171 1982 199 1983 272 1984 190 1985

60

189 1986 200 1987 224 1988 205 1989 272 1990 358 1991 3481992 538 1993 610 1994 658 1995 764 1996 809 1997 930 1998872 1999 983 2000 1031 2001 1230 2002 1323 2003 1563 20041508 2005 1995 2006 1892 2007 2175 2008 1950 2009 1782 2010742)

24

25 time_start = timetime()26 year_count = defaultdict(int)27 orig_dir = rsquoscratchnetworkmssilvermssilverrsquo28 orig_dir = rsquorsquo29 json_pattern = recompile(rsquorsquotitlersquoreDOTALL)30 N = 20 number of songs to sample from each year31 k = 20 number of frames to select from each song32 for l1 in ascii_uppercase33 for l2 in ascii_uppercase34 edm_textfile = orig_dir + rsquomsd_dataraw_rsquo + l1 + l2 + rsquotxtrsquo35 json_contents = open(edm_textfilersquorrsquo)read()36 for json_object_str in refindall(json_patternjson_contents)37 json_object = astliteral_eval(json_object_str)38 year = int(json_object[rsquoyearrsquo])39 prob = 10 if 10Nyear_counts[year] gt 10 else 10N

year_counts[year]40 if randomrandom() lt prob41 print rsquogetting timbre frames for song in directory 0 1

seconds after start of programrsquoformat(edm_textfiletimetime()-time_start)

42 duration = float(json_object[rsquodurationrsquo])43 timbre = [[tduration for t in l] for l in json_object[rsquo

timbrersquo]]44 try45 indices = randomsample(xrange(0len(timbre))k)46 except47 indices = xrange(0len(timbre))48 timbre_frames = [timbre[i] for i in indices]49 appended_timbre = [timbre_allappend(l) for l in

timbre_frames]50 print rsquofinished file 0 1 seconds after start of programrsquoformat(

edm_textfiletimetime()-time_start)51

52 with(open(rsquotimbre_frames_alltxtrsquorsquowrsquo)) as f53 fwrite(str(timbre_all))

A4 Helper Methods for Calculations

1 import os2 import re3 import json4 import glob5 import hdf5_getters6 import time7 import numpy as np8

9 rsquorsquorsquo some static data used in conjunction with the helper methods rsquorsquorsquo10

11 each 12-element vector corresponds to the 12 pitches starting with Cnatural and going up to B natural

12

61

13 CHORD_TEMPLATE_MAJOR = [[100010010000][010001001000]14 [001000100100][000100010010]15 [000010001001][100001000100]16 [010000100010][001000010001]17 [100100001000][010010000100]18 [001001000010][000100100001]]19 CHORD_TEMPLATE_MINOR =[[100100010000][010010001000]20 [001001000100][000100100010]21 [000010010001][100001001000]22 [010000100100][001000010010]23 [000100001001][100010000100]24 [010001000010][001000100001]]25 CHORD_TEMPLATE_DOM7 = [[100010010010][010001001001]26 [101000100100][010100010010]27 [001010001001][100101000100]28 [010010100010][001001010001]29 [100100101000][010010010100]30 [001001001010][000100100101]]31 CHORD_TEMPLATE_MIN7 = [[100100010010][010010001001]32 [101001000100][010100100010]33 [001010010001][100101001000]34 [010010100100][001001010010]35 [000100101001][100010010100]36 [010001001010][001000100101]]37

38 CHORD_TEMPLATE_MAJOR_means = [npmean(chord) for chord inCHORD_TEMPLATE_MAJOR]

39 CHORD_TEMPLATE_MINOR_means = [npmean(chord) for chord inCHORD_TEMPLATE_MINOR]

40 CHORD_TEMPLATE_DOM7_means = [npmean(chord) for chord in CHORD_TEMPLATE_DOM7]

41 CHORD_TEMPLATE_MIN7_means = [npmean(chord) for chord in CHORD_TEMPLATE_MIN7]

42

43 CHORD_TEMPLATE_MAJOR_stdevs = [npstd(chord) for chord inCHORD_TEMPLATE_MAJOR]

44 CHORD_TEMPLATE_MINOR_stdevs = [npstd(chord) for chord inCHORD_TEMPLATE_MINOR]

45 CHORD_TEMPLATE_DOM7_stdevs = [npstd(chord) for chord in CHORD_TEMPLATE_DOM7]

46 CHORD_TEMPLATE_MIN7_stdevs = [npstd(chord) for chord in CHORD_TEMPLATE_MIN7]

47

48 TIMBRE_CLUSTERS = [[ 138679881e-01 395702571e-02 265410235e-0249 738301998e-03 -175014636e-02 -551147732e-0250 871851698e-03 -117595855e-02 107227900e-0251 875951680e-03 540391877e-03 617638908e-03]52 [ 314344510e+00 117405599e-01 408053561e+0053 -177934450e+00 293367968e+00 -135597928e+0054 -155129489e+00 775743158e-01 642796685e-0155 140794256e-01 337716831e-01 -327103815e-01]56 [ 356548165e-01 273288705e+00 194355982e+0057 106892477e+00 989739475e-01 -897330631e-0258 873234495e-01 -200747009e-03 344488367e-0159 993117800e-02 -243471766e-01 -190521726e-01]60 [ 422442037e-01 414115783e-01 143926557e-0161 -116143322e-01 -595186216e-02 -236927188e-0162 -683151409e-02 986816882e-02 243219098e-0263 693558977e-02 680121418e-03 397485360e-02]64 [ 194727799e-01 -139027782e+00 -239875671e-01

62

65 -284583677e-01 192334219e-01 -283421048e-0166 215787541e-01 114840341e-01 -215631833e-0167 -409496877e-02 -690838017e-03 -724394810e-03]68 [ 196565167e-01 498702717e-02 -343697282e-0169 254170701e-01 112441266e-02 154740401e-0170 -470447408e-02 810868802e-02 303736697e-0371 143974944e-03 -275044913e-02 148634678e-02]72 [ 221364497e-01 -296205105e-01 157754028e-0173 -557641279e-02 -925625566e-02 -615316168e-0274 -138139882e-01 -554936599e-02 166886836e-0175 646238260e-02 124093863e-02 -209274345e-02]76 [ 212823455e-01 -932652720e-02 -439611467e-0177 -202814479e-01 498638770e-02 -126572488e-0178 -111181799e-01 325075635e-02 201416694e-0279 -569216463e-02 261922912e-02 830817468e-02]80 [ 162304042e-01 -734813956e-03 -202552550e-0181 180106705e-01 -572110826e-02 -917148244e-0282 -620429191e-03 -608892354e-02 102883628e-0283 384878478e-02 -872920419e-03 237291230e-02]84 [ 169023095e-01 681311168e-02 -371039856e-0285 -213139780e-02 -418752028e-03 136407740e-0186 258515825e-02 -410328777e-04 293149920e-0287 -197874734e-02 201177066e-02 429260690e-03]88 [ 416829358e-01 -128384095e+00 886081556e-0189 913717416e-02 -319420208e-01 -182003637e-0190 -319865507e-02 -171517045e-02 347472066e-0291 -353047665e-02 558354602e-02 -506222122e-02]92 [ 383948137e-01 106020034e-01 401191058e-0193 149470482e-01 -958422411e-02 -494473336e-0294 227589858e-02 -567352733e-02 384666644e-0295 -215828055e-02 -167817151e-02 115426241e-01]96 [ 907946444e-01 326120397e+00 298472002e+0097 -142615404e-01 129886103e+00 -453380431e-0198 154008478e-01 -355297093e-02 -295809181e-0199 157037690e-01 -729692046e-02 115180285e-01]

100 [ 160870896e+00 -232038235e+00 -796211044e-01101 155058968e+00 -219377663e+00 501030526e-01102 -171767279e+00 -136642470e+00 -242837527e-01103 -414275615e-01 -733148530e-01 -456676578e-01]104 [ 642870687e-01 134486839e+00 216026845e-01105 -213180345e-01 310866747e-01 -397754955e-01106 -354439151e-01 -595938041e-04 495054274e-03107 467013422e-02 -180823854e-02 125808320e-01]108 [ 116780496e+00 228141229e+00 -329418720e+00109 -154239912e+00 212372153e-01 251116768e+00110 184273560e+00 -406183916e-01 119175125e+00111 -924407446e-01 685444429e-01 -638729005e-01]112 [ 239097414e-01 -113382447e-02 306327342e-01113 468182987e-03 -103107607e-01 -317661969e-02114 346533705e-02 146440386e-02 688291154e-02115 172580481e-02 -623970238e-03 -652822380e-03]116 [ 174850329e-01 -186077411e-01 269285838e-01117 522452803e-02 -371708289e-02 -642874319e-02118 -501920042e-03 -114565540e-02 -261300268e-03119 -694872458e-03 120157063e-02 201341977e-02]120 [ 193220674e-01 162738332e-01 172794061e-02121 789933755e-02 158494767e-01 904541006e-04122 -333177052e-02 -142411500e-01 -190471155e-02123 -241622739e-02 -257382438e-02 284895062e-02]124 [ 331179197e+00 -156765268e-01 442446188e+00

63

                   2.05496297e+00,  5.07031622e+00, -3.52663849e-02,
                  -5.68337901e+00, -1.17825301e+00,  5.41756637e-01,
                  -3.15541339e-02, -1.58404846e+00,  7.37887234e-01],
                  [2.36033237e-01, -5.01380019e-01, -7.01568834e-02,
                  -2.14474169e-01,  5.58739133e-01, -3.45340886e-01,
                   2.36469930e-01, -2.51770230e-02, -4.41670143e-01,
                  -1.73364633e-01,  9.92353986e-03,  1.01775476e-01],
                  [3.13672832e+00,  1.55128891e+00,  4.60139512e+00,
                   9.82477544e-01, -3.87108002e-01, -1.34239667e+00,
                  -3.00065797e+00, -4.41556909e-01, -7.77546208e-01,
                  -6.59017029e-01, -1.42596356e-01, -9.78935498e-01],
                  [8.50714148e-01,  2.28658856e-01, -3.65260753e+00,
                   2.70626948e+00, -1.90441544e-01,  5.66625676e+00,
                   1.77531510e+00,  2.39978921e+00,  1.10965660e+00,
                   1.58484130e+00, -1.51579214e-02,  8.64324026e-01],
                  [1.14302559e+00,  1.18602811e+00, -3.88130412e+00,
                   8.69833825e-01, -8.23003310e-01, -4.23867795e-01,
                   8.56022598e-01, -1.08015106e+00,  1.74840192e-01,
                  -1.35493558e-02, -1.17012561e+00,  1.68572940e-01],
                  [3.54117814e+00,  6.12714769e-01,  7.67585243e+00,
                   2.50391333e+00,  1.81374399e+00, -1.46363231e+00,
                  -1.74027236e+00, -5.72924078e-01, -1.20787368e+00,
                  -4.13954661e-01, -4.62561948e-01,  6.78297871e-01],
                  [8.31843044e-01,  4.41635485e-01,  7.00724425e-02,
                  -4.72159900e-02,  3.08326493e-01, -4.47009822e-01,
                   3.27806057e-01,  6.52370380e-01,  3.28490360e-01,
                   1.28628172e-01, -7.78065861e-02,  6.91343399e-02],
                  [4.90082031e-01, -9.53180204e-01,  1.76970476e-01,
                   1.57256960e-01, -5.26196238e-02, -3.19264458e-01,
                   3.91808304e-01,  2.19368239e-01, -2.06483291e-01,
                  -6.25044005e-02, -1.05547224e-01,  3.18934196e-01],
                  [1.49899454e+00, -4.30708817e-01,  2.43770498e+00,
                   7.03149621e-01, -2.28827845e+00,  2.70195855e+00,
                  -4.71484280e+00, -1.18700075e+00, -1.77431396e+00,
                  -2.23190236e+00,  8.20855264e-01, -2.35859902e-01],
                  [1.20322544e-01, -3.66300816e-01, -1.25699953e-01,
                  -1.21914056e-01,  6.93277338e-02, -1.31034684e-01,
                  -1.54955924e-03,  2.48094288e-02, -3.09576314e-02,
                  -1.66369415e-03,  1.48904987e-04, -1.42151992e-02],
                  [6.52394765e-01, -6.81024464e-01,  6.36868117e-01,
                   3.04950208e-01,  2.62178992e-01, -3.20457080e-01,
                  -1.98576098e-01, -3.02173163e-01,  2.04399765e-01,
                   4.44513847e-02, -9.50111498e-03, -1.14198739e-02],
                  [2.06762180e-01, -2.08101829e-01,  2.61977630e-01,
                  -1.71672300e-01,  5.61794250e-02,  2.13660185e-01,
                   3.90259585e-02,  4.78176392e-02,  1.72812607e-02,
                   3.44052067e-02,  6.26899067e-03,  2.48544728e-02],
                  [7.39717363e-01,  4.37786285e+00,  2.54995502e+00,
                   1.13151212e+00, -3.58509503e-01,  2.20806129e-01,
                  -2.20500355e-01, -7.22409824e-02, -2.70534083e-01,
                   1.07942098e-03,  2.70174668e-01,  1.87279353e-01],
                  [1.25593809e+00,  6.71054880e-02,  8.70352571e-01,
                  -4.32607959e+00,  2.30652217e+00,  5.47476105e+00,
                  -6.11052479e-01,  1.07955720e+00, -2.16225471e+00,
                  -7.95770149e-01, -7.31804973e-01,  9.68935954e-01],
                  [1.17233757e-01, -1.23897829e-01, -4.88625265e-01,
                   1.42036530e-01, -7.23286756e-02, -6.99808763e-02,
                  -1.17525019e-02,  5.70221674e-02, -7.67796123e-03,
                   4.17505873e-02, -2.33375716e-02,  1.94121001e-02],
                  [1.67511025e+00, -2.75436700e+00,  1.45345593e+00,
                   1.32408871e+00, -1.66172505e+00,  1.00560074e+00,
                  -8.82308160e-01, -5.95708043e-01, -7.27283590e-01,
                  -1.03975499e+00, -1.86653334e-02,  1.39449745e+00],
                  [3.20587677e+00, -2.84451104e+00,  8.54849957e+00,
                  -4.44001235e-01,  1.04202144e+00,  7.35333682e-01,
                  -2.48763292e+00,  7.38931361e-01, -1.74185596e+00,
                  -1.07581842e+00,  2.05759299e-01, -8.20483513e-01],
                  [3.31279737e+00, -5.08655734e-01,  6.61530870e+00,
                   1.16518280e+00,  4.74499155e+00, -2.31536191e+00,
                  -1.34016130e+00, -7.15381712e-01,  2.78890594e+00,
                   2.04189275e+00, -3.80003033e-01,  1.16034914e+00],
                  [1.79522019e+00, -8.13534697e-02,  4.37167420e-01,
                   2.26517020e+00,  8.85377295e-01,  1.07481514e+00,
                  -7.25322296e-01, -2.19309506e+00, -7.59468916e-01,
                  -1.37191387e+00,  2.60097913e-01,  9.34596450e-01],
                  [3.50400906e-01,  8.17891485e-01, -8.63487084e-01,
                  -7.31760701e-01,  9.70320805e-02, -3.60023996e-01,
                  -2.91753495e-01, -8.03073817e-02,  6.65930095e-02,
                   1.60093340e-01, -1.29158086e-01, -5.18806100e-02],
                  [2.25922929e-01,  2.78461593e-01,  5.39661393e-02,
                  -2.37662670e-02, -2.70343295e-02, -1.23485570e-01,
                   2.31027499e-03,  5.87465112e-05,  1.86127188e-02,
                   2.83074747e-02, -1.87198676e-04,  1.24761782e-02],
                  [4.53615634e-01,  3.18976020e+00, -8.35029351e-01,
                   7.84124578e+00, -4.43906795e-01, -1.78945492e+00,
                  -1.14521031e+00,  1.00044304e+00, -4.04084981e-01,
                  -4.86030348e-01,  1.05412721e-01,  5.63666445e-02],
                  [3.93714086e-01, -3.07226477e-01, -4.87366619e-01,
                  -4.57481697e-01, -2.91133171e-04, -2.39881719e-01,
                  -2.15591352e-01, -1.21332941e-01,  1.42245002e-01,
                   5.02984582e-02, -8.05878851e-03,  1.95534173e-01],
                  [1.86913010e-01, -1.61000977e-01,  5.95612425e-01,
                   1.87804293e-01,  2.22064227e-01, -1.09008289e-01,
                   7.83845058e-02,  5.15228647e-02, -8.18113578e-03,
                  -2.37860551e-02,  3.41013800e-03,  3.64680417e-02],
                  [3.32919314e+00, -2.14341251e+00,  7.20913997e+00,
                   1.76143734e+00,  1.64091808e+00, -2.66887649e+00,
                  -9.26748006e-01, -2.78599285e-01, -7.39434005e-01,
                  -3.87363085e-01,  8.00557250e-01,  1.15628886e+00],
                  [4.76496444e-01, -1.19334793e-01,  3.09037235e-01,
                  -3.45545294e-01,  1.30114716e-01,  5.06895559e-01,
                   2.12176840e-01, -4.14296750e-03,  4.52439064e-02,
                  -1.62163990e-02,  6.93683152e-02, -5.77607592e-03],
                  [3.00019324e-01,  5.43432074e-02, -7.72732930e-01,
                   1.47263806e+00, -2.79012581e-02, -2.47864869e-01,
                  -2.10011388e-01,  2.78202425e-01,  6.16957205e-02,
                  -1.66924986e-01, -1.80102286e-01, -3.78872162e-03]]

TIMBRE_MEANS = [np.mean(t) for t in TIMBRE_CLUSTERS]
TIMBRE_STDEVS = [np.std(t) for t in TIMBRE_CLUSTERS]
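# Note: TIMBRE_MEANS and TIMBRE_STDEVS cache the mean and standard deviation
# of each 12-dimensional cluster centre above, so that
# find_most_likely_timbre_category (defined below) does not recompute
# np.mean/np.std over the centroids on every call.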

'''helper methods to process raw msd data'''

def normalize_pitches(h5):
    key = int(hdf5_getters.get_key(h5))
    segments_pitches = hdf5_getters.get_segments_pitches(h5)
    # rotate every chroma segment so that all songs share a common tonic
    segments_pitches_new = [transpose_by_key(pitch_seg, key)
                            for pitch_seg in segments_pitches]
    return segments_pitches_new

def transpose_by_key(pitch_seg, key):
    pitch_seg_new = []
    for i in range(0, 12):
        idx = (i + key) % 12
        pitch_seg_new.append(pitch_seg[idx])
    return pitch_seg_new
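# Worked example (illustrative values only, not taken from the dataset):
# with key = 2 (D), bin i of the result takes the old value at (i + 2) % 12,
# rotating the chroma vector so the tonic lands on index 0:
#
#   >>> transpose_by_key([2, 3, 5, 0, 0, 0, 0, 0, 0, 0, 0, 1], 2)
#   [5, 0, 0, 0, 0, 0, 0, 0, 0, 1, 2, 3]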

''' given a time segment with distributions of the 12 pitches, find the most
likely chord played '''

def find_most_likely_chord(pitch_vector):
    rho_max = 0.0
    # index each chord as a (family, root) pair;
    # family 1 = major, 2 = minor, 3 = dominant seventh, 4 = minor seventh
    most_likely_chord = (1, 1)
    chord_families = [
        (1, CHORD_TEMPLATE_MAJOR, CHORD_TEMPLATE_MAJOR_means,
            CHORD_TEMPLATE_MAJOR_stdevs),
        (2, CHORD_TEMPLATE_MINOR, CHORD_TEMPLATE_MINOR_means,
            CHORD_TEMPLATE_MINOR_stdevs),
        (3, CHORD_TEMPLATE_DOM7, CHORD_TEMPLATE_DOM7_means,
            CHORD_TEMPLATE_DOM7_stdevs),
        (4, CHORD_TEMPLATE_MIN7, CHORD_TEMPLATE_MIN7_means,
            CHORD_TEMPLATE_MIN7_stdevs),
    ]
    for family, templates, means, stdevs in chord_families:
        for idx, (chord, mean, stdev) in enumerate(zip(templates, means, stdevs)):
            rho = 0.0
            for i in range(0, 12):
                rho += (chord[i] - mean) * (pitch_vector[i] - np.mean(pitch_vector)) \
                       / ((stdev + 0.01) * (np.std(pitch_vector) + 0.01))
            if abs(rho) > abs(rho_max):
                rho_max = rho
                most_likely_chord = (family, idx)
    return most_likely_chord
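# Note: up to the +0.01 terms in the denominator, which appear to guard
# against zero variance, each rho above is a Pearson-style correlation
# between a binary chord template and the observed pitch vector; the
# (family, root) pair with the largest |rho| across all templates wins.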

def find_most_likely_timbre_category(timbre_vector):
    most_likely_timbre_cat = 0
    rho_max = 0.0
    # correlate the timbre frame against every cluster centroid
    for idx, (seg, mean, stdev) in enumerate(zip(TIMBRE_CLUSTERS, TIMBRE_MEANS,
                                                 TIMBRE_STDEVS)):
        rho = 0.0
        for i in range(0, 12):
            rho += (seg[i] - mean) * (timbre_vector[i] - np.mean(seg)) \
                   / ((stdev + 0.01) * (np.std(timbre_vector) + 0.01))
        if abs(rho) > abs(rho_max):
            rho_max = rho
            most_likely_timbre_cat = idx
    return most_likely_timbre_cat
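For orientation, here is a minimal usage sketch of how the preprocessing script in Appendix A.2 calls these helpers; the input vectors are made-up placeholders, not values drawn from the Million Song Dataset:

import msd_utils

# hypothetical smoothed 12-bin chroma vector, peaking at C, E and G
chroma = [0.9, 0.1, 0.1, 0.1, 0.7, 0.1, 0.1, 0.8, 0.1, 0.1, 0.1, 0.1]
family, root = msd_utils.find_most_likely_chord(chroma)
# expected to come out as (1, 0): family 1 = major, root 0 = C

# hypothetical smoothed 12-dimensional Echo Nest timbre vector
timbre = [0.4, -0.1, 0.3, 0.0, 0.1, -0.2, 0.1, 0.0, 0.0, 0.1, 0.0, 0.0]
category = msd_utils.find_most_likely_timbre_category(timbre)
# index into TIMBRE_CLUSTERS of the best-matching timbre centroid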


Bibliography

[1] Marck Bailey. Mark J. Butler publishes scholarly work on dance music. http://www.music.northwestern.edu/about/news/2012/mark-j-butler-publishes-scholarly-work-on-dance-music.html, Mar 2012.

[2] Kenneth Taylor. Ishkur's guide to EDM. http://techno.org/electronic-music-guide/.

[3] Deal further strengthens Spotify's music discovery expertise. http://the.echonest.com/pressreleases/spotify-acquires-echo-nest/, Mar 2014.

[4] Josh Constine. Inside the Spotify–Echo Nest skunkworks. http://techcrunch.com/2014/10/19/the-sonic-mad-scientists/, Oct 2014.

[5] The future of music genres is here. http://blog.echonest.com/post/73516217273/the-future-of-music-genres-is-here, Jan 2014.

[6] About the Music Genome Project. http://www.pandora.com/about/mgp.

[7] Joan Serrà, Álvaro Corral, Marián Boguñá, Martín Haro, and Josep Ll. Arcos. Measuring the evolution of contemporary western popular music. Sci. Rep., 2, Jul 2012.

[8] Matthias Mauch, Robert M. MacCallum, Mark Levy, and Armand M. Leroi. The evolution of popular music: USA 1960–2010. Royal Society Open Science, 2(5), 2015.

[9] Thierry Bertin-Mahieux, Daniel P. W. Ellis, Brian Whitman, and Paul Lamere. The Million Song Dataset. In Proceedings of the 12th International Conference on Music Information Retrieval (ISMIR 2011), 2011.

[10] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay. Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12:2825–2830, 2011.

[11] Lars Buitinck, Gilles Louppe, Mathieu Blondel, Fabian Pedregosa, Andreas Mueller, Olivier Grisel, Vlad Niculae, Peter Prettenhofer, Alexandre Gramfort, Jaques Grobler, Robert Layton, Jake VanderPlas, Arnaud Joly, Brian Holt, and Gaël Varoquaux. API design for machine learning software: experiences from the scikit-learn project. In ECML PKDD Workshop: Languages for Data Mining and Machine Learning, pages 108–122, 2013.

[12] Edwin Chen. Infinite mixture models with nonparametric Bayes and the Dirichlet process. http://blog.echen.me/2012/03/20/infinite-mixture-models-with-nonparametric-bayes-and-the-dirichlet-process/, Mar 2012.

[13] Graham Massey. Roland TR-808: The drum machine that changed music. http://www.bbc.com/news/entertainment-arts-26682781, Mar 2014.

[14] Jean-Michel Jarre and Agencja Artystyczna MTJ. Jean Michel Jarre. Disques Dreyfus, 1999.

[15] François Pachet, Jean-Julien Aucouturier, and Mark Sandler. The way it sounds: Timbre models for analysis and retrieval of music signals. IEEE Transactions on Multimedia, 7(6):1028–1035, Dec 2005.


• Abstract
• Acknowledgements
• Contents
• List of Tables
• List of Figures
• 1 Introduction
  • 1.1 Background Information
  • 1.2 Literature Review
  • 1.3 The Dataset
• 2 Mathematical Modeling
  • 2.1 Determining Novelty of Songs
  • 2.2 Feature Selection
  • 2.3 Collecting Data and Preprocessing Selected Features
    • 2.3.1 Collecting the Data
    • 2.3.2 Pitch Preprocessing
    • 2.3.3 Timbre Preprocessing
• 3 Results
  • 3.1 Methodology
  • 3.2 Findings
    • 3.2.1 α = 0.05
    • 3.2.2 α = 0.1
    • 3.2.3 α = 0.2
  • 3.3 Analysis
• 4 Conclusion
  • 4.1 Design Flaws in Experiment
  • 4.2 Future Work
  • 4.3 Closing Remarks
• A Code
  • A.1 Pulling Data from the Million Song Dataset
  • A.2 Calculating Most Likely Chords and Timbre Categories
  • A.3 Code to Compute Timbre Categories
  • A.4 Helper Methods for Calculations
• Bibliography
Page 67: Silver,Matthew final thesis

30 h5 = hdf5_gettersopen_h5_file_read(f)31 if year unknown throw out sample32 if hdf5_gettersget_year(h5) == 033 h5close()34 continue35 if any(tag in str(hdf5_gettersget_artist_mbtags(h5)) for tag in

target_genres)36 print rsquofound electronic music song at 0 secondsrsquoformat(time

time()-start_time)37 count += 138 print (rsquosong count 0rsquoformat(count+1))39 h5_subdict = dict()40 h5_subdict[rsquotitlersquo] = hdf5_gettersget_title(h5)item()41 h5_subdict[rsquoartist_namersquo] = hdf5_gettersget_artist_name(h5)

item()42 h5_subdict[rsquoyearrsquo] = hdf5_gettersget_year(h5)item()43 h5_subdict[rsquodurationrsquo] = hdf5_gettersget_duration(h5)item()44 h5_subdict[rsquotimbrersquo] = hdf5_gettersget_segments_timbre(h5)

tolist()45 h5_subdict[rsquopitchesrsquo] = hdf5_gettersget_segments_pitches(h5)

tolist()46 track_id = hdf5_gettersget_track_id(h5)item()47 all_song_data[track_id] = h5_subdict48 print(rsquoSong 0 finished processing Total time elapsed 1

secondsrsquoformat(countstr(timetime() - start_time)))49 h5close()50

51 all_song_data_sorted = dict(sorted(all_song_dataitems() key=lambda k k[1][rsquoyearrsquo]))

52 sortedpitchdata = rsquoscratchnetworkmssilvermssilvermsd_dataraw_rsquo +resub(rsquorsquorsquorsquosysargv[1]) + rsquotxtrsquo

53 with open(sortedpitchdata rsquowrsquo) as text_file54 text_filewrite(str(all_song_data_sorted))

A2 Calculating Most Likely Chords and Timbre

Categories

1 from __future__ import division2 import os3 import sys4 import re5 import time6 import json7 import glob8 import hdf5_getters not on adroit9 import sklearnmixture

10 import msd_utils not on adroit11 import math12 import numpy as np13 import collections14 import ast15

16 prevents output from showing ellipses when printed17 npset_printoptions(threshold=npnan)

58

18

19 column-wise mean of list of lists20 def mean(a)21 return sum(a) len(a)22

23 rsquorsquorsquoThis code computes the frequency of chord changes in each electronic songand runs the dirichlet process on it rsquorsquorsquo

24

25 basedir = rsquoscratchnetworkmssilvermssilverrsquo26 input_file = basedir + rsquomsd_dataraw_rsquo + str(sysargv[1]) + rsquotxtrsquo27 output_file = basedir + rsquomsd_datapreprocessed_rsquo + str(sysargv[1]) + rsquotxtrsquo28

29 json_contents = open(input_filersquorrsquo)read()30

31 all_song_data = []32 time_start = timetime()33 count = 034 for json_object_str in refinditer(rsquorsquotitlersquojson_contents)35 json_object_str = str(json_object_strgroup(0))36 json_object = astliteral_eval(json_object_str)37 json_object_new = 38

39 json_object_new[rsquotitlersquo] = json_object[rsquotitlersquo]40 json_object_new[rsquoartist_namersquo] = json_object[rsquoartist_namersquo]41 json_object_new[rsquoyearrsquo] = json_object[rsquoyearrsquo]42 json_object_new[rsquodurationrsquo] = json_object[rsquodurationrsquo]43

44 segments_pitches_old = json_object[rsquopitchesrsquo]45 segments_timbre_old = json_object[rsquotimbrersquo]46 segments_pitches_old_smoothed = []47 segments_timbre_old_smoothed = []48 chord_changes = [0 for i in range(0192)]49 smoothing_factor = 550 for i in range(0int(mathfloor(len(segments_pitches_old))

smoothing_factor))51 segments = segments_pitches_old[(smoothing_factori)(

smoothing_factori+smoothing_factor)]52 calculate mean frequency of each note over a block of 5 time

segments53 segments_mean = map(mean zip(segments))54 segments_pitches_old_smoothedappend(segments_mean)55 most_likely_chords = [msd_utilsfind_most_likely_chord(seg) for seg in

segments_pitches_old_smoothed]56 print rsquofound most likely chords at 0 secondsrsquoformat(timetime()-

time_start)57 calculate chord changes58 for i in range(0len(most_likely_chords)-1)59 c1 = most_likely_chords[i]60 c2 = most_likely_chords[i+1]61 if (c1[1] == c2[1])62 note_shift = 063 elif (c1[1] lt c2[1])64 note_shift = c2[1] - c1[1]65 else66 note_shift = 12 - c1[1] + c2[1]67 key_shift = 4(c1[0]-1) + c2[0]68 convert note_shift (0 through 11) and key_shift (1 to 16)69 to one of 196 categories for a chord shift70 chord_shift = 12(key_shift - 1) + note_shift71 chord_changes[chord_shift] += 1

59

72 json_object_new[rsquochord_changesrsquo] = [cjson_object[rsquodurationrsquo] for c inchord_changes]

73 print rsquocalculated chord changes at 0 secondsrsquoformat(timetime()-time_start)

74

75 for i in range(0int(mathfloor(len(segments_timbre_old))smoothing_factor))

76 segments = segments_timbre_old[(smoothing_factori)(smoothing_factori+smoothing_factor)]

77 calculate mean frequency of each note over a block of 5 timesegments

78 segments_mean = map(mean zip(segments))79 segments_timbre_old_smoothedappend(segments_mean)80 print rsquofound most likely timbre categories at 0 secondsrsquoformat(time

time()-time_start)81 timbre_cats = [msd_utilsfind_most_likely_timbre_category(seg) for seg

in segments_timbre_old_smoothed]82 timbre_cat_counts = [timbre_catscount(i) for i in xrange(030)]83 json_object_new[rsquotimbre_cat_countsrsquo] = [tjson_object[rsquodurationrsquo] for t

in timbre_cat_counts]84 all_song_dataappend(json_object_new)85 count += 186

87 print rsquopreprocessing finished writing results to file at time 0rsquoformat(timetime()-time_start)

88 with open(output_filersquowrsquo) as f89 fwrite(str(all_song_data))90

91 print rsquofile merging complete at time 0rsquoformat(timetime()-time_start)

A3 Code to Compute Timbre Categories

1 from __future__ import division2 import os3 import sys4 import re5 import time6 import json7 import glob8 import hdf5_getters not on adroit9 import sklearnmixture

10 import msd_utils not on adroit11 import math12 import numpy as np13 import collections14 from string import ascii_uppercase15 import ast16 import matplotlibpyplot as plt17 import operator18 from collections import defaultdict19 import random20

21 timbre_all = []22 N = 20 number of samples to get from each year23 year_counts = dict(1956 2 1965 4 1968 3 1969 5 1970 23 1971 25

1972 26 1973 37 1974 35 1975 29 1976 28 1977 64 1978 771979 111 1980 131 1981 171 1982 199 1983 272 1984 190 1985

60

189 1986 200 1987 224 1988 205 1989 272 1990 358 1991 3481992 538 1993 610 1994 658 1995 764 1996 809 1997 930 1998872 1999 983 2000 1031 2001 1230 2002 1323 2003 1563 20041508 2005 1995 2006 1892 2007 2175 2008 1950 2009 1782 2010742)

24

25 time_start = timetime()26 year_count = defaultdict(int)27 orig_dir = rsquoscratchnetworkmssilvermssilverrsquo28 orig_dir = rsquorsquo29 json_pattern = recompile(rsquorsquotitlersquoreDOTALL)30 N = 20 number of songs to sample from each year31 k = 20 number of frames to select from each song32 for l1 in ascii_uppercase33 for l2 in ascii_uppercase34 edm_textfile = orig_dir + rsquomsd_dataraw_rsquo + l1 + l2 + rsquotxtrsquo35 json_contents = open(edm_textfilersquorrsquo)read()36 for json_object_str in refindall(json_patternjson_contents)37 json_object = astliteral_eval(json_object_str)38 year = int(json_object[rsquoyearrsquo])39 prob = 10 if 10Nyear_counts[year] gt 10 else 10N

year_counts[year]40 if randomrandom() lt prob41 print rsquogetting timbre frames for song in directory 0 1

seconds after start of programrsquoformat(edm_textfiletimetime()-time_start)

42 duration = float(json_object[rsquodurationrsquo])43 timbre = [[tduration for t in l] for l in json_object[rsquo

timbrersquo]]44 try45 indices = randomsample(xrange(0len(timbre))k)46 except47 indices = xrange(0len(timbre))48 timbre_frames = [timbre[i] for i in indices]49 appended_timbre = [timbre_allappend(l) for l in

timbre_frames]50 print rsquofinished file 0 1 seconds after start of programrsquoformat(

edm_textfiletimetime()-time_start)51

52 with(open(rsquotimbre_frames_alltxtrsquorsquowrsquo)) as f53 fwrite(str(timbre_all))

A4 Helper Methods for Calculations

1 import os2 import re3 import json4 import glob5 import hdf5_getters6 import time7 import numpy as np8

9 rsquorsquorsquo some static data used in conjunction with the helper methods rsquorsquorsquo10

11 each 12-element vector corresponds to the 12 pitches starting with Cnatural and going up to B natural

12

61

13 CHORD_TEMPLATE_MAJOR = [[100010010000][010001001000]14 [001000100100][000100010010]15 [000010001001][100001000100]16 [010000100010][001000010001]17 [100100001000][010010000100]18 [001001000010][000100100001]]19 CHORD_TEMPLATE_MINOR =[[100100010000][010010001000]20 [001001000100][000100100010]21 [000010010001][100001001000]22 [010000100100][001000010010]23 [000100001001][100010000100]24 [010001000010][001000100001]]25 CHORD_TEMPLATE_DOM7 = [[100010010010][010001001001]26 [101000100100][010100010010]27 [001010001001][100101000100]28 [010010100010][001001010001]29 [100100101000][010010010100]30 [001001001010][000100100101]]31 CHORD_TEMPLATE_MIN7 = [[100100010010][010010001001]32 [101001000100][010100100010]33 [001010010001][100101001000]34 [010010100100][001001010010]35 [000100101001][100010010100]36 [010001001010][001000100101]]37

38 CHORD_TEMPLATE_MAJOR_means = [npmean(chord) for chord inCHORD_TEMPLATE_MAJOR]

39 CHORD_TEMPLATE_MINOR_means = [npmean(chord) for chord inCHORD_TEMPLATE_MINOR]

40 CHORD_TEMPLATE_DOM7_means = [npmean(chord) for chord in CHORD_TEMPLATE_DOM7]

41 CHORD_TEMPLATE_MIN7_means = [npmean(chord) for chord in CHORD_TEMPLATE_MIN7]

42

43 CHORD_TEMPLATE_MAJOR_stdevs = [npstd(chord) for chord inCHORD_TEMPLATE_MAJOR]

44 CHORD_TEMPLATE_MINOR_stdevs = [npstd(chord) for chord inCHORD_TEMPLATE_MINOR]

45 CHORD_TEMPLATE_DOM7_stdevs = [npstd(chord) for chord in CHORD_TEMPLATE_DOM7]

46 CHORD_TEMPLATE_MIN7_stdevs = [npstd(chord) for chord in CHORD_TEMPLATE_MIN7]

47

48 TIMBRE_CLUSTERS = [[ 138679881e-01 395702571e-02 265410235e-0249 738301998e-03 -175014636e-02 -551147732e-0250 871851698e-03 -117595855e-02 107227900e-0251 875951680e-03 540391877e-03 617638908e-03]52 [ 314344510e+00 117405599e-01 408053561e+0053 -177934450e+00 293367968e+00 -135597928e+0054 -155129489e+00 775743158e-01 642796685e-0155 140794256e-01 337716831e-01 -327103815e-01]56 [ 356548165e-01 273288705e+00 194355982e+0057 106892477e+00 989739475e-01 -897330631e-0258 873234495e-01 -200747009e-03 344488367e-0159 993117800e-02 -243471766e-01 -190521726e-01]60 [ 422442037e-01 414115783e-01 143926557e-0161 -116143322e-01 -595186216e-02 -236927188e-0162 -683151409e-02 986816882e-02 243219098e-0263 693558977e-02 680121418e-03 397485360e-02]64 [ 194727799e-01 -139027782e+00 -239875671e-01

62

65 -284583677e-01 192334219e-01 -283421048e-0166 215787541e-01 114840341e-01 -215631833e-0167 -409496877e-02 -690838017e-03 -724394810e-03]68 [ 196565167e-01 498702717e-02 -343697282e-0169 254170701e-01 112441266e-02 154740401e-0170 -470447408e-02 810868802e-02 303736697e-0371 143974944e-03 -275044913e-02 148634678e-02]72 [ 221364497e-01 -296205105e-01 157754028e-0173 -557641279e-02 -925625566e-02 -615316168e-0274 -138139882e-01 -554936599e-02 166886836e-0175 646238260e-02 124093863e-02 -209274345e-02]76 [ 212823455e-01 -932652720e-02 -439611467e-0177 -202814479e-01 498638770e-02 -126572488e-0178 -111181799e-01 325075635e-02 201416694e-0279 -569216463e-02 261922912e-02 830817468e-02]80 [ 162304042e-01 -734813956e-03 -202552550e-0181 180106705e-01 -572110826e-02 -917148244e-0282 -620429191e-03 -608892354e-02 102883628e-0283 384878478e-02 -872920419e-03 237291230e-02]84 [ 169023095e-01 681311168e-02 -371039856e-0285 -213139780e-02 -418752028e-03 136407740e-0186 258515825e-02 -410328777e-04 293149920e-0287 -197874734e-02 201177066e-02 429260690e-03]88 [ 416829358e-01 -128384095e+00 886081556e-0189 913717416e-02 -319420208e-01 -182003637e-0190 -319865507e-02 -171517045e-02 347472066e-0291 -353047665e-02 558354602e-02 -506222122e-02]92 [ 383948137e-01 106020034e-01 401191058e-0193 149470482e-01 -958422411e-02 -494473336e-0294 227589858e-02 -567352733e-02 384666644e-0295 -215828055e-02 -167817151e-02 115426241e-01]96 [ 907946444e-01 326120397e+00 298472002e+0097 -142615404e-01 129886103e+00 -453380431e-0198 154008478e-01 -355297093e-02 -295809181e-0199 157037690e-01 -729692046e-02 115180285e-01]

100 [ 160870896e+00 -232038235e+00 -796211044e-01101 155058968e+00 -219377663e+00 501030526e-01102 -171767279e+00 -136642470e+00 -242837527e-01103 -414275615e-01 -733148530e-01 -456676578e-01]104 [ 642870687e-01 134486839e+00 216026845e-01105 -213180345e-01 310866747e-01 -397754955e-01106 -354439151e-01 -595938041e-04 495054274e-03107 467013422e-02 -180823854e-02 125808320e-01]108 [ 116780496e+00 228141229e+00 -329418720e+00109 -154239912e+00 212372153e-01 251116768e+00110 184273560e+00 -406183916e-01 119175125e+00111 -924407446e-01 685444429e-01 -638729005e-01]112 [ 239097414e-01 -113382447e-02 306327342e-01113 468182987e-03 -103107607e-01 -317661969e-02114 346533705e-02 146440386e-02 688291154e-02115 172580481e-02 -623970238e-03 -652822380e-03]116 [ 174850329e-01 -186077411e-01 269285838e-01117 522452803e-02 -371708289e-02 -642874319e-02118 -501920042e-03 -114565540e-02 -261300268e-03119 -694872458e-03 120157063e-02 201341977e-02]120 [ 193220674e-01 162738332e-01 172794061e-02121 789933755e-02 158494767e-01 904541006e-04122 -333177052e-02 -142411500e-01 -190471155e-02123 -241622739e-02 -257382438e-02 284895062e-02]124 [ 331179197e+00 -156765268e-01 442446188e+00

63

125 205496297e+00 507031622e+00 -352663849e-02126 -568337901e+00 -117825301e+00 541756637e-01127 -315541339e-02 -158404846e+00 737887234e-01]128 [ 236033237e-01 -501380019e-01 -701568834e-02129 -214474169e-01 558739133e-01 -345340886e-01130 236469930e-01 -251770230e-02 -441670143e-01131 -173364633e-01 992353986e-03 101775476e-01]132 [ 313672832e+00 155128891e+00 460139512e+00133 982477544e-01 -387108002e-01 -134239667e+00134 -300065797e+00 -441556909e-01 -777546208e-01135 -659017029e-01 -142596356e-01 -978935498e-01]136 [ 850714148e-01 228658856e-01 -365260753e+00137 270626948e+00 -190441544e-01 566625676e+00138 177531510e+00 239978921e+00 110965660e+00139 158484130e+00 -151579214e-02 864324026e-01]140 [ 114302559e+00 118602811e+00 -388130412e+00141 869833825e-01 -823003310e-01 -423867795e-01142 856022598e-01 -108015106e+00 174840192e-01143 -135493558e-02 -117012561e+00 168572940e-01]144 [ 354117814e+00 612714769e-01 767585243e+00145 250391333e+00 181374399e+00 -146363231e+00146 -174027236e+00 -572924078e-01 -120787368e+00147 -413954661e-01 -462561948e-01 678297871e-01]148 [ 831843044e-01 441635485e-01 700724425e-02149 -472159900e-02 308326493e-01 -447009822e-01150 327806057e-01 652370380e-01 328490360e-01151 128628172e-01 -778065861e-02 691343399e-02]152 [ 490082031e-01 -953180204e-01 176970476e-01153 157256960e-01 -526196238e-02 -319264458e-01154 391808304e-01 219368239e-01 -206483291e-01155 -625044005e-02 -105547224e-01 318934196e-01]156 [ 149899454e+00 -430708817e-01 243770498e+00157 703149621e-01 -228827845e+00 270195855e+00158 -471484280e+00 -118700075e+00 -177431396e+00159 -223190236e+00 820855264e-01 -235859902e-01]160 [ 120322544e-01 -366300816e-01 -125699953e-01161 -121914056e-01 693277338e-02 -131034684e-01162 -154955924e-03 248094288e-02 -309576314e-02163 -166369415e-03 148904987e-04 -142151992e-02]164 [ 652394765e-01 -681024464e-01 636868117e-01165 304950208e-01 262178992e-01 -320457080e-01166 -198576098e-01 -302173163e-01 204399765e-01167 444513847e-02 -950111498e-02 -114198739e-02]168 [ 206762180e-01 -208101829e-01 261977630e-01169 -171672300e-01 561794250e-02 213660185e-01170 390259585e-02 478176392e-02 172812607e-02171 344052067e-02 626899067e-03 248544728e-02]172 [ 739717363e-01 437786285e+00 254995502e+00173 113151212e+00 -358509503e-01 220806129e-01174 -220500355e-01 -722409824e-02 -270534083e-01175 107942098e-03 270174668e-01 187279353e-01]176 [ 125593809e+00 671054880e-02 870352571e-01177 -432607959e+00 230652217e+00 547476105e+00178 -611052479e-01 107955720e+00 -216225471e+00179 -795770149e-01 -731804973e-01 968935954e-01]180 [ 117233757e-01 -123897829e-01 -488625265e-01181 142036530e-01 -723286756e-02 -699808763e-02182 -117525019e-02 570221674e-02 -767796123e-03183 417505873e-02 -233375716e-02 194121001e-02]184 [ 167511025e+00 -275436700e+00 145345593e+00

64

185 132408871e+00 -166172505e+00 100560074e+00186 -882308160e-01 -595708043e-01 -727283590e-01187 -103975499e+00 -186653334e-02 139449745e+00]188 [ 320587677e+00 -284451104e+00 854849957e+00189 -444001235e-01 104202144e+00 735333682e-01190 -248763292e+00 738931361e-01 -174185596e+00191 -107581842e+00 205759299e-01 -820483513e-01]192 [ 331279737e+00 -508655734e-01 661530870e+00193 116518280e+00 474499155e+00 -231536191e+00194 -134016130e+00 -715381712e-01 278890594e+00195 204189275e+00 -380003033e-01 116034914e+00]196 [ 179522019e+00 -813534697e-02 437167420e-01197 226517020e+00 885377295e-01 107481514e+00198 -725322296e-01 -219309506e+00 -759468916e-01199 -137191387e+00 260097913e-01 934596450e-01]200 [ 350400906e-01 817891485e-01 -863487084e-01201 -731760701e-01 970320805e-02 -360023996e-01202 -291753495e-01 -803073817e-02 665930095e-02203 160093340e-01 -129158086e-01 -518806100e-02]204 [ 225922929e-01 278461593e-01 539661393e-02205 -237662670e-02 -270343295e-02 -123485570e-01206 231027499e-03 587465112e-05 186127188e-02207 283074747e-02 -187198676e-04 124761782e-02]208 [ 453615634e-01 318976020e+00 -835029351e-01209 784124578e+00 -443906795e-01 -178945492e+00210 -114521031e+00 100044304e+00 -404084981e-01211 -486030348e-01 105412721e-01 563666445e-02]212 [ 393714086e-01 -307226477e-01 -487366619e-01213 -457481697e-01 -291133171e-04 -239881719e-01214 -215591352e-01 -121332941e-01 142245002e-01215 502984582e-02 -805878851e-03 195534173e-01]216 [ 186913010e-01 -161000977e-01 595612425e-01217 187804293e-01 222064227e-01 -109008289e-01218 783845058e-02 515228647e-02 -818113578e-02219 -237860551e-02 341013800e-03 364680417e-02]220 [ 332919314e+00 -214341251e+00 720913997e+00221 176143734e+00 164091808e+00 -266887649e+00222 -926748006e-01 -278599285e-01 -739434005e-01223 -387363085e-01 800557250e-01 115628886e+00]224 [ 476496444e-01 -119334793e-01 309037235e-01225 -345545294e-01 130114716e-01 506895559e-01226 212176840e-01 -414296750e-03 452439064e-02227 -162163990e-02 693683152e-02 -577607592e-03]228 [ 300019324e-01 543432074e-02 -772732930e-01229 147263806e+00 -279012581e-02 -247864869e-01230 -210011388e-01 278202425e-01 616957205e-02231 -166924986e-01 -180102286e-01 -378872162e-03]]232

233 TIMBRE_MEANS = [npmean(t) for t in TIMBRE_CLUSTERS]234 TIMBRE_STDEVS = [npstd(t) for t in TIMBRE_CLUSTERS]235

236 rsquorsquorsquohelper methods to process raw msd datarsquorsquorsquo237

238 def normalize_pitches(h5)239 key = int(hdf5_gettersget_key(h5))240 segments_pitches = hdf5_gettersget_segments_pitches(h5)241 segments_pitches_new = [transpose_by_key(pitch_segkey) for pitch_seg in

segments_pitches]242 return segments_pitches_new243

65

244 def transpose_by_key(pitch_segkey)245 pitch_seg_new = []246 for i in range(012)247 idx = (i + key) 12248 pitch_seg_newappend(pitch_seg[idx])249 return pitch_seg_new250

251 rsquorsquorsquo given a time segment with distributions of the 12 pitches find the mostlikely chord playedrsquorsquorsquo

252 def find_most_likely_chord(pitch_vector)253 rho_max = 00254 index each chord255 most_likely_chord = (11)256 for idx (chordmeanstdev) in enumerate(zip(CHORD_TEMPLATE_MAJOR

CHORD_TEMPLATE_MAJOR_meansCHORD_TEMPLATE_MAJOR_stdevs))257 rho = 00258 for i in range(012)259 rho += (chord[i] - mean)(pitch_vector[i] - npmean(pitch_vector))((

stdev+001)(npstd(pitch_vector)+001))260 if (abs(rho) gt abs(rho_max))261 rho_max = rho262 most_likely_chord = (1idx)263 for idx (chordmeanstdev) in enumerate(zip(CHORD_TEMPLATE_MINOR

CHORD_TEMPLATE_MINOR_meansCHORD_TEMPLATE_MINOR_stdevs))264 rho = 00265 for i in range(012)266 rho += (chord[i] - mean)(pitch_vector[i] - npmean(pitch_vector))((

stdev+001)(npstd(pitch_vector)+001))267 if (abs(rho) gt abs(rho_max))268 rho_max = rho269 most_likely_chord = (2idx)270 for idx (chordmeanstdev) in enumerate(zip(CHORD_TEMPLATE_DOM7

CHORD_TEMPLATE_DOM7_meansCHORD_TEMPLATE_DOM7_stdevs))271 rho = 00272 for i in range(012)273 rho += (chord[i] - mean)(pitch_vector[i] - npmean(pitch_vector))((

stdev+001)(npstd(pitch_vector)+001))274 if (abs(rho) gt abs(rho_max))275 rho_max = rho276 most_likely_chord = (3idx)277 for idx (chordmeanstdev) in enumerate(zip(CHORD_TEMPLATE_MIN7

CHORD_TEMPLATE_MIN7_meansCHORD_TEMPLATE_MIN7_stdevs))278 rho = 00279 for i in range(012)280 rho += (chord[i] - mean)(pitch_vector[i] - npmean(pitch_vector))((

stdev+001)(npstd(pitch_vector)+001))281 if (abs(rho) gt abs(rho_max))282 rho_max = rho283 most_likely_chord = (4idx)284 return most_likely_chord285

286 def find_most_likely_timbre_category(timbre_vector)287 most_likely_timbre_cat = 0288 rho_max = 00289 for idx (segmeanstdev) in enumerate(zip(TIMBRE_CLUSTERSTIMBRE_MEANS

TIMBRE_STDEVS))290 rho = 00291 for i in range(012)292 rho += (seg[i] - mean)(timbre_vector[i] - npmean(seg))((stdev+001)

(npstd(timbre_vector)+001))

66

293 if (abs(rho) gt abs(rho_max))294 rho_max = rho295 most_likely_timbre_cat = idx296 return most_likely_timbre_cat

67

Bibliography

[1] Marck Bailey Mark j butler publishes scholarly work on dance mu-sic httpwwwmusicnorthwesterneduaboutnews2012mark-j-butler-publishes-scholarly-work-on-dance-musichtml mar 2012

[2] Kenneth Taylor Ishkurrsquos guide to edm httptechnoorgelectronic-music-guide

[3] Deal further strengthens spotifyrsquos music discovery expertise httptheechonestcompressreleasesspotify-acquires-echo-nest mar 2014

[4] Josh Constine Inside the spotify - echo nest skunkworks httptechcrunchcom20141019the-sonic-mad-scientists oct 2014

[5] The future of music genres is here httpblogechonestcompost73516217273the-future-of-music-genres-is-here jan 2014

[6] About the music genome project httpwwwpandoracomaboutmgp

[7] Joan Serragrave Aacutelvaro Corral Mariaacuten Boguntildeaacute Martiacuten Haro and Josep Ll ArcosMeasuring the evolution of contemporary western popular music Sci Rep 2jul 2012

[8] Matthias Mauch Robert M MacCallum Mark Levy and Armand M Leroi Theevolution of popular music Usa 1960ndash2010 Royal Society Open Science 2(5)2015

[9] Thierry Bertin-Mahieux Daniel PW Ellis Brian Whitman and Paul LamereThe million song dataset In Proceedings of the 12th International Conferenceon Music Information Retrieval (ISMIR 2011) 2011

[10] F Pedregosa G Varoquaux A Gramfort V Michel B Thirion O GriselM Blondel P Prettenhofer R Weiss V Dubourg J Vanderplas A PassosD Cournapeau M Brucher M Perrot and E Duchesnay Scikit-learn Machinelearning in Python Journal of Machine Learning Research 122825ndash2830 2011

[11] Lars Buitinck Gilles Louppe Mathieu Blondel Fabian Pedregosa AndreasMueller Olivier Grisel Vlad Niculae Peter Prettenhofer Alexandre GramfortJaques Grobler Robert Layton Jake VanderPlas Arnaud Joly Brian Holt andGaeumll Varoquaux API design for machine learning software experiences from

68

the scikit-learn project In ECML PKDD Workshop Languages for Data Miningand Machine Learning pages 108ndash122 2013

[12] Edwin Chen Infinite mixture models with nonparametric bayesand the dirichlet process httpblogechenme20120320infinite-mixture-models-with-nonparametric-bayes-and-the-dirichlet-processmar 2012

[13] Graham Massey Roland tr-808 The drum machine that changed music httpwwwbbccomnewsentertainment-arts-26682781 mar 2014

[14] Jean-Michel Jarre and Agencja Artystyczna MTJ Jean Michel Jarre DisquesDreyfus 1999

[15] Francois Pachet Jean-Julien Aucouturier and Mark Sandler The way it soundsTimbre models for analysis and retrieval of music signals IEEE TRANSAC-TIONS ON MULTIMEDIA 7(6)1028ndash35 dec 2005

69

  • Abstract
  • Acknowledgements
  • Contents
  • List of Tables
  • List of Figures
  • 1 Introduction
    • 11 Background Information
    • 12 Literature Review
    • 13 The Dataset
      • 2 Mathematical Modeling
        • 21 Determining Novelty of Songs
        • 22 Feature Selection
        • 23 Collecting Data and Preprocessing Selected Features
          • 231 Collecting the Data
          • 232 Pitch Preprocessing
          • 233 Timbre Preprocessing
              • 3 Results
                • 31 Methodology
                • 32 Findings
                  • 321 =005
                  • 322 =01
                  • 323 =02
                    • 33 Analysis
                      • 4 Conclusion
                        • 41 Design Flaws in Experiment
                        • 42 Future Work
                        • 43 Closing Remarks
                          • A Code
                            • A1 Pulling Data from the Million Song Dataset
                            • A2 Calculating Most Likely Chords and Timbre Categories
                            • A3 Code to Compute Timbre Categories
                            • A4 Helper Methods for Calculations
                              • Bibliography
Page 68: Silver,Matthew final thesis

18

19 column-wise mean of list of lists20 def mean(a)21 return sum(a) len(a)22

23 rsquorsquorsquoThis code computes the frequency of chord changes in each electronic songand runs the dirichlet process on it rsquorsquorsquo

24

25 basedir = rsquoscratchnetworkmssilvermssilverrsquo26 input_file = basedir + rsquomsd_dataraw_rsquo + str(sysargv[1]) + rsquotxtrsquo27 output_file = basedir + rsquomsd_datapreprocessed_rsquo + str(sysargv[1]) + rsquotxtrsquo28

29 json_contents = open(input_filersquorrsquo)read()30

31 all_song_data = []32 time_start = timetime()33 count = 034 for json_object_str in refinditer(rsquorsquotitlersquojson_contents)35 json_object_str = str(json_object_strgroup(0))36 json_object = astliteral_eval(json_object_str)37 json_object_new = 38

39 json_object_new[rsquotitlersquo] = json_object[rsquotitlersquo]40 json_object_new[rsquoartist_namersquo] = json_object[rsquoartist_namersquo]41 json_object_new[rsquoyearrsquo] = json_object[rsquoyearrsquo]42 json_object_new[rsquodurationrsquo] = json_object[rsquodurationrsquo]43

44 segments_pitches_old = json_object[rsquopitchesrsquo]45 segments_timbre_old = json_object[rsquotimbrersquo]46 segments_pitches_old_smoothed = []47 segments_timbre_old_smoothed = []48 chord_changes = [0 for i in range(0192)]49 smoothing_factor = 550 for i in range(0int(mathfloor(len(segments_pitches_old))

smoothing_factor))51 segments = segments_pitches_old[(smoothing_factori)(

smoothing_factori+smoothing_factor)]52 calculate mean frequency of each note over a block of 5 time

segments53 segments_mean = map(mean zip(segments))54 segments_pitches_old_smoothedappend(segments_mean)55 most_likely_chords = [msd_utilsfind_most_likely_chord(seg) for seg in

segments_pitches_old_smoothed]56 print rsquofound most likely chords at 0 secondsrsquoformat(timetime()-

time_start)57 calculate chord changes58 for i in range(0len(most_likely_chords)-1)59 c1 = most_likely_chords[i]60 c2 = most_likely_chords[i+1]61 if (c1[1] == c2[1])62 note_shift = 063 elif (c1[1] lt c2[1])64 note_shift = c2[1] - c1[1]65 else66 note_shift = 12 - c1[1] + c2[1]67 key_shift = 4(c1[0]-1) + c2[0]68 convert note_shift (0 through 11) and key_shift (1 to 16)69 to one of 196 categories for a chord shift70 chord_shift = 12(key_shift - 1) + note_shift71 chord_changes[chord_shift] += 1

59

72 json_object_new[rsquochord_changesrsquo] = [cjson_object[rsquodurationrsquo] for c inchord_changes]

73 print rsquocalculated chord changes at 0 secondsrsquoformat(timetime()-time_start)

74

75 for i in range(0int(mathfloor(len(segments_timbre_old))smoothing_factor))

76 segments = segments_timbre_old[(smoothing_factori)(smoothing_factori+smoothing_factor)]

77 calculate mean frequency of each note over a block of 5 timesegments

78 segments_mean = map(mean zip(segments))79 segments_timbre_old_smoothedappend(segments_mean)80 print rsquofound most likely timbre categories at 0 secondsrsquoformat(time

time()-time_start)81 timbre_cats = [msd_utilsfind_most_likely_timbre_category(seg) for seg

in segments_timbre_old_smoothed]82 timbre_cat_counts = [timbre_catscount(i) for i in xrange(030)]83 json_object_new[rsquotimbre_cat_countsrsquo] = [tjson_object[rsquodurationrsquo] for t

in timbre_cat_counts]84 all_song_dataappend(json_object_new)85 count += 186

87 print rsquopreprocessing finished writing results to file at time 0rsquoformat(timetime()-time_start)

88 with open(output_filersquowrsquo) as f89 fwrite(str(all_song_data))90

91 print rsquofile merging complete at time 0rsquoformat(timetime()-time_start)

A3 Code to Compute Timbre Categories

1 from __future__ import division2 import os3 import sys4 import re5 import time6 import json7 import glob8 import hdf5_getters not on adroit9 import sklearnmixture

10 import msd_utils not on adroit11 import math12 import numpy as np13 import collections14 from string import ascii_uppercase15 import ast16 import matplotlibpyplot as plt17 import operator18 from collections import defaultdict19 import random20

21 timbre_all = []22 N = 20 number of samples to get from each year23 year_counts = dict(1956 2 1965 4 1968 3 1969 5 1970 23 1971 25

1972 26 1973 37 1974 35 1975 29 1976 28 1977 64 1978 771979 111 1980 131 1981 171 1982 199 1983 272 1984 190 1985

60

189 1986 200 1987 224 1988 205 1989 272 1990 358 1991 3481992 538 1993 610 1994 658 1995 764 1996 809 1997 930 1998872 1999 983 2000 1031 2001 1230 2002 1323 2003 1563 20041508 2005 1995 2006 1892 2007 2175 2008 1950 2009 1782 2010742)

24

25 time_start = timetime()26 year_count = defaultdict(int)27 orig_dir = rsquoscratchnetworkmssilvermssilverrsquo28 orig_dir = rsquorsquo29 json_pattern = recompile(rsquorsquotitlersquoreDOTALL)30 N = 20 number of songs to sample from each year31 k = 20 number of frames to select from each song32 for l1 in ascii_uppercase33 for l2 in ascii_uppercase34 edm_textfile = orig_dir + rsquomsd_dataraw_rsquo + l1 + l2 + rsquotxtrsquo35 json_contents = open(edm_textfilersquorrsquo)read()36 for json_object_str in refindall(json_patternjson_contents)37 json_object = astliteral_eval(json_object_str)38 year = int(json_object[rsquoyearrsquo])39 prob = 10 if 10Nyear_counts[year] gt 10 else 10N

year_counts[year]40 if randomrandom() lt prob41 print rsquogetting timbre frames for song in directory 0 1

seconds after start of programrsquoformat(edm_textfiletimetime()-time_start)

42 duration = float(json_object[rsquodurationrsquo])43 timbre = [[tduration for t in l] for l in json_object[rsquo

timbrersquo]]44 try45 indices = randomsample(xrange(0len(timbre))k)46 except47 indices = xrange(0len(timbre))48 timbre_frames = [timbre[i] for i in indices]49 appended_timbre = [timbre_allappend(l) for l in

timbre_frames]50 print rsquofinished file 0 1 seconds after start of programrsquoformat(

edm_textfiletimetime()-time_start)51

52 with(open(rsquotimbre_frames_alltxtrsquorsquowrsquo)) as f53 fwrite(str(timbre_all))

A4 Helper Methods for Calculations

1 import os2 import re3 import json4 import glob5 import hdf5_getters6 import time7 import numpy as np8

9 rsquorsquorsquo some static data used in conjunction with the helper methods rsquorsquorsquo10

11 each 12-element vector corresponds to the 12 pitches starting with Cnatural and going up to B natural

12

61

13 CHORD_TEMPLATE_MAJOR = [[100010010000][010001001000]14 [001000100100][000100010010]15 [000010001001][100001000100]16 [010000100010][001000010001]17 [100100001000][010010000100]18 [001001000010][000100100001]]19 CHORD_TEMPLATE_MINOR =[[100100010000][010010001000]20 [001001000100][000100100010]21 [000010010001][100001001000]22 [010000100100][001000010010]23 [000100001001][100010000100]24 [010001000010][001000100001]]25 CHORD_TEMPLATE_DOM7 = [[100010010010][010001001001]26 [101000100100][010100010010]27 [001010001001][100101000100]28 [010010100010][001001010001]29 [100100101000][010010010100]30 [001001001010][000100100101]]31 CHORD_TEMPLATE_MIN7 = [[100100010010][010010001001]32 [101001000100][010100100010]33 [001010010001][100101001000]34 [010010100100][001001010010]35 [000100101001][100010010100]36 [010001001010][001000100101]]37

38 CHORD_TEMPLATE_MAJOR_means = [npmean(chord) for chord inCHORD_TEMPLATE_MAJOR]

39 CHORD_TEMPLATE_MINOR_means = [npmean(chord) for chord inCHORD_TEMPLATE_MINOR]

40 CHORD_TEMPLATE_DOM7_means = [npmean(chord) for chord in CHORD_TEMPLATE_DOM7]

41 CHORD_TEMPLATE_MIN7_means = [npmean(chord) for chord in CHORD_TEMPLATE_MIN7]

42

43 CHORD_TEMPLATE_MAJOR_stdevs = [npstd(chord) for chord inCHORD_TEMPLATE_MAJOR]

44 CHORD_TEMPLATE_MINOR_stdevs = [npstd(chord) for chord inCHORD_TEMPLATE_MINOR]

45 CHORD_TEMPLATE_DOM7_stdevs = [npstd(chord) for chord in CHORD_TEMPLATE_DOM7]

46 CHORD_TEMPLATE_MIN7_stdevs = [npstd(chord) for chord in CHORD_TEMPLATE_MIN7]

47

48 TIMBRE_CLUSTERS = [[ 138679881e-01 395702571e-02 265410235e-0249 738301998e-03 -175014636e-02 -551147732e-0250 871851698e-03 -117595855e-02 107227900e-0251 875951680e-03 540391877e-03 617638908e-03]52 [ 314344510e+00 117405599e-01 408053561e+0053 -177934450e+00 293367968e+00 -135597928e+0054 -155129489e+00 775743158e-01 642796685e-0155 140794256e-01 337716831e-01 -327103815e-01]56 [ 356548165e-01 273288705e+00 194355982e+0057 106892477e+00 989739475e-01 -897330631e-0258 873234495e-01 -200747009e-03 344488367e-0159 993117800e-02 -243471766e-01 -190521726e-01]60 [ 422442037e-01 414115783e-01 143926557e-0161 -116143322e-01 -595186216e-02 -236927188e-0162 -683151409e-02 986816882e-02 243219098e-0263 693558977e-02 680121418e-03 397485360e-02]64 [ 194727799e-01 -139027782e+00 -239875671e-01

62

65 -284583677e-01 192334219e-01 -283421048e-0166 215787541e-01 114840341e-01 -215631833e-0167 -409496877e-02 -690838017e-03 -724394810e-03]68 [ 196565167e-01 498702717e-02 -343697282e-0169 254170701e-01 112441266e-02 154740401e-0170 -470447408e-02 810868802e-02 303736697e-0371 143974944e-03 -275044913e-02 148634678e-02]72 [ 221364497e-01 -296205105e-01 157754028e-0173 -557641279e-02 -925625566e-02 -615316168e-0274 -138139882e-01 -554936599e-02 166886836e-0175 646238260e-02 124093863e-02 -209274345e-02]76 [ 212823455e-01 -932652720e-02 -439611467e-0177 -202814479e-01 498638770e-02 -126572488e-0178 -111181799e-01 325075635e-02 201416694e-0279 -569216463e-02 261922912e-02 830817468e-02]80 [ 162304042e-01 -734813956e-03 -202552550e-0181 180106705e-01 -572110826e-02 -917148244e-0282 -620429191e-03 -608892354e-02 102883628e-0283 384878478e-02 -872920419e-03 237291230e-02]84 [ 169023095e-01 681311168e-02 -371039856e-0285 -213139780e-02 -418752028e-03 136407740e-0186 258515825e-02 -410328777e-04 293149920e-0287 -197874734e-02 201177066e-02 429260690e-03]88 [ 416829358e-01 -128384095e+00 886081556e-0189 913717416e-02 -319420208e-01 -182003637e-0190 -319865507e-02 -171517045e-02 347472066e-0291 -353047665e-02 558354602e-02 -506222122e-02]92 [ 383948137e-01 106020034e-01 401191058e-0193 149470482e-01 -958422411e-02 -494473336e-0294 227589858e-02 -567352733e-02 384666644e-0295 -215828055e-02 -167817151e-02 115426241e-01]96 [ 907946444e-01 326120397e+00 298472002e+0097 -142615404e-01 129886103e+00 -453380431e-0198 154008478e-01 -355297093e-02 -295809181e-0199 157037690e-01 -729692046e-02 115180285e-01]

100 [ 160870896e+00 -232038235e+00 -796211044e-01101 155058968e+00 -219377663e+00 501030526e-01102 -171767279e+00 -136642470e+00 -242837527e-01103 -414275615e-01 -733148530e-01 -456676578e-01]104 [ 642870687e-01 134486839e+00 216026845e-01105 -213180345e-01 310866747e-01 -397754955e-01106 -354439151e-01 -595938041e-04 495054274e-03107 467013422e-02 -180823854e-02 125808320e-01]108 [ 116780496e+00 228141229e+00 -329418720e+00109 -154239912e+00 212372153e-01 251116768e+00110 184273560e+00 -406183916e-01 119175125e+00111 -924407446e-01 685444429e-01 -638729005e-01]112 [ 239097414e-01 -113382447e-02 306327342e-01113 468182987e-03 -103107607e-01 -317661969e-02114 346533705e-02 146440386e-02 688291154e-02115 172580481e-02 -623970238e-03 -652822380e-03]116 [ 174850329e-01 -186077411e-01 269285838e-01117 522452803e-02 -371708289e-02 -642874319e-02118 -501920042e-03 -114565540e-02 -261300268e-03119 -694872458e-03 120157063e-02 201341977e-02]120 [ 193220674e-01 162738332e-01 172794061e-02121 789933755e-02 158494767e-01 904541006e-04122 -333177052e-02 -142411500e-01 -190471155e-02123 -241622739e-02 -257382438e-02 284895062e-02]124 [ 331179197e+00 -156765268e-01 442446188e+00

63

125 205496297e+00 507031622e+00 -352663849e-02126 -568337901e+00 -117825301e+00 541756637e-01127 -315541339e-02 -158404846e+00 737887234e-01]128 [ 236033237e-01 -501380019e-01 -701568834e-02129 -214474169e-01 558739133e-01 -345340886e-01130 236469930e-01 -251770230e-02 -441670143e-01131 -173364633e-01 992353986e-03 101775476e-01]132 [ 313672832e+00 155128891e+00 460139512e+00133 982477544e-01 -387108002e-01 -134239667e+00134 -300065797e+00 -441556909e-01 -777546208e-01135 -659017029e-01 -142596356e-01 -978935498e-01]136 [ 850714148e-01 228658856e-01 -365260753e+00137 270626948e+00 -190441544e-01 566625676e+00138 177531510e+00 239978921e+00 110965660e+00139 158484130e+00 -151579214e-02 864324026e-01]140 [ 114302559e+00 118602811e+00 -388130412e+00141 869833825e-01 -823003310e-01 -423867795e-01142 856022598e-01 -108015106e+00 174840192e-01143 -135493558e-02 -117012561e+00 168572940e-01]144 [ 354117814e+00 612714769e-01 767585243e+00145 250391333e+00 181374399e+00 -146363231e+00146 -174027236e+00 -572924078e-01 -120787368e+00147 -413954661e-01 -462561948e-01 678297871e-01]148 [ 831843044e-01 441635485e-01 700724425e-02149 -472159900e-02 308326493e-01 -447009822e-01150 327806057e-01 652370380e-01 328490360e-01151 128628172e-01 -778065861e-02 691343399e-02]152 [ 490082031e-01 -953180204e-01 176970476e-01153 157256960e-01 -526196238e-02 -319264458e-01154 391808304e-01 219368239e-01 -206483291e-01155 -625044005e-02 -105547224e-01 318934196e-01]156 [ 149899454e+00 -430708817e-01 243770498e+00157 703149621e-01 -228827845e+00 270195855e+00158 -471484280e+00 -118700075e+00 -177431396e+00159 -223190236e+00 820855264e-01 -235859902e-01]160 [ 120322544e-01 -366300816e-01 -125699953e-01161 -121914056e-01 693277338e-02 -131034684e-01162 -154955924e-03 248094288e-02 -309576314e-02163 -166369415e-03 148904987e-04 -142151992e-02]164 [ 652394765e-01 -681024464e-01 636868117e-01165 304950208e-01 262178992e-01 -320457080e-01166 -198576098e-01 -302173163e-01 204399765e-01167 444513847e-02 -950111498e-02 -114198739e-02]168 [ 206762180e-01 -208101829e-01 261977630e-01169 -171672300e-01 561794250e-02 213660185e-01170 390259585e-02 478176392e-02 172812607e-02171 344052067e-02 626899067e-03 248544728e-02]172 [ 739717363e-01 437786285e+00 254995502e+00173 113151212e+00 -358509503e-01 220806129e-01174 -220500355e-01 -722409824e-02 -270534083e-01175 107942098e-03 270174668e-01 187279353e-01]176 [ 125593809e+00 671054880e-02 870352571e-01177 -432607959e+00 230652217e+00 547476105e+00178 -611052479e-01 107955720e+00 -216225471e+00179 -795770149e-01 -731804973e-01 968935954e-01]180 [ 117233757e-01 -123897829e-01 -488625265e-01181 142036530e-01 -723286756e-02 -699808763e-02182 -117525019e-02 570221674e-02 -767796123e-03183 417505873e-02 -233375716e-02 194121001e-02]184 [ 167511025e+00 -275436700e+00 145345593e+00

64

185 132408871e+00 -166172505e+00 100560074e+00186 -882308160e-01 -595708043e-01 -727283590e-01187 -103975499e+00 -186653334e-02 139449745e+00]188 [ 320587677e+00 -284451104e+00 854849957e+00189 -444001235e-01 104202144e+00 735333682e-01190 -248763292e+00 738931361e-01 -174185596e+00191 -107581842e+00 205759299e-01 -820483513e-01]192 [ 331279737e+00 -508655734e-01 661530870e+00193 116518280e+00 474499155e+00 -231536191e+00194 -134016130e+00 -715381712e-01 278890594e+00195 204189275e+00 -380003033e-01 116034914e+00]196 [ 179522019e+00 -813534697e-02 437167420e-01197 226517020e+00 885377295e-01 107481514e+00198 -725322296e-01 -219309506e+00 -759468916e-01199 -137191387e+00 260097913e-01 934596450e-01]200 [ 350400906e-01 817891485e-01 -863487084e-01201 -731760701e-01 970320805e-02 -360023996e-01202 -291753495e-01 -803073817e-02 665930095e-02203 160093340e-01 -129158086e-01 -518806100e-02]204 [ 225922929e-01 278461593e-01 539661393e-02205 -237662670e-02 -270343295e-02 -123485570e-01206 231027499e-03 587465112e-05 186127188e-02207 283074747e-02 -187198676e-04 124761782e-02]208 [ 453615634e-01 318976020e+00 -835029351e-01209 784124578e+00 -443906795e-01 -178945492e+00210 -114521031e+00 100044304e+00 -404084981e-01211 -486030348e-01 105412721e-01 563666445e-02]212 [ 393714086e-01 -307226477e-01 -487366619e-01213 -457481697e-01 -291133171e-04 -239881719e-01214 -215591352e-01 -121332941e-01 142245002e-01215 502984582e-02 -805878851e-03 195534173e-01]216 [ 186913010e-01 -161000977e-01 595612425e-01217 187804293e-01 222064227e-01 -109008289e-01218 783845058e-02 515228647e-02 -818113578e-02219 -237860551e-02 341013800e-03 364680417e-02]220 [ 332919314e+00 -214341251e+00 720913997e+00221 176143734e+00 164091808e+00 -266887649e+00222 -926748006e-01 -278599285e-01 -739434005e-01223 -387363085e-01 800557250e-01 115628886e+00]224 [ 476496444e-01 -119334793e-01 309037235e-01225 -345545294e-01 130114716e-01 506895559e-01226 212176840e-01 -414296750e-03 452439064e-02227 -162163990e-02 693683152e-02 -577607592e-03]228 [ 300019324e-01 543432074e-02 -772732930e-01229 147263806e+00 -279012581e-02 -247864869e-01230 -210011388e-01 278202425e-01 616957205e-02231 -166924986e-01 -180102286e-01 -378872162e-03]]232

233 TIMBRE_MEANS = [npmean(t) for t in TIMBRE_CLUSTERS]234 TIMBRE_STDEVS = [npstd(t) for t in TIMBRE_CLUSTERS]235

236 rsquorsquorsquohelper methods to process raw msd datarsquorsquorsquo237

238 def normalize_pitches(h5)239 key = int(hdf5_gettersget_key(h5))240 segments_pitches = hdf5_gettersget_segments_pitches(h5)241 segments_pitches_new = [transpose_by_key(pitch_segkey) for pitch_seg in

segments_pitches]242 return segments_pitches_new243

65

244 def transpose_by_key(pitch_segkey)245 pitch_seg_new = []246 for i in range(012)247 idx = (i + key) 12248 pitch_seg_newappend(pitch_seg[idx])249 return pitch_seg_new250

251 rsquorsquorsquo given a time segment with distributions of the 12 pitches find the mostlikely chord playedrsquorsquorsquo

252 def find_most_likely_chord(pitch_vector)253 rho_max = 00254 index each chord255 most_likely_chord = (11)256 for idx (chordmeanstdev) in enumerate(zip(CHORD_TEMPLATE_MAJOR

CHORD_TEMPLATE_MAJOR_meansCHORD_TEMPLATE_MAJOR_stdevs))257 rho = 00258 for i in range(012)259 rho += (chord[i] - mean)(pitch_vector[i] - npmean(pitch_vector))((

stdev+001)(npstd(pitch_vector)+001))260 if (abs(rho) gt abs(rho_max))261 rho_max = rho262 most_likely_chord = (1idx)263 for idx (chordmeanstdev) in enumerate(zip(CHORD_TEMPLATE_MINOR

CHORD_TEMPLATE_MINOR_meansCHORD_TEMPLATE_MINOR_stdevs))264 rho = 00265 for i in range(012)266 rho += (chord[i] - mean)(pitch_vector[i] - npmean(pitch_vector))((

stdev+001)(npstd(pitch_vector)+001))267 if (abs(rho) gt abs(rho_max))268 rho_max = rho269 most_likely_chord = (2idx)270 for idx (chordmeanstdev) in enumerate(zip(CHORD_TEMPLATE_DOM7

CHORD_TEMPLATE_DOM7_meansCHORD_TEMPLATE_DOM7_stdevs))271 rho = 00272 for i in range(012)273 rho += (chord[i] - mean)(pitch_vector[i] - npmean(pitch_vector))((

stdev+001)(npstd(pitch_vector)+001))274 if (abs(rho) gt abs(rho_max))275 rho_max = rho276 most_likely_chord = (3idx)277 for idx (chordmeanstdev) in enumerate(zip(CHORD_TEMPLATE_MIN7

CHORD_TEMPLATE_MIN7_meansCHORD_TEMPLATE_MIN7_stdevs))278 rho = 00279 for i in range(012)280 rho += (chord[i] - mean)(pitch_vector[i] - npmean(pitch_vector))((

stdev+001)(npstd(pitch_vector)+001))281 if (abs(rho) gt abs(rho_max))282 rho_max = rho283 most_likely_chord = (4idx)284 return most_likely_chord285

286 def find_most_likely_timbre_category(timbre_vector)287 most_likely_timbre_cat = 0288 rho_max = 00289 for idx (segmeanstdev) in enumerate(zip(TIMBRE_CLUSTERSTIMBRE_MEANS

TIMBRE_STDEVS))290 rho = 00291 for i in range(012)292 rho += (seg[i] - mean)(timbre_vector[i] - npmean(seg))((stdev+001)

(npstd(timbre_vector)+001))

66

293 if (abs(rho) gt abs(rho_max))294 rho_max = rho295 most_likely_timbre_cat = idx296 return most_likely_timbre_cat

67

Bibliography

[1] Marck Bailey Mark j butler publishes scholarly work on dance mu-sic httpwwwmusicnorthwesterneduaboutnews2012mark-j-butler-publishes-scholarly-work-on-dance-musichtml mar 2012

[2] Kenneth Taylor Ishkurrsquos guide to edm httptechnoorgelectronic-music-guide

[3] Deal further strengthens spotifyrsquos music discovery expertise httptheechonestcompressreleasesspotify-acquires-echo-nest mar 2014

[4] Josh Constine Inside the spotify - echo nest skunkworks httptechcrunchcom20141019the-sonic-mad-scientists oct 2014

[5] The future of music genres is here httpblogechonestcompost73516217273the-future-of-music-genres-is-here jan 2014

[6] About the music genome project httpwwwpandoracomaboutmgp

[7] Joan Serragrave Aacutelvaro Corral Mariaacuten Boguntildeaacute Martiacuten Haro and Josep Ll ArcosMeasuring the evolution of contemporary western popular music Sci Rep 2jul 2012

[8] Matthias Mauch Robert M MacCallum Mark Levy and Armand M Leroi Theevolution of popular music Usa 1960ndash2010 Royal Society Open Science 2(5)2015

[9] Thierry Bertin-Mahieux Daniel PW Ellis Brian Whitman and Paul LamereThe million song dataset In Proceedings of the 12th International Conferenceon Music Information Retrieval (ISMIR 2011) 2011

[10] F Pedregosa G Varoquaux A Gramfort V Michel B Thirion O GriselM Blondel P Prettenhofer R Weiss V Dubourg J Vanderplas A PassosD Cournapeau M Brucher M Perrot and E Duchesnay Scikit-learn Machinelearning in Python Journal of Machine Learning Research 122825ndash2830 2011

[11] Lars Buitinck Gilles Louppe Mathieu Blondel Fabian Pedregosa AndreasMueller Olivier Grisel Vlad Niculae Peter Prettenhofer Alexandre GramfortJaques Grobler Robert Layton Jake VanderPlas Arnaud Joly Brian Holt andGaeumll Varoquaux API design for machine learning software experiences from

68

the scikit-learn project In ECML PKDD Workshop Languages for Data Miningand Machine Learning pages 108ndash122 2013

[12] Edwin Chen Infinite mixture models with nonparametric bayesand the dirichlet process httpblogechenme20120320infinite-mixture-models-with-nonparametric-bayes-and-the-dirichlet-processmar 2012

[13] Graham Massey Roland tr-808 The drum machine that changed music httpwwwbbccomnewsentertainment-arts-26682781 mar 2014

[14] Jean-Michel Jarre and Agencja Artystyczna MTJ Jean Michel Jarre DisquesDreyfus 1999

[15] Francois Pachet Jean-Julien Aucouturier and Mark Sandler The way it soundsTimbre models for analysis and retrieval of music signals IEEE TRANSAC-TIONS ON MULTIMEDIA 7(6)1028ndash35 dec 2005

69

  • Abstract
  • Acknowledgements
  • Contents
  • List of Tables
  • List of Figures
  • 1 Introduction
    • 11 Background Information
    • 12 Literature Review
    • 13 The Dataset
      • 2 Mathematical Modeling
        • 21 Determining Novelty of Songs
        • 22 Feature Selection
        • 23 Collecting Data and Preprocessing Selected Features
          • 231 Collecting the Data
          • 232 Pitch Preprocessing
          • 233 Timbre Preprocessing
              • 3 Results
                • 31 Methodology
                • 32 Findings
                  • 321 =005
                  • 322 =01
                  • 323 =02
                    • 33 Analysis
                      • 4 Conclusion
                        • 41 Design Flaws in Experiment
                        • 42 Future Work
                        • 43 Closing Remarks
                          • A Code
                            • A1 Pulling Data from the Million Song Dataset
                            • A2 Calculating Most Likely Chords and Timbre Categories
                            • A3 Code to Compute Timbre Categories
                            • A4 Helper Methods for Calculations
                              • Bibliography
Page 69: Silver,Matthew final thesis

72 json_object_new[rsquochord_changesrsquo] = [cjson_object[rsquodurationrsquo] for c inchord_changes]

73 print rsquocalculated chord changes at 0 secondsrsquoformat(timetime()-time_start)

74

75 for i in range(0int(mathfloor(len(segments_timbre_old))smoothing_factor))

76 segments = segments_timbre_old[(smoothing_factori)(smoothing_factori+smoothing_factor)]

77 calculate mean frequency of each note over a block of 5 timesegments

78 segments_mean = map(mean zip(segments))79 segments_timbre_old_smoothedappend(segments_mean)80 print rsquofound most likely timbre categories at 0 secondsrsquoformat(time

time()-time_start)81 timbre_cats = [msd_utilsfind_most_likely_timbre_category(seg) for seg

in segments_timbre_old_smoothed]82 timbre_cat_counts = [timbre_catscount(i) for i in xrange(030)]83 json_object_new[rsquotimbre_cat_countsrsquo] = [tjson_object[rsquodurationrsquo] for t

in timbre_cat_counts]84 all_song_dataappend(json_object_new)85 count += 186

87 print rsquopreprocessing finished writing results to file at time 0rsquoformat(timetime()-time_start)

88 with open(output_filersquowrsquo) as f89 fwrite(str(all_song_data))90

91 print rsquofile merging complete at time 0rsquoformat(timetime()-time_start)

A3 Code to Compute Timbre Categories

from __future__ import division
import os
import sys
import re
import time
import json
import glob
import hdf5_getters  # not on adroit
import sklearn.mixture
import msd_utils  # not on adroit
import math
import numpy as np
import collections
from string import ascii_uppercase
import ast
import matplotlib.pyplot as plt
import operator
from collections import defaultdict
import random

timbre_all = []
N = 20  # number of samples to get from each year
year_counts = dict({1956: 2, 1965: 4, 1968: 3, 1969: 5, 1970: 23, 1971: 25,
                    1972: 26, 1973: 37, 1974: 35, 1975: 29, 1976: 28, 1977: 64, 1978: 77,
                    1979: 111, 1980: 131, 1981: 171, 1982: 199, 1983: 272, 1984: 190, 1985: 189,
                    1986: 200, 1987: 224, 1988: 205, 1989: 272, 1990: 358, 1991: 348,
                    1992: 538, 1993: 610, 1994: 658, 1995: 764, 1996: 809, 1997: 930, 1998: 872,
                    1999: 983, 2000: 1031, 2001: 1230, 2002: 1323, 2003: 1563, 2004: 1508,
                    2005: 1995, 2006: 1892, 2007: 2175, 2008: 1950, 2009: 1782, 2010: 742})

time_start = time.time()
year_count = defaultdict(int)
orig_dir = '/scratch/network/mssilver/mssilver/'
orig_dir = ''  # the cluster path above is immediately overridden, as in the original listing
json_pattern = re.compile(r"{'title'.*?}", re.DOTALL)  # braces and wildcard reconstructed; partly lost in the PDF extraction
N = 20  # number of songs to sample from each year
k = 20  # number of frames to select from each song
for l1 in ascii_uppercase:
    for l2 in ascii_uppercase:
        edm_textfile = orig_dir + 'msd_data/raw_' + l1 + l2 + '.txt'
        json_contents = open(edm_textfile, 'r').read()
        for json_object_str in re.findall(json_pattern, json_contents):
            json_object = ast.literal_eval(json_object_str)
            year = int(json_object['year'])
            prob = 1.0 if 1.0*N/year_counts[year] > 1.0 else 1.0*N/year_counts[year]
            if random.random() < prob:
                print 'getting timbre frames for song in directory {0}, {1} seconds after start of program'.format(edm_textfile, time.time()-time_start)
                duration = float(json_object['duration'])
                timbre = [[t/duration for t in l] for l in json_object['timbre']]
                try:
                    indices = random.sample(xrange(0, len(timbre)), k)
                except:
                    indices = xrange(0, len(timbre))
                timbre_frames = [timbre[i] for i in indices]
                appended_timbre = [timbre_all.append(l) for l in timbre_frames]
        print 'finished file {0}, {1} seconds after start of program'.format(edm_textfile, time.time()-time_start)

with(open('timbre_frames_all.txt', 'w')) as f:
    f.write(str(timbre_all))

A.4 Helper Methods for Calculations

import os
import re
import json
import glob
import hdf5_getters
import time
import numpy as np

''' some static data used in conjunction with the helper methods '''

# each 12-element vector corresponds to the 12 pitches, starting with C
# natural and going up to B natural

CHORD_TEMPLATE_MAJOR = [[1,0,0,0,1,0,0,1,0,0,0,0], [0,1,0,0,0,1,0,0,1,0,0,0],
                        [0,0,1,0,0,0,1,0,0,1,0,0], [0,0,0,1,0,0,0,1,0,0,1,0],
                        [0,0,0,0,1,0,0,0,1,0,0,1], [1,0,0,0,0,1,0,0,0,1,0,0],
                        [0,1,0,0,0,0,1,0,0,0,1,0], [0,0,1,0,0,0,0,1,0,0,0,1],
                        [1,0,0,1,0,0,0,0,1,0,0,0], [0,1,0,0,1,0,0,0,0,1,0,0],
                        [0,0,1,0,0,1,0,0,0,0,1,0], [0,0,0,1,0,0,1,0,0,0,0,1]]
CHORD_TEMPLATE_MINOR = [[1,0,0,1,0,0,0,1,0,0,0,0], [0,1,0,0,1,0,0,0,1,0,0,0],
                        [0,0,1,0,0,1,0,0,0,1,0,0], [0,0,0,1,0,0,1,0,0,0,1,0],
                        [0,0,0,0,1,0,0,1,0,0,0,1], [1,0,0,0,0,1,0,0,1,0,0,0],
                        [0,1,0,0,0,0,1,0,0,1,0,0], [0,0,1,0,0,0,0,1,0,0,1,0],
                        [0,0,0,1,0,0,0,0,1,0,0,1], [1,0,0,0,1,0,0,0,0,1,0,0],
                        [0,1,0,0,0,1,0,0,0,0,1,0], [0,0,1,0,0,0,1,0,0,0,0,1]]
CHORD_TEMPLATE_DOM7 = [[1,0,0,0,1,0,0,1,0,0,1,0], [0,1,0,0,0,1,0,0,1,0,0,1],
                       [1,0,1,0,0,0,1,0,0,1,0,0], [0,1,0,1,0,0,0,1,0,0,1,0],
                       [0,0,1,0,1,0,0,0,1,0,0,1], [1,0,0,1,0,1,0,0,0,1,0,0],
                       [0,1,0,0,1,0,1,0,0,0,1,0], [0,0,1,0,0,1,0,1,0,0,0,1],
                       [1,0,0,1,0,0,1,0,1,0,0,0], [0,1,0,0,1,0,0,1,0,1,0,0],
                       [0,0,1,0,0,1,0,0,1,0,1,0], [0,0,0,1,0,0,1,0,0,1,0,1]]
CHORD_TEMPLATE_MIN7 = [[1,0,0,1,0,0,0,1,0,0,1,0], [0,1,0,0,1,0,0,0,1,0,0,1],
                       [1,0,1,0,0,1,0,0,0,1,0,0], [0,1,0,1,0,0,1,0,0,0,1,0],
                       [0,0,1,0,1,0,0,1,0,0,0,1], [1,0,0,1,0,1,0,0,1,0,0,0],
                       [0,1,0,0,1,0,1,0,0,1,0,0], [0,0,1,0,0,1,0,1,0,0,1,0],
                       [0,0,0,1,0,0,1,0,1,0,0,1], [1,0,0,0,1,0,0,1,0,1,0,0],
                       [0,1,0,0,0,1,0,0,1,0,1,0], [0,0,1,0,0,0,1,0,0,1,0,1]]

CHORD_TEMPLATE_MAJOR_means = [np.mean(chord) for chord in CHORD_TEMPLATE_MAJOR]
CHORD_TEMPLATE_MINOR_means = [np.mean(chord) for chord in CHORD_TEMPLATE_MINOR]
CHORD_TEMPLATE_DOM7_means = [np.mean(chord) for chord in CHORD_TEMPLATE_DOM7]
CHORD_TEMPLATE_MIN7_means = [np.mean(chord) for chord in CHORD_TEMPLATE_MIN7]

CHORD_TEMPLATE_MAJOR_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_MAJOR]
CHORD_TEMPLATE_MINOR_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_MINOR]
CHORD_TEMPLATE_DOM7_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_DOM7]
CHORD_TEMPLATE_MIN7_stdevs = [np.std(chord) for chord in CHORD_TEMPLATE_MIN7]

# 46 cluster centroids in the MSD's 12-dimensional timbre space
TIMBRE_CLUSTERS = [
    [1.38679881e-01, 3.95702571e-02, 2.65410235e-02, 7.38301998e-03, -1.75014636e-02, -5.51147732e-02,
     8.71851698e-03, -1.17595855e-02, 1.07227900e-02, 8.75951680e-03, 5.40391877e-03, 6.17638908e-03],
    [3.14344510e+00, 1.17405599e-01, 4.08053561e+00, -1.77934450e+00, 2.93367968e+00, -1.35597928e+00,
     -1.55129489e+00, 7.75743158e-01, 6.42796685e-01, 1.40794256e-01, 3.37716831e-01, -3.27103815e-01],
    [3.56548165e-01, 2.73288705e+00, 1.94355982e+00, 1.06892477e+00, 9.89739475e-01, -8.97330631e-02,
     8.73234495e-01, -2.00747009e-03, 3.44488367e-01, 9.93117800e-02, -2.43471766e-01, -1.90521726e-01],
    [4.22442037e-01, 4.14115783e-01, 1.43926557e-01, -1.16143322e-01, -5.95186216e-02, -2.36927188e-01,
     -6.83151409e-02, 9.86816882e-02, 2.43219098e-02, 6.93558977e-02, 6.80121418e-03, 3.97485360e-02],
    [1.94727799e-01, -1.39027782e+00, -2.39875671e-01, -2.84583677e-01, 1.92334219e-01, -2.83421048e-01,
     2.15787541e-01, 1.14840341e-01, -2.15631833e-01, -4.09496877e-02, -6.90838017e-03, -7.24394810e-03],
    [1.96565167e-01, 4.98702717e-02, -3.43697282e-01, 2.54170701e-01, 1.12441266e-02, 1.54740401e-01,
     -4.70447408e-02, 8.10868802e-02, 3.03736697e-03, 1.43974944e-03, -2.75044913e-02, 1.48634678e-02],
    [2.21364497e-01, -2.96205105e-01, 1.57754028e-01, -5.57641279e-02, -9.25625566e-02, -6.15316168e-02,
     -1.38139882e-01, -5.54936599e-02, 1.66886836e-01, 6.46238260e-02, 1.24093863e-02, -2.09274345e-02],
    [2.12823455e-01, -9.32652720e-02, -4.39611467e-01, -2.02814479e-01, 4.98638770e-02, -1.26572488e-01,
     -1.11181799e-01, 3.25075635e-02, 2.01416694e-02, -5.69216463e-02, 2.61922912e-02, 8.30817468e-02],
    [1.62304042e-01, -7.34813956e-03, -2.02552550e-01, 1.80106705e-01, -5.72110826e-02, -9.17148244e-02,
     -6.20429191e-03, -6.08892354e-02, 1.02883628e-02, 3.84878478e-02, -8.72920419e-03, 2.37291230e-02],
    [1.69023095e-01, 6.81311168e-02, -3.71039856e-02, -2.13139780e-02, -4.18752028e-03, 1.36407740e-01,
     2.58515825e-02, -4.10328777e-04, 2.93149920e-02, -1.97874734e-02, 2.01177066e-02, 4.29260690e-03],
    [4.16829358e-01, -1.28384095e+00, 8.86081556e-01, 9.13717416e-02, -3.19420208e-01, -1.82003637e-01,
     -3.19865507e-02, -1.71517045e-02, 3.47472066e-02, -3.53047665e-02, 5.58354602e-02, -5.06222122e-02],
    [3.83948137e-01, 1.06020034e-01, 4.01191058e-01, 1.49470482e-01, -9.58422411e-02, -4.94473336e-02,
     2.27589858e-02, -5.67352733e-02, 3.84666644e-02, -2.15828055e-02, -1.67817151e-02, 1.15426241e-01],
    [9.07946444e-01, 3.26120397e+00, 2.98472002e+00, -1.42615404e-01, 1.29886103e+00, -4.53380431e-01,
     1.54008478e-01, -3.55297093e-02, -2.95809181e-01, 1.57037690e-01, -7.29692046e-02, 1.15180285e-01],
    [1.60870896e+00, -2.32038235e+00, -7.96211044e-01, 1.55058968e+00, -2.19377663e+00, 5.01030526e-01,
     -1.71767279e+00, -1.36642470e+00, -2.42837527e-01, -4.14275615e-01, -7.33148530e-01, -4.56676578e-01],
    [6.42870687e-01, 1.34486839e+00, 2.16026845e-01, -2.13180345e-01, 3.10866747e-01, -3.97754955e-01,
     -3.54439151e-01, -5.95938041e-04, 4.95054274e-03, 4.67013422e-02, -1.80823854e-02, 1.25808320e-01],
    [1.16780496e+00, 2.28141229e+00, -3.29418720e+00, -1.54239912e+00, 2.12372153e-01, 2.51116768e+00,
     1.84273560e+00, -4.06183916e-01, 1.19175125e+00, -9.24407446e-01, 6.85444429e-01, -6.38729005e-01],
    [2.39097414e-01, -1.13382447e-02, 3.06327342e-01, 4.68182987e-03, -1.03107607e-01, -3.17661969e-02,
     3.46533705e-02, 1.46440386e-02, 6.88291154e-02, 1.72580481e-02, -6.23970238e-03, -6.52822380e-03],
    [1.74850329e-01, -1.86077411e-01, 2.69285838e-01, 5.22452803e-02, -3.71708289e-02, -6.42874319e-02,
     -5.01920042e-03, -1.14565540e-02, -2.61300268e-03, -6.94872458e-03, 1.20157063e-02, 2.01341977e-02],
    [1.93220674e-01, 1.62738332e-01, 1.72794061e-02, 7.89933755e-02, 1.58494767e-01, 9.04541006e-04,
     -3.33177052e-02, -1.42411500e-01, -1.90471155e-02, -2.41622739e-02, -2.57382438e-02, 2.84895062e-02],
    [3.31179197e+00, -1.56765268e-01, 4.42446188e+00, 2.05496297e+00, 5.07031622e+00, -3.52663849e-02,
     -5.68337901e+00, -1.17825301e+00, 5.41756637e-01, -3.15541339e-02, -1.58404846e+00, 7.37887234e-01],
    [2.36033237e-01, -5.01380019e-01, -7.01568834e-02, -2.14474169e-01, 5.58739133e-01, -3.45340886e-01,
     2.36469930e-01, -2.51770230e-02, -4.41670143e-01, -1.73364633e-01, 9.92353986e-03, 1.01775476e-01],
    [3.13672832e+00, 1.55128891e+00, 4.60139512e+00, 9.82477544e-01, -3.87108002e-01, -1.34239667e+00,
     -3.00065797e+00, -4.41556909e-01, -7.77546208e-01, -6.59017029e-01, -1.42596356e-01, -9.78935498e-01],
    [8.50714148e-01, 2.28658856e-01, -3.65260753e+00, 2.70626948e+00, -1.90441544e-01, 5.66625676e+00,
     1.77531510e+00, 2.39978921e+00, 1.10965660e+00, 1.58484130e+00, -1.51579214e-02, 8.64324026e-01],
    [1.14302559e+00, 1.18602811e+00, -3.88130412e+00, 8.69833825e-01, -8.23003310e-01, -4.23867795e-01,
     8.56022598e-01, -1.08015106e+00, 1.74840192e-01, -1.35493558e-02, -1.17012561e+00, 1.68572940e-01],
    [3.54117814e+00, 6.12714769e-01, 7.67585243e+00, 2.50391333e+00, 1.81374399e+00, -1.46363231e+00,
     -1.74027236e+00, -5.72924078e-01, -1.20787368e+00, -4.13954661e-01, -4.62561948e-01, 6.78297871e-01],
    [8.31843044e-01, 4.41635485e-01, 7.00724425e-02, -4.72159900e-02, 3.08326493e-01, -4.47009822e-01,
     3.27806057e-01, 6.52370380e-01, 3.28490360e-01, 1.28628172e-01, -7.78065861e-02, 6.91343399e-02],
    [4.90082031e-01, -9.53180204e-01, 1.76970476e-01, 1.57256960e-01, -5.26196238e-02, -3.19264458e-01,
     3.91808304e-01, 2.19368239e-01, -2.06483291e-01, -6.25044005e-02, -1.05547224e-01, 3.18934196e-01],
    [1.49899454e+00, -4.30708817e-01, 2.43770498e+00, 7.03149621e-01, -2.28827845e+00, 2.70195855e+00,
     -4.71484280e+00, -1.18700075e+00, -1.77431396e+00, -2.23190236e+00, 8.20855264e-01, -2.35859902e-01],
    [1.20322544e-01, -3.66300816e-01, -1.25699953e-01, -1.21914056e-01, 6.93277338e-02, -1.31034684e-01,
     -1.54955924e-03, 2.48094288e-02, -3.09576314e-02, -1.66369415e-03, 1.48904987e-04, -1.42151992e-02],
    [6.52394765e-01, -6.81024464e-01, 6.36868117e-01, 3.04950208e-01, 2.62178992e-01, -3.20457080e-01,
     -1.98576098e-01, -3.02173163e-01, 2.04399765e-01, 4.44513847e-02, -9.50111498e-02, -1.14198739e-02],
    [2.06762180e-01, -2.08101829e-01, 2.61977630e-01, -1.71672300e-01, 5.61794250e-02, 2.13660185e-01,
     3.90259585e-02, 4.78176392e-02, 1.72812607e-02, 3.44052067e-02, 6.26899067e-03, 2.48544728e-02],
    [7.39717363e-01, 4.37786285e+00, 2.54995502e+00, 1.13151212e+00, -3.58509503e-01, 2.20806129e-01,
     -2.20500355e-01, -7.22409824e-02, -2.70534083e-01, 1.07942098e-03, 2.70174668e-01, 1.87279353e-01],
    [1.25593809e+00, 6.71054880e-02, 8.70352571e-01, -4.32607959e+00, 2.30652217e+00, 5.47476105e+00,
     -6.11052479e-01, 1.07955720e+00, -2.16225471e+00, -7.95770149e-01, -7.31804973e-01, 9.68935954e-01],
    [1.17233757e-01, -1.23897829e-01, -4.88625265e-01, 1.42036530e-01, -7.23286756e-02, -6.99808763e-02,
     -1.17525019e-02, 5.70221674e-02, -7.67796123e-03, 4.17505873e-02, -2.33375716e-02, 1.94121001e-02],
    [1.67511025e+00, -2.75436700e+00, 1.45345593e+00, 1.32408871e+00, -1.66172505e+00, 1.00560074e+00,
     -8.82308160e-01, -5.95708043e-01, -7.27283590e-01, -1.03975499e+00, -1.86653334e-02, 1.39449745e+00],
    [3.20587677e+00, -2.84451104e+00, 8.54849957e+00, -4.44001235e-01, 1.04202144e+00, 7.35333682e-01,
     -2.48763292e+00, 7.38931361e-01, -1.74185596e+00, -1.07581842e+00, 2.05759299e-01, -8.20483513e-01],
    [3.31279737e+00, -5.08655734e-01, 6.61530870e+00, 1.16518280e+00, 4.74499155e+00, -2.31536191e+00,
     -1.34016130e+00, -7.15381712e-01, 2.78890594e+00, 2.04189275e+00, -3.80003033e-01, 1.16034914e+00],
    [1.79522019e+00, -8.13534697e-02, 4.37167420e-01, 2.26517020e+00, 8.85377295e-01, 1.07481514e+00,
     -7.25322296e-01, -2.19309506e+00, -7.59468916e-01, -1.37191387e+00, 2.60097913e-01, 9.34596450e-01],
    [3.50400906e-01, 8.17891485e-01, -8.63487084e-01, -7.31760701e-01, 9.70320805e-02, -3.60023996e-01,
     -2.91753495e-01, -8.03073817e-02, 6.65930095e-02, 1.60093340e-01, -1.29158086e-01, -5.18806100e-02],
    [2.25922929e-01, 2.78461593e-01, 5.39661393e-02, -2.37662670e-02, -2.70343295e-02, -1.23485570e-01,
     2.31027499e-03, 5.87465112e-05, 1.86127188e-02, 2.83074747e-02, -1.87198676e-04, 1.24761782e-02],
    [4.53615634e-01, 3.18976020e+00, -8.35029351e-01, 7.84124578e+00, -4.43906795e-01, -1.78945492e+00,
     -1.14521031e+00, 1.00044304e+00, -4.04084981e-01, -4.86030348e-01, 1.05412721e-01, 5.63666445e-02],
    [3.93714086e-01, -3.07226477e-01, -4.87366619e-01, -4.57481697e-01, -2.91133171e-04, -2.39881719e-01,
     -2.15591352e-01, -1.21332941e-01, 1.42245002e-01, 5.02984582e-02, -8.05878851e-03, 1.95534173e-01],
    [1.86913010e-01, -1.61000977e-01, 5.95612425e-01, 1.87804293e-01, 2.22064227e-01, -1.09008289e-01,
     7.83845058e-02, 5.15228647e-02, -8.18113578e-02, -2.37860551e-02, 3.41013800e-03, 3.64680417e-02],
    [3.32919314e+00, -2.14341251e+00, 7.20913997e+00, 1.76143734e+00, 1.64091808e+00, -2.66887649e+00,
     -9.26748006e-01, -2.78599285e-01, -7.39434005e-01, -3.87363085e-01, 8.00557250e-01, 1.15628886e+00],
    [4.76496444e-01, -1.19334793e-01, 3.09037235e-01, -3.45545294e-01, 1.30114716e-01, 5.06895559e-01,
     2.12176840e-01, -4.14296750e-03, 4.52439064e-02, -1.62163990e-02, 6.93683152e-03, -5.77607592e-03],
    [3.00019324e-01, 5.43432074e-02, -7.72732930e-01, 1.47263806e+00, -2.79012581e-02, -2.47864869e-01,
     -2.10011388e-01, 2.78202425e-01, 6.16957205e-02, -1.66924986e-01, -1.80102286e-01, -3.78872162e-03]]

TIMBRE_MEANS = [np.mean(t) for t in TIMBRE_CLUSTERS]
TIMBRE_STDEVS = [np.std(t) for t in TIMBRE_CLUSTERS]

'''helper methods to process raw msd data'''

def normalize_pitches(h5):
    key = int(hdf5_getters.get_key(h5))
    segments_pitches = hdf5_getters.get_segments_pitches(h5)
    segments_pitches_new = [transpose_by_key(pitch_seg, key) for pitch_seg in segments_pitches]
    return segments_pitches_new

def transpose_by_key(pitch_seg, key):
    pitch_seg_new = []
    for i in range(0, 12):
        idx = (i + key) % 12
        pitch_seg_new.append(pitch_seg[idx])
    return pitch_seg_new

''' given a time segment with distributions of the 12 pitches, find the most
likely chord played '''
def find_most_likely_chord(pitch_vector):
    rho_max = 0.0
    # index each chord family: 1 = major, 2 = minor, 3 = dominant 7th, 4 = minor 7th
    most_likely_chord = (1, 1)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_MAJOR,
            CHORD_TEMPLATE_MAJOR_means, CHORD_TEMPLATE_MAJOR_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean)*(pitch_vector[i] - np.mean(pitch_vector))/((stdev+0.01)*(np.std(pitch_vector)+0.01))
        if (abs(rho) > abs(rho_max)):
            rho_max = rho
            most_likely_chord = (1, idx)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_MINOR,
            CHORD_TEMPLATE_MINOR_means, CHORD_TEMPLATE_MINOR_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean)*(pitch_vector[i] - np.mean(pitch_vector))/((stdev+0.01)*(np.std(pitch_vector)+0.01))
        if (abs(rho) > abs(rho_max)):
            rho_max = rho
            most_likely_chord = (2, idx)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_DOM7,
            CHORD_TEMPLATE_DOM7_means, CHORD_TEMPLATE_DOM7_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean)*(pitch_vector[i] - np.mean(pitch_vector))/((stdev+0.01)*(np.std(pitch_vector)+0.01))
        if (abs(rho) > abs(rho_max)):
            rho_max = rho
            most_likely_chord = (3, idx)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_MIN7,
            CHORD_TEMPLATE_MIN7_means, CHORD_TEMPLATE_MIN7_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean)*(pitch_vector[i] - np.mean(pitch_vector))/((stdev+0.01)*(np.std(pitch_vector)+0.01))
        if (abs(rho) > abs(rho_max)):
            rho_max = rho
            most_likely_chord = (4, idx)
    return most_likely_chord

def find_most_likely_timbre_category(timbre_vector):
    most_likely_timbre_cat = 0
    rho_max = 0.0
    for idx, (seg, mean, stdev) in enumerate(zip(TIMBRE_CLUSTERS, TIMBRE_MEANS, TIMBRE_STDEVS)):
        rho = 0.0
        for i in range(0, 12):
            # note: the original centers timbre_vector[i] on np.mean(seg) here,
            # unlike find_most_likely_chord, which centers the observed vector
            # on its own mean
            rho += (seg[i] - mean)*(timbre_vector[i] - np.mean(seg))/((stdev+0.01)*(np.std(timbre_vector)+0.01))
        if (abs(rho) > abs(rho_max)):
            rho_max = rho
            most_likely_timbre_cat = idx
    return most_likely_timbre_cat
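Both matchers in A.4 score a template against an observed 12-vector with the same unnormalized Pearson-style correlation. Writing $t$ for the template, $v$ for the observed vector, and $\mu$, $\sigma$ for their means and standard deviations, the quantity accumulated in rho is

\[ \rho(t, v) = \sum_{i=0}^{11} \frac{(t_i - \mu_t)\,(v_i - \mu_v)}{(\sigma_t + 0.01)(\sigma_v + 0.01)}, \]

where the 0.01 terms keep the denominator bounded away from zero for flat templates or near-silent segments, and the candidate with the largest |rho| wins (find_most_likely_timbre_category substitutes the cluster's own mean for the observed vector's mean, as flagged in the listing). A minimal usage sketch, assuming the A.4 listing is saved as the msd_utils module imported by A.2 and A.3; the chroma vector below is made up, and a C-major-shaped profile should come back as family 1, root 0:

import msd_utils

# hypothetical normalized chroma vector, loudest on C, E and G
pitch_vector = [0.9, 0.1, 0.1, 0.1, 0.8, 0.1, 0.1, 0.7, 0.1, 0.1, 0.1, 0.1]
family, root = msd_utils.find_most_likely_chord(pitch_vector)
print(family, root)  # expected: (1, 0), a major triad rooted on C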


Bibliography

[1] Marck Bailey. Mark J. Butler publishes scholarly work on dance music. http://www.music.northwestern.edu/about/news/2012/mark-j-butler-publishes-scholarly-work-on-dance-music.html, March 2012.

[2] Kenneth Taylor. Ishkur's guide to EDM. http://techno.org/electronic-music-guide/.

[3] Deal further strengthens Spotify's music discovery expertise. http://the.echonest.com/pressreleases/spotify-acquires-echo-nest/, March 2014.

[4] Josh Constine. Inside the Spotify / Echo Nest skunkworks. http://techcrunch.com/2014/10/19/the-sonic-mad-scientists/, October 2014.

[5] The future of music genres is here. http://blog.echonest.com/post/73516217273/the-future-of-music-genres-is-here, January 2014.

[6] About the Music Genome Project. http://www.pandora.com/about/mgp.

[7] Joan Serrà, Álvaro Corral, Marián Boguñá, Martín Haro, and Josep Ll. Arcos. Measuring the evolution of contemporary western popular music. Sci. Rep., 2, July 2012.

[8] Matthias Mauch, Robert M. MacCallum, Mark Levy, and Armand M. Leroi. The evolution of popular music: USA 1960–2010. Royal Society Open Science, 2(5), 2015.

[9] Thierry Bertin-Mahieux, Daniel P. W. Ellis, Brian Whitman, and Paul Lamere. The Million Song Dataset. In Proceedings of the 12th International Conference on Music Information Retrieval (ISMIR 2011), 2011.

[10] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay. Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12:2825–2830, 2011.

[11] Lars Buitinck, Gilles Louppe, Mathieu Blondel, Fabian Pedregosa, Andreas Mueller, Olivier Grisel, Vlad Niculae, Peter Prettenhofer, Alexandre Gramfort, Jaques Grobler, Robert Layton, Jake VanderPlas, Arnaud Joly, Brian Holt, and Gaël Varoquaux. API design for machine learning software: experiences from the scikit-learn project. In ECML PKDD Workshop: Languages for Data Mining and Machine Learning, pages 108–122, 2013.

[12] Edwin Chen. Infinite mixture models with nonparametric Bayes and the Dirichlet process. http://blog.echen.me/2012/03/20/infinite-mixture-models-with-nonparametric-bayes-and-the-dirichlet-process/, March 2012.

[13] Graham Massey. Roland TR-808: The drum machine that changed music. http://www.bbc.com/news/entertainment-arts-26682781, March 2014.

[14] Jean-Michel Jarre and Agencja Artystyczna MTJ. Jean Michel Jarre. Disques Dreyfus, 1999.

[15] François Pachet, Jean-Julien Aucouturier, and Mark Sandler. The way it sounds: Timbre models for analysis and retrieval of music signals. IEEE Transactions on Multimedia, 7(6):1028–1035, December 2005.

  • Abstract
  • Acknowledgements
  • Contents
  • List of Tables
  • List of Figures
  • 1 Introduction
    • 11 Background Information
    • 12 Literature Review
    • 13 The Dataset
      • 2 Mathematical Modeling
        • 21 Determining Novelty of Songs
        • 22 Feature Selection
        • 23 Collecting Data and Preprocessing Selected Features
          • 231 Collecting the Data
          • 232 Pitch Preprocessing
          • 233 Timbre Preprocessing
              • 3 Results
                • 31 Methodology
                • 32 Findings
                  • 321 =005
                  • 322 =01
                  • 323 =02
                    • 33 Analysis
                      • 4 Conclusion
                        • 41 Design Flaws in Experiment
                        • 42 Future Work
                        • 43 Closing Remarks
                          • A Code
                            • A1 Pulling Data from the Million Song Dataset
                            • A2 Calculating Most Likely Chords and Timbre Categories
                            • A3 Code to Compute Timbre Categories
                            • A4 Helper Methods for Calculations
                              • Bibliography
Page 70: Silver,Matthew final thesis

189 1986 200 1987 224 1988 205 1989 272 1990 358 1991 3481992 538 1993 610 1994 658 1995 764 1996 809 1997 930 1998872 1999 983 2000 1031 2001 1230 2002 1323 2003 1563 20041508 2005 1995 2006 1892 2007 2175 2008 1950 2009 1782 2010742)

24

25 time_start = timetime()26 year_count = defaultdict(int)27 orig_dir = rsquoscratchnetworkmssilvermssilverrsquo28 orig_dir = rsquorsquo29 json_pattern = recompile(rsquorsquotitlersquoreDOTALL)30 N = 20 number of songs to sample from each year31 k = 20 number of frames to select from each song32 for l1 in ascii_uppercase33 for l2 in ascii_uppercase34 edm_textfile = orig_dir + rsquomsd_dataraw_rsquo + l1 + l2 + rsquotxtrsquo35 json_contents = open(edm_textfilersquorrsquo)read()36 for json_object_str in refindall(json_patternjson_contents)37 json_object = astliteral_eval(json_object_str)38 year = int(json_object[rsquoyearrsquo])39 prob = 10 if 10Nyear_counts[year] gt 10 else 10N

year_counts[year]40 if randomrandom() lt prob41 print rsquogetting timbre frames for song in directory 0 1

seconds after start of programrsquoformat(edm_textfiletimetime()-time_start)

42 duration = float(json_object[rsquodurationrsquo])43 timbre = [[tduration for t in l] for l in json_object[rsquo

timbrersquo]]44 try45 indices = randomsample(xrange(0len(timbre))k)46 except47 indices = xrange(0len(timbre))48 timbre_frames = [timbre[i] for i in indices]49 appended_timbre = [timbre_allappend(l) for l in

timbre_frames]50 print rsquofinished file 0 1 seconds after start of programrsquoformat(

edm_textfiletimetime()-time_start)51

52 with(open(rsquotimbre_frames_alltxtrsquorsquowrsquo)) as f53 fwrite(str(timbre_all))

A4 Helper Methods for Calculations

1 import os2 import re3 import json4 import glob5 import hdf5_getters6 import time7 import numpy as np8

9 rsquorsquorsquo some static data used in conjunction with the helper methods rsquorsquorsquo10

11 each 12-element vector corresponds to the 12 pitches starting with Cnatural and going up to B natural

12

61

13 CHORD_TEMPLATE_MAJOR = [[100010010000][010001001000]14 [001000100100][000100010010]15 [000010001001][100001000100]16 [010000100010][001000010001]17 [100100001000][010010000100]18 [001001000010][000100100001]]19 CHORD_TEMPLATE_MINOR =[[100100010000][010010001000]20 [001001000100][000100100010]21 [000010010001][100001001000]22 [010000100100][001000010010]23 [000100001001][100010000100]24 [010001000010][001000100001]]25 CHORD_TEMPLATE_DOM7 = [[100010010010][010001001001]26 [101000100100][010100010010]27 [001010001001][100101000100]28 [010010100010][001001010001]29 [100100101000][010010010100]30 [001001001010][000100100101]]31 CHORD_TEMPLATE_MIN7 = [[100100010010][010010001001]32 [101001000100][010100100010]33 [001010010001][100101001000]34 [010010100100][001001010010]35 [000100101001][100010010100]36 [010001001010][001000100101]]37

38 CHORD_TEMPLATE_MAJOR_means = [npmean(chord) for chord inCHORD_TEMPLATE_MAJOR]

39 CHORD_TEMPLATE_MINOR_means = [npmean(chord) for chord inCHORD_TEMPLATE_MINOR]

40 CHORD_TEMPLATE_DOM7_means = [npmean(chord) for chord in CHORD_TEMPLATE_DOM7]

41 CHORD_TEMPLATE_MIN7_means = [npmean(chord) for chord in CHORD_TEMPLATE_MIN7]

42

43 CHORD_TEMPLATE_MAJOR_stdevs = [npstd(chord) for chord inCHORD_TEMPLATE_MAJOR]

44 CHORD_TEMPLATE_MINOR_stdevs = [npstd(chord) for chord inCHORD_TEMPLATE_MINOR]

45 CHORD_TEMPLATE_DOM7_stdevs = [npstd(chord) for chord in CHORD_TEMPLATE_DOM7]

46 CHORD_TEMPLATE_MIN7_stdevs = [npstd(chord) for chord in CHORD_TEMPLATE_MIN7]

47

48 TIMBRE_CLUSTERS = [[ 138679881e-01 395702571e-02 265410235e-0249 738301998e-03 -175014636e-02 -551147732e-0250 871851698e-03 -117595855e-02 107227900e-0251 875951680e-03 540391877e-03 617638908e-03]52 [ 314344510e+00 117405599e-01 408053561e+0053 -177934450e+00 293367968e+00 -135597928e+0054 -155129489e+00 775743158e-01 642796685e-0155 140794256e-01 337716831e-01 -327103815e-01]56 [ 356548165e-01 273288705e+00 194355982e+0057 106892477e+00 989739475e-01 -897330631e-0258 873234495e-01 -200747009e-03 344488367e-0159 993117800e-02 -243471766e-01 -190521726e-01]60 [ 422442037e-01 414115783e-01 143926557e-0161 -116143322e-01 -595186216e-02 -236927188e-0162 -683151409e-02 986816882e-02 243219098e-0263 693558977e-02 680121418e-03 397485360e-02]64 [ 194727799e-01 -139027782e+00 -239875671e-01

62

65 -284583677e-01 192334219e-01 -283421048e-0166 215787541e-01 114840341e-01 -215631833e-0167 -409496877e-02 -690838017e-03 -724394810e-03]68 [ 196565167e-01 498702717e-02 -343697282e-0169 254170701e-01 112441266e-02 154740401e-0170 -470447408e-02 810868802e-02 303736697e-0371 143974944e-03 -275044913e-02 148634678e-02]72 [ 221364497e-01 -296205105e-01 157754028e-0173 -557641279e-02 -925625566e-02 -615316168e-0274 -138139882e-01 -554936599e-02 166886836e-0175 646238260e-02 124093863e-02 -209274345e-02]76 [ 212823455e-01 -932652720e-02 -439611467e-0177 -202814479e-01 498638770e-02 -126572488e-0178 -111181799e-01 325075635e-02 201416694e-0279 -569216463e-02 261922912e-02 830817468e-02]80 [ 162304042e-01 -734813956e-03 -202552550e-0181 180106705e-01 -572110826e-02 -917148244e-0282 -620429191e-03 -608892354e-02 102883628e-0283 384878478e-02 -872920419e-03 237291230e-02]84 [ 169023095e-01 681311168e-02 -371039856e-0285 -213139780e-02 -418752028e-03 136407740e-0186 258515825e-02 -410328777e-04 293149920e-0287 -197874734e-02 201177066e-02 429260690e-03]88 [ 416829358e-01 -128384095e+00 886081556e-0189 913717416e-02 -319420208e-01 -182003637e-0190 -319865507e-02 -171517045e-02 347472066e-0291 -353047665e-02 558354602e-02 -506222122e-02]92 [ 383948137e-01 106020034e-01 401191058e-0193 149470482e-01 -958422411e-02 -494473336e-0294 227589858e-02 -567352733e-02 384666644e-0295 -215828055e-02 -167817151e-02 115426241e-01]96 [ 907946444e-01 326120397e+00 298472002e+0097 -142615404e-01 129886103e+00 -453380431e-0198 154008478e-01 -355297093e-02 -295809181e-0199 157037690e-01 -729692046e-02 115180285e-01]

100 [ 160870896e+00 -232038235e+00 -796211044e-01101 155058968e+00 -219377663e+00 501030526e-01102 -171767279e+00 -136642470e+00 -242837527e-01103 -414275615e-01 -733148530e-01 -456676578e-01]104 [ 642870687e-01 134486839e+00 216026845e-01105 -213180345e-01 310866747e-01 -397754955e-01106 -354439151e-01 -595938041e-04 495054274e-03107 467013422e-02 -180823854e-02 125808320e-01]108 [ 116780496e+00 228141229e+00 -329418720e+00109 -154239912e+00 212372153e-01 251116768e+00110 184273560e+00 -406183916e-01 119175125e+00111 -924407446e-01 685444429e-01 -638729005e-01]112 [ 239097414e-01 -113382447e-02 306327342e-01113 468182987e-03 -103107607e-01 -317661969e-02114 346533705e-02 146440386e-02 688291154e-02115 172580481e-02 -623970238e-03 -652822380e-03]116 [ 174850329e-01 -186077411e-01 269285838e-01117 522452803e-02 -371708289e-02 -642874319e-02118 -501920042e-03 -114565540e-02 -261300268e-03119 -694872458e-03 120157063e-02 201341977e-02]120 [ 193220674e-01 162738332e-01 172794061e-02121 789933755e-02 158494767e-01 904541006e-04122 -333177052e-02 -142411500e-01 -190471155e-02123 -241622739e-02 -257382438e-02 284895062e-02]124 [ 331179197e+00 -156765268e-01 442446188e+00

63

125 205496297e+00 507031622e+00 -352663849e-02126 -568337901e+00 -117825301e+00 541756637e-01127 -315541339e-02 -158404846e+00 737887234e-01]128 [ 236033237e-01 -501380019e-01 -701568834e-02129 -214474169e-01 558739133e-01 -345340886e-01130 236469930e-01 -251770230e-02 -441670143e-01131 -173364633e-01 992353986e-03 101775476e-01]132 [ 313672832e+00 155128891e+00 460139512e+00133 982477544e-01 -387108002e-01 -134239667e+00134 -300065797e+00 -441556909e-01 -777546208e-01135 -659017029e-01 -142596356e-01 -978935498e-01]136 [ 850714148e-01 228658856e-01 -365260753e+00137 270626948e+00 -190441544e-01 566625676e+00138 177531510e+00 239978921e+00 110965660e+00139 158484130e+00 -151579214e-02 864324026e-01]140 [ 114302559e+00 118602811e+00 -388130412e+00141 869833825e-01 -823003310e-01 -423867795e-01142 856022598e-01 -108015106e+00 174840192e-01143 -135493558e-02 -117012561e+00 168572940e-01]144 [ 354117814e+00 612714769e-01 767585243e+00145 250391333e+00 181374399e+00 -146363231e+00146 -174027236e+00 -572924078e-01 -120787368e+00147 -413954661e-01 -462561948e-01 678297871e-01]148 [ 831843044e-01 441635485e-01 700724425e-02149 -472159900e-02 308326493e-01 -447009822e-01150 327806057e-01 652370380e-01 328490360e-01151 128628172e-01 -778065861e-02 691343399e-02]152 [ 490082031e-01 -953180204e-01 176970476e-01153 157256960e-01 -526196238e-02 -319264458e-01154 391808304e-01 219368239e-01 -206483291e-01155 -625044005e-02 -105547224e-01 318934196e-01]156 [ 149899454e+00 -430708817e-01 243770498e+00157 703149621e-01 -228827845e+00 270195855e+00158 -471484280e+00 -118700075e+00 -177431396e+00159 -223190236e+00 820855264e-01 -235859902e-01]160 [ 120322544e-01 -366300816e-01 -125699953e-01161 -121914056e-01 693277338e-02 -131034684e-01162 -154955924e-03 248094288e-02 -309576314e-02163 -166369415e-03 148904987e-04 -142151992e-02]164 [ 652394765e-01 -681024464e-01 636868117e-01165 304950208e-01 262178992e-01 -320457080e-01166 -198576098e-01 -302173163e-01 204399765e-01167 444513847e-02 -950111498e-02 -114198739e-02]168 [ 206762180e-01 -208101829e-01 261977630e-01169 -171672300e-01 561794250e-02 213660185e-01170 390259585e-02 478176392e-02 172812607e-02171 344052067e-02 626899067e-03 248544728e-02]172 [ 739717363e-01 437786285e+00 254995502e+00173 113151212e+00 -358509503e-01 220806129e-01174 -220500355e-01 -722409824e-02 -270534083e-01175 107942098e-03 270174668e-01 187279353e-01]176 [ 125593809e+00 671054880e-02 870352571e-01177 -432607959e+00 230652217e+00 547476105e+00178 -611052479e-01 107955720e+00 -216225471e+00179 -795770149e-01 -731804973e-01 968935954e-01]180 [ 117233757e-01 -123897829e-01 -488625265e-01181 142036530e-01 -723286756e-02 -699808763e-02182 -117525019e-02 570221674e-02 -767796123e-03183 417505873e-02 -233375716e-02 194121001e-02]184 [ 167511025e+00 -275436700e+00 145345593e+00

64

185 132408871e+00 -166172505e+00 100560074e+00186 -882308160e-01 -595708043e-01 -727283590e-01187 -103975499e+00 -186653334e-02 139449745e+00]188 [ 320587677e+00 -284451104e+00 854849957e+00189 -444001235e-01 104202144e+00 735333682e-01190 -248763292e+00 738931361e-01 -174185596e+00191 -107581842e+00 205759299e-01 -820483513e-01]192 [ 331279737e+00 -508655734e-01 661530870e+00193 116518280e+00 474499155e+00 -231536191e+00194 -134016130e+00 -715381712e-01 278890594e+00195 204189275e+00 -380003033e-01 116034914e+00]196 [ 179522019e+00 -813534697e-02 437167420e-01197 226517020e+00 885377295e-01 107481514e+00198 -725322296e-01 -219309506e+00 -759468916e-01199 -137191387e+00 260097913e-01 934596450e-01]200 [ 350400906e-01 817891485e-01 -863487084e-01201 -731760701e-01 970320805e-02 -360023996e-01202 -291753495e-01 -803073817e-02 665930095e-02203 160093340e-01 -129158086e-01 -518806100e-02]204 [ 225922929e-01 278461593e-01 539661393e-02205 -237662670e-02 -270343295e-02 -123485570e-01206 231027499e-03 587465112e-05 186127188e-02207 283074747e-02 -187198676e-04 124761782e-02]208 [ 453615634e-01 318976020e+00 -835029351e-01209 784124578e+00 -443906795e-01 -178945492e+00210 -114521031e+00 100044304e+00 -404084981e-01211 -486030348e-01 105412721e-01 563666445e-02]212 [ 393714086e-01 -307226477e-01 -487366619e-01213 -457481697e-01 -291133171e-04 -239881719e-01214 -215591352e-01 -121332941e-01 142245002e-01215 502984582e-02 -805878851e-03 195534173e-01]216 [ 186913010e-01 -161000977e-01 595612425e-01217 187804293e-01 222064227e-01 -109008289e-01218 783845058e-02 515228647e-02 -818113578e-02219 -237860551e-02 341013800e-03 364680417e-02]220 [ 332919314e+00 -214341251e+00 720913997e+00221 176143734e+00 164091808e+00 -266887649e+00222 -926748006e-01 -278599285e-01 -739434005e-01223 -387363085e-01 800557250e-01 115628886e+00]224 [ 476496444e-01 -119334793e-01 309037235e-01225 -345545294e-01 130114716e-01 506895559e-01226 212176840e-01 -414296750e-03 452439064e-02227 -162163990e-02 693683152e-02 -577607592e-03]228 [ 300019324e-01 543432074e-02 -772732930e-01229 147263806e+00 -279012581e-02 -247864869e-01230 -210011388e-01 278202425e-01 616957205e-02231 -166924986e-01 -180102286e-01 -378872162e-03]]232

233 TIMBRE_MEANS = [npmean(t) for t in TIMBRE_CLUSTERS]234 TIMBRE_STDEVS = [npstd(t) for t in TIMBRE_CLUSTERS]235

236 rsquorsquorsquohelper methods to process raw msd datarsquorsquorsquo237

238 def normalize_pitches(h5)239 key = int(hdf5_gettersget_key(h5))240 segments_pitches = hdf5_gettersget_segments_pitches(h5)241 segments_pitches_new = [transpose_by_key(pitch_segkey) for pitch_seg in

segments_pitches]242 return segments_pitches_new243

65

244 def transpose_by_key(pitch_segkey)245 pitch_seg_new = []246 for i in range(012)247 idx = (i + key) 12248 pitch_seg_newappend(pitch_seg[idx])249 return pitch_seg_new250

251 rsquorsquorsquo given a time segment with distributions of the 12 pitches find the mostlikely chord playedrsquorsquorsquo

252 def find_most_likely_chord(pitch_vector)253 rho_max = 00254 index each chord255 most_likely_chord = (11)256 for idx (chordmeanstdev) in enumerate(zip(CHORD_TEMPLATE_MAJOR

CHORD_TEMPLATE_MAJOR_meansCHORD_TEMPLATE_MAJOR_stdevs))257 rho = 00258 for i in range(012)259 rho += (chord[i] - mean)(pitch_vector[i] - npmean(pitch_vector))((

stdev+001)(npstd(pitch_vector)+001))260 if (abs(rho) gt abs(rho_max))261 rho_max = rho262 most_likely_chord = (1idx)263 for idx (chordmeanstdev) in enumerate(zip(CHORD_TEMPLATE_MINOR

CHORD_TEMPLATE_MINOR_meansCHORD_TEMPLATE_MINOR_stdevs))264 rho = 00265 for i in range(012)266 rho += (chord[i] - mean)(pitch_vector[i] - npmean(pitch_vector))((

stdev+001)(npstd(pitch_vector)+001))267 if (abs(rho) gt abs(rho_max))268 rho_max = rho269 most_likely_chord = (2idx)270 for idx (chordmeanstdev) in enumerate(zip(CHORD_TEMPLATE_DOM7

CHORD_TEMPLATE_DOM7_meansCHORD_TEMPLATE_DOM7_stdevs))271 rho = 00272 for i in range(012)273 rho += (chord[i] - mean)(pitch_vector[i] - npmean(pitch_vector))((

stdev+001)(npstd(pitch_vector)+001))274 if (abs(rho) gt abs(rho_max))275 rho_max = rho276 most_likely_chord = (3idx)277 for idx (chordmeanstdev) in enumerate(zip(CHORD_TEMPLATE_MIN7

CHORD_TEMPLATE_MIN7_meansCHORD_TEMPLATE_MIN7_stdevs))278 rho = 00279 for i in range(012)280 rho += (chord[i] - mean)(pitch_vector[i] - npmean(pitch_vector))((

stdev+001)(npstd(pitch_vector)+001))281 if (abs(rho) gt abs(rho_max))282 rho_max = rho283 most_likely_chord = (4idx)284 return most_likely_chord285

286 def find_most_likely_timbre_category(timbre_vector)287 most_likely_timbre_cat = 0288 rho_max = 00289 for idx (segmeanstdev) in enumerate(zip(TIMBRE_CLUSTERSTIMBRE_MEANS

TIMBRE_STDEVS))290 rho = 00291 for i in range(012)292 rho += (seg[i] - mean)(timbre_vector[i] - npmean(seg))((stdev+001)

(npstd(timbre_vector)+001))

66

293 if (abs(rho) gt abs(rho_max))294 rho_max = rho295 most_likely_timbre_cat = idx296 return most_likely_timbre_cat

67

Bibliography

[1] Marck Bailey Mark j butler publishes scholarly work on dance mu-sic httpwwwmusicnorthwesterneduaboutnews2012mark-j-butler-publishes-scholarly-work-on-dance-musichtml mar 2012

[2] Kenneth Taylor Ishkurrsquos guide to edm httptechnoorgelectronic-music-guide

[3] Deal further strengthens spotifyrsquos music discovery expertise httptheechonestcompressreleasesspotify-acquires-echo-nest mar 2014

[4] Josh Constine Inside the spotify - echo nest skunkworks httptechcrunchcom20141019the-sonic-mad-scientists oct 2014

[5] The future of music genres is here httpblogechonestcompost73516217273the-future-of-music-genres-is-here jan 2014

[6] About the music genome project httpwwwpandoracomaboutmgp

[7] Joan Serragrave Aacutelvaro Corral Mariaacuten Boguntildeaacute Martiacuten Haro and Josep Ll ArcosMeasuring the evolution of contemporary western popular music Sci Rep 2jul 2012

[8] Matthias Mauch Robert M MacCallum Mark Levy and Armand M Leroi Theevolution of popular music Usa 1960ndash2010 Royal Society Open Science 2(5)2015

[9] Thierry Bertin-Mahieux Daniel PW Ellis Brian Whitman and Paul LamereThe million song dataset In Proceedings of the 12th International Conferenceon Music Information Retrieval (ISMIR 2011) 2011

[10] F Pedregosa G Varoquaux A Gramfort V Michel B Thirion O GriselM Blondel P Prettenhofer R Weiss V Dubourg J Vanderplas A PassosD Cournapeau M Brucher M Perrot and E Duchesnay Scikit-learn Machinelearning in Python Journal of Machine Learning Research 122825ndash2830 2011

[11] Lars Buitinck Gilles Louppe Mathieu Blondel Fabian Pedregosa AndreasMueller Olivier Grisel Vlad Niculae Peter Prettenhofer Alexandre GramfortJaques Grobler Robert Layton Jake VanderPlas Arnaud Joly Brian Holt andGaeumll Varoquaux API design for machine learning software experiences from

68

the scikit-learn project In ECML PKDD Workshop Languages for Data Miningand Machine Learning pages 108ndash122 2013

[12] Edwin Chen Infinite mixture models with nonparametric bayesand the dirichlet process httpblogechenme20120320infinite-mixture-models-with-nonparametric-bayes-and-the-dirichlet-processmar 2012

[13] Graham Massey Roland tr-808 The drum machine that changed music httpwwwbbccomnewsentertainment-arts-26682781 mar 2014

[14] Jean-Michel Jarre and Agencja Artystyczna MTJ Jean Michel Jarre DisquesDreyfus 1999

[15] Francois Pachet Jean-Julien Aucouturier and Mark Sandler The way it soundsTimbre models for analysis and retrieval of music signals IEEE TRANSAC-TIONS ON MULTIMEDIA 7(6)1028ndash35 dec 2005

69

  • Abstract
  • Acknowledgements
  • Contents
  • List of Tables
  • List of Figures
  • 1 Introduction
    • 11 Background Information
    • 12 Literature Review
    • 13 The Dataset
      • 2 Mathematical Modeling
        • 21 Determining Novelty of Songs
        • 22 Feature Selection
        • 23 Collecting Data and Preprocessing Selected Features
          • 231 Collecting the Data
          • 232 Pitch Preprocessing
          • 233 Timbre Preprocessing
              • 3 Results
                • 31 Methodology
                • 32 Findings
                  • 321 =005
                  • 322 =01
                  • 323 =02
                    • 33 Analysis
                      • 4 Conclusion
                        • 41 Design Flaws in Experiment
                        • 42 Future Work
                        • 43 Closing Remarks
                          • A Code
                            • A1 Pulling Data from the Million Song Dataset
                            • A2 Calculating Most Likely Chords and Timbre Categories
                            • A3 Code to Compute Timbre Categories
                            • A4 Helper Methods for Calculations
                              • Bibliography
Page 71: Silver,Matthew final thesis

13 CHORD_TEMPLATE_MAJOR = [[100010010000][010001001000]14 [001000100100][000100010010]15 [000010001001][100001000100]16 [010000100010][001000010001]17 [100100001000][010010000100]18 [001001000010][000100100001]]19 CHORD_TEMPLATE_MINOR =[[100100010000][010010001000]20 [001001000100][000100100010]21 [000010010001][100001001000]22 [010000100100][001000010010]23 [000100001001][100010000100]24 [010001000010][001000100001]]25 CHORD_TEMPLATE_DOM7 = [[100010010010][010001001001]26 [101000100100][010100010010]27 [001010001001][100101000100]28 [010010100010][001001010001]29 [100100101000][010010010100]30 [001001001010][000100100101]]31 CHORD_TEMPLATE_MIN7 = [[100100010010][010010001001]32 [101001000100][010100100010]33 [001010010001][100101001000]34 [010010100100][001001010010]35 [000100101001][100010010100]36 [010001001010][001000100101]]37

38 CHORD_TEMPLATE_MAJOR_means = [npmean(chord) for chord inCHORD_TEMPLATE_MAJOR]

39 CHORD_TEMPLATE_MINOR_means = [npmean(chord) for chord inCHORD_TEMPLATE_MINOR]

40 CHORD_TEMPLATE_DOM7_means = [npmean(chord) for chord in CHORD_TEMPLATE_DOM7]

41 CHORD_TEMPLATE_MIN7_means = [npmean(chord) for chord in CHORD_TEMPLATE_MIN7]

42

43 CHORD_TEMPLATE_MAJOR_stdevs = [npstd(chord) for chord inCHORD_TEMPLATE_MAJOR]

44 CHORD_TEMPLATE_MINOR_stdevs = [npstd(chord) for chord inCHORD_TEMPLATE_MINOR]

45 CHORD_TEMPLATE_DOM7_stdevs = [npstd(chord) for chord in CHORD_TEMPLATE_DOM7]

46 CHORD_TEMPLATE_MIN7_stdevs = [npstd(chord) for chord in CHORD_TEMPLATE_MIN7]

47

48 TIMBRE_CLUSTERS = [[ 138679881e-01 395702571e-02 265410235e-0249 738301998e-03 -175014636e-02 -551147732e-0250 871851698e-03 -117595855e-02 107227900e-0251 875951680e-03 540391877e-03 617638908e-03]52 [ 314344510e+00 117405599e-01 408053561e+0053 -177934450e+00 293367968e+00 -135597928e+0054 -155129489e+00 775743158e-01 642796685e-0155 140794256e-01 337716831e-01 -327103815e-01]56 [ 356548165e-01 273288705e+00 194355982e+0057 106892477e+00 989739475e-01 -897330631e-0258 873234495e-01 -200747009e-03 344488367e-0159 993117800e-02 -243471766e-01 -190521726e-01]60 [ 422442037e-01 414115783e-01 143926557e-0161 -116143322e-01 -595186216e-02 -236927188e-0162 -683151409e-02 986816882e-02 243219098e-0263 693558977e-02 680121418e-03 397485360e-02]64 [ 194727799e-01 -139027782e+00 -239875671e-01

62

65 -284583677e-01 192334219e-01 -283421048e-0166 215787541e-01 114840341e-01 -215631833e-0167 -409496877e-02 -690838017e-03 -724394810e-03]68 [ 196565167e-01 498702717e-02 -343697282e-0169 254170701e-01 112441266e-02 154740401e-0170 -470447408e-02 810868802e-02 303736697e-0371 143974944e-03 -275044913e-02 148634678e-02]72 [ 221364497e-01 -296205105e-01 157754028e-0173 -557641279e-02 -925625566e-02 -615316168e-0274 -138139882e-01 -554936599e-02 166886836e-0175 646238260e-02 124093863e-02 -209274345e-02]76 [ 212823455e-01 -932652720e-02 -439611467e-0177 -202814479e-01 498638770e-02 -126572488e-0178 -111181799e-01 325075635e-02 201416694e-0279 -569216463e-02 261922912e-02 830817468e-02]80 [ 162304042e-01 -734813956e-03 -202552550e-0181 180106705e-01 -572110826e-02 -917148244e-0282 -620429191e-03 -608892354e-02 102883628e-0283 384878478e-02 -872920419e-03 237291230e-02]84 [ 169023095e-01 681311168e-02 -371039856e-0285 -213139780e-02 -418752028e-03 136407740e-0186 258515825e-02 -410328777e-04 293149920e-0287 -197874734e-02 201177066e-02 429260690e-03]88 [ 416829358e-01 -128384095e+00 886081556e-0189 913717416e-02 -319420208e-01 -182003637e-0190 -319865507e-02 -171517045e-02 347472066e-0291 -353047665e-02 558354602e-02 -506222122e-02]92 [ 383948137e-01 106020034e-01 401191058e-0193 149470482e-01 -958422411e-02 -494473336e-0294 227589858e-02 -567352733e-02 384666644e-0295 -215828055e-02 -167817151e-02 115426241e-01]96 [ 907946444e-01 326120397e+00 298472002e+0097 -142615404e-01 129886103e+00 -453380431e-0198 154008478e-01 -355297093e-02 -295809181e-0199 157037690e-01 -729692046e-02 115180285e-01]

100 [ 160870896e+00 -232038235e+00 -796211044e-01101 155058968e+00 -219377663e+00 501030526e-01102 -171767279e+00 -136642470e+00 -242837527e-01103 -414275615e-01 -733148530e-01 -456676578e-01]104 [ 642870687e-01 134486839e+00 216026845e-01105 -213180345e-01 310866747e-01 -397754955e-01106 -354439151e-01 -595938041e-04 495054274e-03107 467013422e-02 -180823854e-02 125808320e-01]108 [ 116780496e+00 228141229e+00 -329418720e+00109 -154239912e+00 212372153e-01 251116768e+00110 184273560e+00 -406183916e-01 119175125e+00111 -924407446e-01 685444429e-01 -638729005e-01]112 [ 239097414e-01 -113382447e-02 306327342e-01113 468182987e-03 -103107607e-01 -317661969e-02114 346533705e-02 146440386e-02 688291154e-02115 172580481e-02 -623970238e-03 -652822380e-03]116 [ 174850329e-01 -186077411e-01 269285838e-01117 522452803e-02 -371708289e-02 -642874319e-02118 -501920042e-03 -114565540e-02 -261300268e-03119 -694872458e-03 120157063e-02 201341977e-02]120 [ 193220674e-01 162738332e-01 172794061e-02121 789933755e-02 158494767e-01 904541006e-04122 -333177052e-02 -142411500e-01 -190471155e-02123 -241622739e-02 -257382438e-02 284895062e-02]124 [ 331179197e+00 -156765268e-01 442446188e+00

63

125 205496297e+00 507031622e+00 -352663849e-02126 -568337901e+00 -117825301e+00 541756637e-01127 -315541339e-02 -158404846e+00 737887234e-01]128 [ 236033237e-01 -501380019e-01 -701568834e-02129 -214474169e-01 558739133e-01 -345340886e-01130 236469930e-01 -251770230e-02 -441670143e-01131 -173364633e-01 992353986e-03 101775476e-01]132 [ 313672832e+00 155128891e+00 460139512e+00133 982477544e-01 -387108002e-01 -134239667e+00134 -300065797e+00 -441556909e-01 -777546208e-01135 -659017029e-01 -142596356e-01 -978935498e-01]136 [ 850714148e-01 228658856e-01 -365260753e+00137 270626948e+00 -190441544e-01 566625676e+00138 177531510e+00 239978921e+00 110965660e+00139 158484130e+00 -151579214e-02 864324026e-01]140 [ 114302559e+00 118602811e+00 -388130412e+00141 869833825e-01 -823003310e-01 -423867795e-01142 856022598e-01 -108015106e+00 174840192e-01143 -135493558e-02 -117012561e+00 168572940e-01]144 [ 354117814e+00 612714769e-01 767585243e+00145 250391333e+00 181374399e+00 -146363231e+00146 -174027236e+00 -572924078e-01 -120787368e+00147 -413954661e-01 -462561948e-01 678297871e-01]148 [ 831843044e-01 441635485e-01 700724425e-02149 -472159900e-02 308326493e-01 -447009822e-01150 327806057e-01 652370380e-01 328490360e-01151 128628172e-01 -778065861e-02 691343399e-02]152 [ 490082031e-01 -953180204e-01 176970476e-01153 157256960e-01 -526196238e-02 -319264458e-01154 391808304e-01 219368239e-01 -206483291e-01155 -625044005e-02 -105547224e-01 318934196e-01]156 [ 149899454e+00 -430708817e-01 243770498e+00157 703149621e-01 -228827845e+00 270195855e+00158 -471484280e+00 -118700075e+00 -177431396e+00159 -223190236e+00 820855264e-01 -235859902e-01]160 [ 120322544e-01 -366300816e-01 -125699953e-01161 -121914056e-01 693277338e-02 -131034684e-01162 -154955924e-03 248094288e-02 -309576314e-02163 -166369415e-03 148904987e-04 -142151992e-02]164 [ 652394765e-01 -681024464e-01 636868117e-01165 304950208e-01 262178992e-01 -320457080e-01166 -198576098e-01 -302173163e-01 204399765e-01167 444513847e-02 -950111498e-02 -114198739e-02]168 [ 206762180e-01 -208101829e-01 261977630e-01169 -171672300e-01 561794250e-02 213660185e-01170 390259585e-02 478176392e-02 172812607e-02171 344052067e-02 626899067e-03 248544728e-02]172 [ 739717363e-01 437786285e+00 254995502e+00173 113151212e+00 -358509503e-01 220806129e-01174 -220500355e-01 -722409824e-02 -270534083e-01175 107942098e-03 270174668e-01 187279353e-01]176 [ 125593809e+00 671054880e-02 870352571e-01177 -432607959e+00 230652217e+00 547476105e+00178 -611052479e-01 107955720e+00 -216225471e+00179 -795770149e-01 -731804973e-01 968935954e-01]180 [ 117233757e-01 -123897829e-01 -488625265e-01181 142036530e-01 -723286756e-02 -699808763e-02182 -117525019e-02 570221674e-02 -767796123e-03183 417505873e-02 -233375716e-02 194121001e-02]184 [ 167511025e+00 -275436700e+00 145345593e+00

64

185 132408871e+00 -166172505e+00 100560074e+00186 -882308160e-01 -595708043e-01 -727283590e-01187 -103975499e+00 -186653334e-02 139449745e+00]188 [ 320587677e+00 -284451104e+00 854849957e+00189 -444001235e-01 104202144e+00 735333682e-01190 -248763292e+00 738931361e-01 -174185596e+00191 -107581842e+00 205759299e-01 -820483513e-01]192 [ 331279737e+00 -508655734e-01 661530870e+00193 116518280e+00 474499155e+00 -231536191e+00194 -134016130e+00 -715381712e-01 278890594e+00195 204189275e+00 -380003033e-01 116034914e+00]196 [ 179522019e+00 -813534697e-02 437167420e-01197 226517020e+00 885377295e-01 107481514e+00198 -725322296e-01 -219309506e+00 -759468916e-01199 -137191387e+00 260097913e-01 934596450e-01]200 [ 350400906e-01 817891485e-01 -863487084e-01201 -731760701e-01 970320805e-02 -360023996e-01202 -291753495e-01 -803073817e-02 665930095e-02203 160093340e-01 -129158086e-01 -518806100e-02]204 [ 225922929e-01 278461593e-01 539661393e-02205 -237662670e-02 -270343295e-02 -123485570e-01206 231027499e-03 587465112e-05 186127188e-02207 283074747e-02 -187198676e-04 124761782e-02]208 [ 453615634e-01 318976020e+00 -835029351e-01209 784124578e+00 -443906795e-01 -178945492e+00210 -114521031e+00 100044304e+00 -404084981e-01211 -486030348e-01 105412721e-01 563666445e-02]212 [ 393714086e-01 -307226477e-01 -487366619e-01213 -457481697e-01 -291133171e-04 -239881719e-01214 -215591352e-01 -121332941e-01 142245002e-01215 502984582e-02 -805878851e-03 195534173e-01]216 [ 186913010e-01 -161000977e-01 595612425e-01217 187804293e-01 222064227e-01 -109008289e-01218 783845058e-02 515228647e-02 -818113578e-02219 -237860551e-02 341013800e-03 364680417e-02]220 [ 332919314e+00 -214341251e+00 720913997e+00221 176143734e+00 164091808e+00 -266887649e+00222 -926748006e-01 -278599285e-01 -739434005e-01223 -387363085e-01 800557250e-01 115628886e+00]224 [ 476496444e-01 -119334793e-01 309037235e-01225 -345545294e-01 130114716e-01 506895559e-01226 212176840e-01 -414296750e-03 452439064e-02227 -162163990e-02 693683152e-02 -577607592e-03]228 [ 300019324e-01 543432074e-02 -772732930e-01229 147263806e+00 -279012581e-02 -247864869e-01230 -210011388e-01 278202425e-01 616957205e-02231 -166924986e-01 -180102286e-01 -378872162e-03]]232

233 TIMBRE_MEANS = [npmean(t) for t in TIMBRE_CLUSTERS]234 TIMBRE_STDEVS = [npstd(t) for t in TIMBRE_CLUSTERS]235

236 rsquorsquorsquohelper methods to process raw msd datarsquorsquorsquo237

238 def normalize_pitches(h5)239 key = int(hdf5_gettersget_key(h5))240 segments_pitches = hdf5_gettersget_segments_pitches(h5)241 segments_pitches_new = [transpose_by_key(pitch_segkey) for pitch_seg in

segments_pitches]242 return segments_pitches_new243

65

244 def transpose_by_key(pitch_segkey)245 pitch_seg_new = []246 for i in range(012)247 idx = (i + key) 12248 pitch_seg_newappend(pitch_seg[idx])249 return pitch_seg_new250

251 rsquorsquorsquo given a time segment with distributions of the 12 pitches find the mostlikely chord playedrsquorsquorsquo

252 def find_most_likely_chord(pitch_vector)253 rho_max = 00254 index each chord255 most_likely_chord = (11)256 for idx (chordmeanstdev) in enumerate(zip(CHORD_TEMPLATE_MAJOR

CHORD_TEMPLATE_MAJOR_meansCHORD_TEMPLATE_MAJOR_stdevs))257 rho = 00258 for i in range(012)259 rho += (chord[i] - mean)(pitch_vector[i] - npmean(pitch_vector))((

stdev+001)(npstd(pitch_vector)+001))260 if (abs(rho) gt abs(rho_max))261 rho_max = rho262 most_likely_chord = (1idx)263 for idx (chordmeanstdev) in enumerate(zip(CHORD_TEMPLATE_MINOR

CHORD_TEMPLATE_MINOR_meansCHORD_TEMPLATE_MINOR_stdevs))264 rho = 00265 for i in range(012)266 rho += (chord[i] - mean)(pitch_vector[i] - npmean(pitch_vector))((

stdev+001)(npstd(pitch_vector)+001))267 if (abs(rho) gt abs(rho_max))268 rho_max = rho269 most_likely_chord = (2idx)270 for idx (chordmeanstdev) in enumerate(zip(CHORD_TEMPLATE_DOM7

CHORD_TEMPLATE_DOM7_meansCHORD_TEMPLATE_DOM7_stdevs))271 rho = 00272 for i in range(012)273 rho += (chord[i] - mean)(pitch_vector[i] - npmean(pitch_vector))((

stdev+001)(npstd(pitch_vector)+001))274 if (abs(rho) gt abs(rho_max))275 rho_max = rho276 most_likely_chord = (3idx)277 for idx (chordmeanstdev) in enumerate(zip(CHORD_TEMPLATE_MIN7

CHORD_TEMPLATE_MIN7_meansCHORD_TEMPLATE_MIN7_stdevs))278 rho = 00279 for i in range(012)280 rho += (chord[i] - mean)(pitch_vector[i] - npmean(pitch_vector))((

stdev+001)(npstd(pitch_vector)+001))281 if (abs(rho) gt abs(rho_max))282 rho_max = rho283 most_likely_chord = (4idx)284 return most_likely_chord285

286 def find_most_likely_timbre_category(timbre_vector)287 most_likely_timbre_cat = 0288 rho_max = 00289 for idx (segmeanstdev) in enumerate(zip(TIMBRE_CLUSTERSTIMBRE_MEANS

TIMBRE_STDEVS))290 rho = 00291 for i in range(012)292 rho += (seg[i] - mean)(timbre_vector[i] - npmean(seg))((stdev+001)

(npstd(timbre_vector)+001))

66

293 if (abs(rho) gt abs(rho_max))294 rho_max = rho295 most_likely_timbre_cat = idx296 return most_likely_timbre_cat

67

Bibliography

[1] Marck Bailey Mark j butler publishes scholarly work on dance mu-sic httpwwwmusicnorthwesterneduaboutnews2012mark-j-butler-publishes-scholarly-work-on-dance-musichtml mar 2012

[2] Kenneth Taylor Ishkurrsquos guide to edm httptechnoorgelectronic-music-guide

[3] Deal further strengthens spotifyrsquos music discovery expertise httptheechonestcompressreleasesspotify-acquires-echo-nest mar 2014

[4] Josh Constine Inside the spotify - echo nest skunkworks httptechcrunchcom20141019the-sonic-mad-scientists oct 2014

[5] The future of music genres is here httpblogechonestcompost73516217273the-future-of-music-genres-is-here jan 2014

[6] About the music genome project httpwwwpandoracomaboutmgp

[7] Joan Serragrave Aacutelvaro Corral Mariaacuten Boguntildeaacute Martiacuten Haro and Josep Ll ArcosMeasuring the evolution of contemporary western popular music Sci Rep 2jul 2012

[8] Matthias Mauch Robert M MacCallum Mark Levy and Armand M Leroi Theevolution of popular music Usa 1960ndash2010 Royal Society Open Science 2(5)2015

[9] Thierry Bertin-Mahieux Daniel PW Ellis Brian Whitman and Paul LamereThe million song dataset In Proceedings of the 12th International Conferenceon Music Information Retrieval (ISMIR 2011) 2011

[10] F Pedregosa G Varoquaux A Gramfort V Michel B Thirion O GriselM Blondel P Prettenhofer R Weiss V Dubourg J Vanderplas A PassosD Cournapeau M Brucher M Perrot and E Duchesnay Scikit-learn Machinelearning in Python Journal of Machine Learning Research 122825ndash2830 2011

[11] Lars Buitinck Gilles Louppe Mathieu Blondel Fabian Pedregosa AndreasMueller Olivier Grisel Vlad Niculae Peter Prettenhofer Alexandre GramfortJaques Grobler Robert Layton Jake VanderPlas Arnaud Joly Brian Holt andGaeumll Varoquaux API design for machine learning software experiences from

68

the scikit-learn project In ECML PKDD Workshop Languages for Data Miningand Machine Learning pages 108ndash122 2013

[12] Edwin Chen Infinite mixture models with nonparametric bayesand the dirichlet process httpblogechenme20120320infinite-mixture-models-with-nonparametric-bayes-and-the-dirichlet-processmar 2012

[13] Graham Massey Roland tr-808 The drum machine that changed music httpwwwbbccomnewsentertainment-arts-26682781 mar 2014

[14] Jean-Michel Jarre and Agencja Artystyczna MTJ Jean Michel Jarre DisquesDreyfus 1999

[15] Francois Pachet Jean-Julien Aucouturier and Mark Sandler The way it soundsTimbre models for analysis and retrieval of music signals IEEE TRANSAC-TIONS ON MULTIMEDIA 7(6)1028ndash35 dec 2005

69

  • Abstract
  • Acknowledgements
  • Contents
  • List of Tables
  • List of Figures
  • 1 Introduction
    • 11 Background Information
    • 12 Literature Review
    • 13 The Dataset
      • 2 Mathematical Modeling
        • 21 Determining Novelty of Songs
        • 22 Feature Selection
        • 23 Collecting Data and Preprocessing Selected Features
          • 231 Collecting the Data
          • 232 Pitch Preprocessing
          • 233 Timbre Preprocessing
              • 3 Results
                • 31 Methodology
                • 32 Findings
                  • 321 =005
                  • 322 =01
                  • 323 =02
                    • 33 Analysis
                      • 4 Conclusion
                        • 41 Design Flaws in Experiment
                        • 42 Future Work
                        • 43 Closing Remarks
                          • A Code
                            • A1 Pulling Data from the Million Song Dataset
                            • A2 Calculating Most Likely Chords and Timbre Categories
                            • A3 Code to Compute Timbre Categories
                            • A4 Helper Methods for Calculations
                              • Bibliography
Page 72: Silver,Matthew final thesis

65 -284583677e-01 192334219e-01 -283421048e-0166 215787541e-01 114840341e-01 -215631833e-0167 -409496877e-02 -690838017e-03 -724394810e-03]68 [ 196565167e-01 498702717e-02 -343697282e-0169 254170701e-01 112441266e-02 154740401e-0170 -470447408e-02 810868802e-02 303736697e-0371 143974944e-03 -275044913e-02 148634678e-02]72 [ 221364497e-01 -296205105e-01 157754028e-0173 -557641279e-02 -925625566e-02 -615316168e-0274 -138139882e-01 -554936599e-02 166886836e-0175 646238260e-02 124093863e-02 -209274345e-02]76 [ 212823455e-01 -932652720e-02 -439611467e-0177 -202814479e-01 498638770e-02 -126572488e-0178 -111181799e-01 325075635e-02 201416694e-0279 -569216463e-02 261922912e-02 830817468e-02]80 [ 162304042e-01 -734813956e-03 -202552550e-0181 180106705e-01 -572110826e-02 -917148244e-0282 -620429191e-03 -608892354e-02 102883628e-0283 384878478e-02 -872920419e-03 237291230e-02]84 [ 169023095e-01 681311168e-02 -371039856e-0285 -213139780e-02 -418752028e-03 136407740e-0186 258515825e-02 -410328777e-04 293149920e-0287 -197874734e-02 201177066e-02 429260690e-03]88 [ 416829358e-01 -128384095e+00 886081556e-0189 913717416e-02 -319420208e-01 -182003637e-0190 -319865507e-02 -171517045e-02 347472066e-0291 -353047665e-02 558354602e-02 -506222122e-02]92 [ 383948137e-01 106020034e-01 401191058e-0193 149470482e-01 -958422411e-02 -494473336e-0294 227589858e-02 -567352733e-02 384666644e-0295 -215828055e-02 -167817151e-02 115426241e-01]96 [ 907946444e-01 326120397e+00 298472002e+0097 -142615404e-01 129886103e+00 -453380431e-0198 154008478e-01 -355297093e-02 -295809181e-0199 157037690e-01 -729692046e-02 115180285e-01]

    [1.60870896e+00, -2.32038235e+00, -7.96211044e-01,
     1.55058968e+00, -2.19377663e+00, 5.01030526e-01,
     -1.71767279e+00, -1.36642470e+00, -2.42837527e-01,
     -4.14275615e-01, -7.33148530e-01, -4.56676578e-01],
    [6.42870687e-01, 1.34486839e+00, 2.16026845e-01,
     -2.13180345e-01, 3.10866747e-01, -3.97754955e-01,
     -3.54439151e-01, -5.95938041e-04, 4.95054274e-03,
     4.67013422e-02, -1.80823854e-02, 1.25808320e-01],
    [1.16780496e+00, 2.28141229e+00, -3.29418720e+00,
     -1.54239912e+00, 2.12372153e-01, 2.51116768e+00,
     1.84273560e+00, -4.06183916e-01, 1.19175125e+00,
     -9.24407446e-01, 6.85444429e-01, -6.38729005e-01],
    [2.39097414e-01, -1.13382447e-02, 3.06327342e-01,
     4.68182987e-03, -1.03107607e-01, -3.17661969e-02,
     3.46533705e-02, 1.46440386e-02, 6.88291154e-02,
     1.72580481e-02, -6.23970238e-03, -6.52822380e-03],
    [1.74850329e-01, -1.86077411e-01, 2.69285838e-01,
     5.22452803e-02, -3.71708289e-02, -6.42874319e-02,
     -5.01920042e-03, -1.14565540e-02, -2.61300268e-03,
     -6.94872458e-03, 1.20157063e-02, 2.01341977e-02],
    [1.93220674e-01, 1.62738332e-01, 1.72794061e-02,
     7.89933755e-02, 1.58494767e-01, 9.04541006e-04,
     -3.33177052e-02, -1.42411500e-01, -1.90471155e-02,
     -2.41622739e-02, -2.57382438e-02, 2.84895062e-02],
    [3.31179197e+00, -1.56765268e-01, 4.42446188e+00,
     2.05496297e+00, 5.07031622e+00, -3.52663849e-02,
     -5.68337901e+00, -1.17825301e+00, 5.41756637e-01,
     -3.15541339e-02, -1.58404846e+00, 7.37887234e-01],
    [2.36033237e-01, -5.01380019e-01, -7.01568834e-02,
     -2.14474169e-01, 5.58739133e-01, -3.45340886e-01,
     2.36469930e-01, -2.51770230e-02, -4.41670143e-01,
     -1.73364633e-01, 9.92353986e-03, 1.01775476e-01],
    [3.13672832e+00, 1.55128891e+00, 4.60139512e+00,
     9.82477544e-01, -3.87108002e-01, -1.34239667e+00,
     -3.00065797e+00, -4.41556909e-01, -7.77546208e-01,
     -6.59017029e-01, -1.42596356e-01, -9.78935498e-01],
    [8.50714148e-01, 2.28658856e-01, -3.65260753e+00,
     2.70626948e+00, -1.90441544e-01, 5.66625676e+00,
     1.77531510e+00, 2.39978921e+00, 1.10965660e+00,
     1.58484130e+00, -1.51579214e-02, 8.64324026e-01],
    [1.14302559e+00, 1.18602811e+00, -3.88130412e+00,
     8.69833825e-01, -8.23003310e-01, -4.23867795e-01,
     8.56022598e-01, -1.08015106e+00, 1.74840192e-01,
     -1.35493558e-02, -1.17012561e+00, 1.68572940e-01],
    [3.54117814e+00, 6.12714769e-01, 7.67585243e+00,
     2.50391333e+00, 1.81374399e+00, -1.46363231e+00,
     -1.74027236e+00, -5.72924078e-01, -1.20787368e+00,
     -4.13954661e-01, -4.62561948e-01, 6.78297871e-01],
    [8.31843044e-01, 4.41635485e-01, 7.00724425e-02,
     -4.72159900e-02, 3.08326493e-01, -4.47009822e-01,
     3.27806057e-01, 6.52370380e-01, 3.28490360e-01,
     1.28628172e-01, -7.78065861e-02, 6.91343399e-02],
    [4.90082031e-01, -9.53180204e-01, 1.76970476e-01,
     1.57256960e-01, -5.26196238e-02, -3.19264458e-01,
     3.91808304e-01, 2.19368239e-01, -2.06483291e-01,
     -6.25044005e-02, -1.05547224e-01, 3.18934196e-01],
    [1.49899454e+00, -4.30708817e-01, 2.43770498e+00,
     7.03149621e-01, -2.28827845e+00, 2.70195855e+00,
     -4.71484280e+00, -1.18700075e+00, -1.77431396e+00,
     -2.23190236e+00, 8.20855264e-01, -2.35859902e-01],
    [1.20322544e-01, -3.66300816e-01, -1.25699953e-01,
     -1.21914056e-01, 6.93277338e-02, -1.31034684e-01,
     -1.54955924e-03, 2.48094288e-02, -3.09576314e-02,
     -1.66369415e-03, 1.48904987e-04, -1.42151992e-02],
    [6.52394765e-01, -6.81024464e-01, 6.36868117e-01,
     3.04950208e-01, 2.62178992e-01, -3.20457080e-01,
     -1.98576098e-01, -3.02173163e-01, 2.04399765e-01,
     4.44513847e-02, -9.50111498e-03, -1.14198739e-02],
    [2.06762180e-01, -2.08101829e-01, 2.61977630e-01,
     -1.71672300e-01, 5.61794250e-02, 2.13660185e-01,
     3.90259585e-02, 4.78176392e-02, 1.72812607e-02,
     3.44052067e-02, 6.26899067e-03, 2.48544728e-02],
    [7.39717363e-01, 4.37786285e+00, 2.54995502e+00,
     1.13151212e+00, -3.58509503e-01, 2.20806129e-01,
     -2.20500355e-01, -7.22409824e-02, -2.70534083e-01,
     1.07942098e-03, 2.70174668e-01, 1.87279353e-01],
    [1.25593809e+00, 6.71054880e-02, 8.70352571e-01,
     -4.32607959e+00, 2.30652217e+00, 5.47476105e+00,
     -6.11052479e-01, 1.07955720e+00, -2.16225471e+00,
     -7.95770149e-01, -7.31804973e-01, 9.68935954e-01],
    [1.17233757e-01, -1.23897829e-01, -4.88625265e-01,
     1.42036530e-01, -7.23286756e-02, -6.99808763e-02,
     -1.17525019e-02, 5.70221674e-02, -7.67796123e-03,
     4.17505873e-02, -2.33375716e-02, 1.94121001e-02],
    [1.67511025e+00, -2.75436700e+00, 1.45345593e+00,
     1.32408871e+00, -1.66172505e+00, 1.00560074e+00,
     -8.82308160e-01, -5.95708043e-01, -7.27283590e-01,
     -1.03975499e+00, -1.86653334e-02, 1.39449745e+00],
    [3.20587677e+00, -2.84451104e+00, 8.54849957e+00,
     -4.44001235e-01, 1.04202144e+00, 7.35333682e-01,
     -2.48763292e+00, 7.38931361e-01, -1.74185596e+00,
     -1.07581842e+00, 2.05759299e-01, -8.20483513e-01],
    [3.31279737e+00, -5.08655734e-01, 6.61530870e+00,
     1.16518280e+00, 4.74499155e+00, -2.31536191e+00,
     -1.34016130e+00, -7.15381712e-01, 2.78890594e+00,
     2.04189275e+00, -3.80003033e-01, 1.16034914e+00],
    [1.79522019e+00, -8.13534697e-02, 4.37167420e-01,
     2.26517020e+00, 8.85377295e-01, 1.07481514e+00,
     -7.25322296e-01, -2.19309506e+00, -7.59468916e-01,
     -1.37191387e+00, 2.60097913e-01, 9.34596450e-01],
    [3.50400906e-01, 8.17891485e-01, -8.63487084e-01,
     -7.31760701e-01, 9.70320805e-02, -3.60023996e-01,
     -2.91753495e-01, -8.03073817e-02, 6.65930095e-02,
     1.60093340e-01, -1.29158086e-01, -5.18806100e-02],
    [2.25922929e-01, 2.78461593e-01, 5.39661393e-02,
     -2.37662670e-02, -2.70343295e-02, -1.23485570e-01,
     2.31027499e-03, 5.87465112e-05, 1.86127188e-02,
     2.83074747e-02, -1.87198676e-04, 1.24761782e-02],
    [4.53615634e-01, 3.18976020e+00, -8.35029351e-01,
     7.84124578e+00, -4.43906795e-01, -1.78945492e+00,
     -1.14521031e+00, 1.00044304e+00, -4.04084981e-01,
     -4.86030348e-01, 1.05412721e-01, 5.63666445e-02],
    [3.93714086e-01, -3.07226477e-01, -4.87366619e-01,
     -4.57481697e-01, -2.91133171e-04, -2.39881719e-01,
     -2.15591352e-01, -1.21332941e-01, 1.42245002e-01,
     5.02984582e-02, -8.05878851e-03, 1.95534173e-01],
    [1.86913010e-01, -1.61000977e-01, 5.95612425e-01,
     1.87804293e-01, 2.22064227e-01, -1.09008289e-01,
     7.83845058e-02, 5.15228647e-02, -8.18113578e-03,
     -2.37860551e-02, 3.41013800e-03, 3.64680417e-02],
    [3.32919314e+00, -2.14341251e+00, 7.20913997e+00,
     1.76143734e+00, 1.64091808e+00, -2.66887649e+00,
     -9.26748006e-01, -2.78599285e-01, -7.39434005e-01,
     -3.87363085e-01, 8.00557250e-01, 1.15628886e+00],
    [4.76496444e-01, -1.19334793e-01, 3.09037235e-01,
     -3.45545294e-01, 1.30114716e-01, 5.06895559e-01,
     2.12176840e-01, -4.14296750e-03, 4.52439064e-02,
     -1.62163990e-02, 6.93683152e-02, -5.77607592e-03],
    [3.00019324e-01, 5.43432074e-02, -7.72732930e-01,
     1.47263806e+00, -2.79012581e-02, -2.47864869e-01,
     -2.10011388e-01, 2.78202425e-01, 6.16957205e-02,
     -1.66924986e-01, -1.80102286e-01, -3.78872162e-03]]

TIMBRE_MEANS = [np.mean(t) for t in TIMBRE_CLUSTERS]   # per-cluster scalar mean
TIMBRE_STDEVS = [np.std(t) for t in TIMBRE_CLUSTERS]   # per-cluster standard deviation
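For context, TIMBRE_CLUSTERS holds precomputed 12-dimensional cluster centers; the code that actually derives them is Appendix A.3. As a rough sketch of how such centers could be regenerated, here is a k-means stand-in using scikit-learn (k-means is only an illustration, not necessarily the clustering A.3 performs, and segment_timbres is a hypothetical (n_segments x 12) array, not data from the thesis):

from sklearn.cluster import KMeans
import numpy as np

segment_timbres = np.random.randn(1000, 12)           # hypothetical stand-in data
kmeans = KMeans(n_clusters=33).fit(segment_timbres)   # cluster count chosen to mirror the array above
centers = kmeans.cluster_centers_                     # plays the role of TIMBRE_CLUSTERS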

'''helper methods to process raw MSD data'''

def normalize_pitches(h5):
    # transpose every segment's chroma vector so the tonic of the song's
    # detected key sits at index 0, making songs comparable across keys
    key = int(hdf5_getters.get_key(h5))
    segments_pitches = hdf5_getters.get_segments_pitches(h5)
    segments_pitches_new = [transpose_by_key(pitch_seg, key)
                            for pitch_seg in segments_pitches]
    return segments_pitches_new


def transpose_by_key(pitch_seg, key):
    # rotate the 12-element pitch vector by the key index (mod 12)
    pitch_seg_new = []
    for i in range(0, 12):
        idx = (i + key) % 12
        pitch_seg_new.append(pitch_seg[idx])
    return pitch_seg_new
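As a quick sanity check of the rotation, consider a hypothetical track whose detected key is G (key index 7); the chroma values below are illustrative only:

# hypothetical chroma vector with most of its energy on G (index 7)
raw_chroma = [0.1, 0.0, 0.2, 0.0, 0.1, 0.0, 0.0, 0.9, 0.0, 0.3, 0.0, 0.1]
normalized = transpose_by_key(raw_chroma, 7)
print(normalized[0])   # 0.9 -- the tonic's energy now sits at index 0

After this normalization, songs in different keys can be scored against the same chord templates.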

''' given a time segment with distributions of the 12 pitches, find the most likely chord played '''

def find_most_likely_chord(pitch_vector):
    rho_max = 0.0
    # index each chord as (family, root): family 1 = major, 2 = minor,
    # 3 = dominant 7th, 4 = minor 7th
    most_likely_chord = (1, 1)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_MAJOR,
            CHORD_TEMPLATE_MAJOR_means, CHORD_TEMPLATE_MAJOR_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean) * (pitch_vector[i] - np.mean(pitch_vector)) / (
                (stdev + 0.01) * (np.std(pitch_vector) + 0.01))
        if (abs(rho) > abs(rho_max)):
            rho_max = rho
            most_likely_chord = (1, idx)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_MINOR,
            CHORD_TEMPLATE_MINOR_means, CHORD_TEMPLATE_MINOR_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean) * (pitch_vector[i] - np.mean(pitch_vector)) / (
                (stdev + 0.01) * (np.std(pitch_vector) + 0.01))
        if (abs(rho) > abs(rho_max)):
            rho_max = rho
            most_likely_chord = (2, idx)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_DOM7,
            CHORD_TEMPLATE_DOM7_means, CHORD_TEMPLATE_DOM7_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean) * (pitch_vector[i] - np.mean(pitch_vector)) / (
                (stdev + 0.01) * (np.std(pitch_vector) + 0.01))
        if (abs(rho) > abs(rho_max)):
            rho_max = rho
            most_likely_chord = (3, idx)
    for idx, (chord, mean, stdev) in enumerate(zip(CHORD_TEMPLATE_MIN7,
            CHORD_TEMPLATE_MIN7_means, CHORD_TEMPLATE_MIN7_stdevs)):
        rho = 0.0
        for i in range(0, 12):
            rho += (chord[i] - mean) * (pitch_vector[i] - np.mean(pitch_vector)) / (
                (stdev + 0.01) * (np.std(pitch_vector) + 0.01))
        if (abs(rho) > abs(rho_max)):
            rho_max = rho
            most_likely_chord = (4, idx)
    return most_likely_chord
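Each loop above accumulates a smoothed, Pearson-style correlation between a chord template and the observed pitch vector. Writing c for a template with precomputed mean \mu_c and standard deviation \sigma_c, and p for the pitch vector (these symbols are my own shorthand, not notation from the listing), the score is

\[
\rho(c, p) = \sum_{i=0}^{11} \frac{(c_i - \mu_c)\,(p_i - \mu_p)}{(\sigma_c + 0.01)(\sigma_p + 0.01)},
\]

where the 0.01 terms keep the denominator away from zero for near-flat vectors; up to a constant factor of 1/12, this is the sample correlation between template and observation.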

def find_most_likely_timbre_category(timbre_vector):
    # score the segment's 12-D timbre vector against every timbre cluster
    # center with the same correlation measure used for chords
    most_likely_timbre_cat = 0
    rho_max = 0.0
    for idx, (seg, mean, stdev) in enumerate(zip(TIMBRE_CLUSTERS, TIMBRE_MEANS,
            TIMBRE_STDEVS)):
        rho = 0.0
        for i in range(0, 12):
            rho += (seg[i] - mean) * (timbre_vector[i] - np.mean(seg)) / (
                (stdev + 0.01) * (np.std(timbre_vector) + 0.01))
        if (abs(rho) > abs(rho_max)):
            rho_max = rho
            most_likely_timbre_cat = idx
    return most_likely_timbre_cat
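Putting the helpers together, a minimal usage sketch (assuming the MSD hdf5_getters module and numpy as np are imported as elsewhere in this appendix; the file name below is illustrative only):

h5 = hdf5_getters.open_h5_file_read('TRAXLZU12903D05F94.h5')  # illustrative path
pitches = normalize_pitches(h5)                  # key-normalized chroma per segment
timbres = hdf5_getters.get_segments_timbre(h5)   # raw 12-D timbre per segment
chords = [find_most_likely_chord(p) for p in pitches]
cats = [find_most_likely_timbre_category(t) for t in timbres]
h5.close()

These per-segment chord and timbre labels are presumably what feed the song-level features described in Chapter 2.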


Bibliography

[1] Marck Bailey. Mark J. Butler publishes scholarly work on dance music. http://www.music.northwestern.edu/about/news/2012/mark-j-butler-publishes-scholarly-work-on-dance-music.html, March 2012.

[2] Kenneth Taylor. Ishkur's guide to EDM. http://techno.org/electronic-music-guide/.

[3] Deal further strengthens Spotify's music discovery expertise. http://the.echonest.com/pressreleases/spotify-acquires-echo-nest/, March 2014.

[4] Josh Constine. Inside the Spotify-Echo Nest skunkworks. http://techcrunch.com/2014/10/19/the-sonic-mad-scientists/, October 2014.

[5] The future of music genres is here. http://blog.echonest.com/post/73516217273/the-future-of-music-genres-is-here, January 2014.

[6] About the Music Genome Project. http://www.pandora.com/about/mgp.

[7] Joan Serrà, Álvaro Corral, Marián Boguñá, Martín Haro, and Josep Ll. Arcos. Measuring the evolution of contemporary western popular music. Scientific Reports, 2, July 2012.

[8] Matthias Mauch, Robert M. MacCallum, Mark Levy, and Armand M. Leroi. The evolution of popular music: USA 1960-2010. Royal Society Open Science, 2(5), 2015.

[9] Thierry Bertin-Mahieux, Daniel P. W. Ellis, Brian Whitman, and Paul Lamere. The Million Song Dataset. In Proceedings of the 12th International Conference on Music Information Retrieval (ISMIR 2011), 2011.

[10] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay. Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12:2825-2830, 2011.

[11] Lars Buitinck, Gilles Louppe, Mathieu Blondel, Fabian Pedregosa, Andreas Mueller, Olivier Grisel, Vlad Niculae, Peter Prettenhofer, Alexandre Gramfort, Jaques Grobler, Robert Layton, Jake VanderPlas, Arnaud Joly, Brian Holt, and Gaël Varoquaux. API design for machine learning software: experiences from the scikit-learn project. In ECML PKDD Workshop: Languages for Data Mining and Machine Learning, pages 108-122, 2013.

[12] Edwin Chen. Infinite mixture models with nonparametric Bayes and the Dirichlet process. http://blog.echen.me/2012/03/20/infinite-mixture-models-with-nonparametric-bayes-and-the-dirichlet-process/, March 2012.

[13] Graham Massey. Roland TR-808: The drum machine that changed music. http://www.bbc.com/news/entertainment-arts-26682781, March 2014.

[14] Jean-Michel Jarre and Agencja Artystyczna MTJ. Jean Michel Jarre. Disques Dreyfus, 1999.

[15] François Pachet, Jean-Julien Aucouturier, and Mark Sandler. The way it sounds: Timbre models for analysis and retrieval of music signals. IEEE Transactions on Multimedia, 7(6):1028-1035, December 2005.
